Open jeswan opened 4 years ago
Comment by iftenney Wednesday Jul 18, 2018 at 03:25 GMT
Oh dear. Wait, we're not doing this already? Does this invalidate results?
Comment by W4ngatang Wednesday Jul 18, 2018 at 03:33 GMT
I'm pretty sure we do it before validating, but outside the function.
Comment by iftenney Wednesday Jul 18, 2018 at 03:37 GMT
Hrm, I think we should look into this, at least to rule out a potentially result-invalidating bug.
Comment by sleepinyourhat Wednesday Jul 18, 2018 at 03:50 GMT
Comment by sleepinyourhat Friday Jul 27, 2018 at 16:47 GMT
@W4ngatang - Mind checking on this? I didn't see a problem myself. Should resolve one way or another soon.
Comment by sleepinyourhat Thursday Aug 30, 2018 at 16:27 GMT
In other experiments, I confirmed that the final validation metrics are correct across a few datasets—I wrote predictions to disk, manually computed the metric, and compared with the reported number.
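A rough sketch of that kind of check, assuming accuracy as the metric and predictions dumped as JSON lines with hypothetical `pred`/`label` fields. The comment doesn't say which metrics or file format were used, and `write_predictions`, `accuracy_from_file`, and `check_reported_metric` are illustrative names, not jiant functions.

```python
import json
import math
import os
import tempfile


def write_predictions(path, preds, labels):
    # Hypothetical format: one JSON record per line with the prediction and gold label.
    with open(path, "w") as f:
        for p, y in zip(preds, labels):
            f.write(json.dumps({"pred": p, "label": y}) + "\n")


def accuracy_from_file(path):
    # Recompute accuracy from disk, independently of the training code's scorers.
    correct = total = 0
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            correct += int(rec["pred"] == rec["label"])
            total += 1
    if total == 0:
        raise ValueError(f"no predictions in {path}")
    return correct / total


def check_reported_metric(path, reported, tol=1e-6):
    # Compare the independently recomputed number against the reported one.
    recomputed = accuracy_from_file(path)
    return math.isclose(recomputed, reported, abs_tol=tol), recomputed


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "val_preds.jsonl")
    write_predictions(path, preds=[1, 0, 1, 1], labels=[1, 0, 0, 1])
    ok, recomputed = check_reported_metric(path, reported=0.75)
    print(ok, recomputed)  # True 0.75
```

The point is that the recomputation shares no state with the scorers being checked, so a stale accumulator would show up as a mismatch.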
However, if anything is wrong with the mid-training validation metrics, that'd be a bit harder to catch, and could impact early stopping. @W4ngatang - Mind giving this a quick look?
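To illustrate why this could matter for early stopping, here is a toy example with made-up numbers. It assumes, hypothetically, that scorer counts carried over between mid-training validation passes instead of being reset. Whether that actually happens is the open question in this thread. `best_epoch` is a generic patience-based stopper written for illustration, not jiant's implementation.

```python
def best_epoch(metrics, patience):
    """Return (best_epoch, stop_epoch) for a simple patience-based stopper.

    Epochs are 1-indexed. Training stops once `patience` consecutive epochs
    fail to beat the best metric seen so far.
    """
    best, best_ep, bad = float("-inf"), 0, 0
    for ep, m in enumerate(metrics, start=1):
        if m > best:
            best, best_ep, bad = m, ep, 0
        else:
            bad += 1
            if bad >= patience:
                return best_ep, ep
    return best_ep, len(metrics)


def cumulative(correct_per_epoch, n_per_epoch):
    # The metric you would get if counts were never reset between validation passes.
    out, c, t = [], 0, 0
    for k in correct_per_epoch:
        c += k
        t += n_per_epoch
        out.append(c / t)
    return out


correct = [5, 9, 8, 8, 8]  # made-up correct counts out of 10 per epoch
per_epoch = [k / 10 for k in correct]
stale = cumulative(correct, 10)

print(best_epoch(per_epoch, patience=2))  # (2, 4)
print(best_epoch(stale, patience=2))      # (5, 5)
```

With per-epoch metrics the stopper keeps epoch 2 and stops at epoch 4. The running average keeps creeping up as long as each epoch beats the mean so far, so it never triggers and picks the last epoch instead.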
Issue by W4ngatang Wednesday Jul 18, 2018 at 03:15 GMT Originally opened as https://github.com/nyu-mll/jiant/issues/185
Before validation or evaluation, we should reset each task's scorers to ensure we're validating on just the data from that split. This will likely cause division-by-zero errors that we'll need to catch.
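A minimal sketch of the proposed fix, using a hypothetical `AccuracyScorer` that accumulates counts until reset. Its `get_metric(reset=...)`/`reset()` shape loosely mirrors AllenNLP-style metrics, but this is not jiant's actual scorer code. `validate` resets first, and the zero-total case is guarded instead of raising `ZeroDivisionError`.

```python
class AccuracyScorer:
    """Minimal running-accuracy scorer (hypothetical, not jiant's class).

    Counts accumulate across update() calls until reset() is called.
    """

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, labels):
        # Accumulate counts for one batch.
        self.correct += sum(int(p == y) for p, y in zip(preds, labels))
        self.total += len(labels)

    def get_metric(self, reset=False):
        # Return 0.0 instead of raising ZeroDivisionError when no examples
        # have been seen since the last reset (e.g. an empty split).
        acc = self.correct / self.total if self.total > 0 else 0.0
        if reset:
            self.reset()
        return acc


def validate(scorer, batches):
    # Clear anything left over from training so the metric reflects
    # only this split's data.
    scorer.reset()
    for preds, labels in batches:
        scorer.update(preds, labels)
    return scorer.get_metric(reset=True)


scorer = AccuracyScorer()
scorer.update([1, 1, 1, 1], [1, 1, 1, 1])  # training batch: 4 of 4 correct
val_batches = [([0, 0], [1, 1])]  # validation: 0 of 2 correct

# Without a reset, validation counts are mixed with training counts.
for preds, labels in val_batches:
    scorer.update(preds, labels)
print(scorer.get_metric())  # 0.6666666666666666 (4 of 6)

print(validate(scorer, val_batches))  # 0.0, validation data only
print(validate(scorer, []))  # 0.0, no ZeroDivisionError on an empty split
```

Returning 0.0 on an empty split is just one option. It can silently look like a real score, so returning NaN or logging a warning and skipping the metric may be safer depending on how early stopping consumes it.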