Closed: Raldir closed this issue 2 years ago
I can try running this experiment later this week. Meanwhile, I remember WSC being a tricky dataset that often produces unstable results. Would you mind running it with a few other seeds and seeing whether this behavior persists? A rough sketch of such a seed sweep is below.
And by the way, is this limited to WSC, or do other datasets show the same problem?
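For reference, here is roughly how I would sweep seeds; this is only a sketch, and it assumes `seed` is a config key that can be overridden via `-k` in the same way as `save_model` and `exp_name` (please check the config if it is named differently):

```python
# Sketch: re-run the WSC experiment with several seeds, keeping each run's
# logs under a separate experiment name. The `seed=` override is an
# assumption about the config key name.
import subprocess

for seed in [0, 1, 42, 1024, 2048]:
    subprocess.run(
        [
            "python", "-m", "src.pl_train",
            "-c", "ia3.json+wsc.json",
            "-k", "save_model=False", f"exp_name=wsc_seed_{seed}", f"seed={seed}",
        ],
        check=True,
    )
```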
Hi Haokun, thank you for the response. Indeed, after changing the seed the results are more in line with expectations. I have been seeing similar problems with WiC, but those also appear to be caused by seed variability. RTE seems more stable.
Thank you for the amazing work on t-few! I've noticed strange behavior when running SuperGLUE's WSC. I've been logging the validation score every 40 epochs using
self.eval_epoch_interval = 40
and when running the command:
python -m src.pl_train -c ia3.json+wsc.json -k save_model=False exp_name=first_exp
the output is as follows:
{"accuracy": 0.6730769230769231, "score_gt": 0.5068197436630726, "score_cand": 0.7191649047801127}
{"accuracy": 0.49038461538461536, "score_gt": 1.4563168384707892, "score_cand": 1.505529030584372}
{"accuracy": 0.47115384615384615, "score_gt": 3.4743554890155792, "score_cand": 2.727144861450562}
{"accuracy": 0.46153846153846156, "score_gt": 4.202766236777489, "score_cand": 3.5702959763316007}
{"accuracy": 0.40384615384615385, "score_gt": 5.157541000499175, "score_cand": 3.5657502871293287}
{"accuracy": 0.3942307692307692, "score_gt": 5.397989429533482, "score_cand": 3.975659689651086}
{"accuracy": 0.40384615384615385, "score_gt": 5.073869264469697, "score_cand": 3.995581218542961}
The last accuracy score is reported at epoch 240 of 250 total epochs.
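In case it helps to see the trend, here is a small sketch that maps those logged JSON dicts to an accuracy-vs-epoch table; it assumes the first evaluation is logged at epoch 0 and every 40 epochs thereafter, matching eval_epoch_interval = 40 above:

```python
# Sketch: tabulate the per-evaluation accuracy against the epoch at which it
# was logged (assumed: epoch 0, 40, ..., 240).
import json

log_lines = [
    '{"accuracy": 0.6730769230769231, "score_gt": 0.5068197436630726, "score_cand": 0.7191649047801127}',
    '{"accuracy": 0.49038461538461536, "score_gt": 1.4563168384707892, "score_cand": 1.505529030584372}',
    '{"accuracy": 0.47115384615384615, "score_gt": 3.4743554890155792, "score_cand": 2.727144861450562}',
    '{"accuracy": 0.46153846153846156, "score_gt": 4.202766236777489, "score_cand": 3.5702959763316007}',
    '{"accuracy": 0.40384615384615385, "score_gt": 5.157541000499175, "score_cand": 3.5657502871293287}',
    '{"accuracy": 0.3942307692307692, "score_gt": 5.397989429533482, "score_cand": 3.975659689651086}',
    '{"accuracy": 0.40384615384615385, "score_gt": 5.073869264469697, "score_cand": 3.995581218542961}',
]

for i, line in enumerate(log_lines):
    record = json.loads(line)
    print(f"epoch {40 * i:3d}: accuracy = {record['accuracy']:.3f}")
```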
Any ideas on what is going on here? Thanks!