Replaced the old drop_last fix with a more durable solution that overrides the trainer's predict and evaluate methods. This covers the cases where those methods are called internally by HF code, not just by us.
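The override pattern is roughly the following sketch. The minimal `Trainer` stub here stands in for `transformers.Trainer`, and the `dataloader_drop_last` attribute is an assumption about where the flag lives; the point is only that subclassing `evaluate`/`predict` catches internal HF calls as well as ours:

```python
class Trainer:
    """Minimal stub standing in for transformers.Trainer (assumption:
    the real class reads args.dataloader_drop_last when it builds
    eval/predict dataloaders)."""

    def __init__(self):
        self.args = type("Args", (), {"dataloader_drop_last": True})()

    def evaluate(self, *args, **kwargs):
        # Real Trainer would build a dataloader here; report the flag used.
        return {"drop_last_used": self.args.dataloader_drop_last}

    def predict(self, *args, **kwargs):
        return {"drop_last_used": self.args.dataloader_drop_last}


class NoDropLastTrainer(Trainer):
    """Force drop_last off for every evaluation/prediction call,
    including ones HF makes internally during training."""

    def _without_drop_last(self, fn, *args, **kwargs):
        saved = self.args.dataloader_drop_last
        self.args.dataloader_drop_last = False
        try:
            return fn(*args, **kwargs)
        finally:
            # Restore the user's setting so training dataloaders are unaffected.
            self.args.dataloader_drop_last = saved

    def evaluate(self, *args, **kwargs):
        return self._without_drop_last(super().evaluate, *args, **kwargs)

    def predict(self, *args, **kwargs):
        return self._without_drop_last(super().predict, *args, **kwargs)
```

Because the override sits on the methods themselves rather than on our call sites, any code path that reaches `evaluate` or `predict` gets the fix for free.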
Updated the README results table for bert100k, bert_1mi, and sparse_80%_kd_onecycle_lr_rigl. The bert_1mi results include an extra 10 runs on the wnli task.
There are some new experiments I'm still playing with, like the "simple but hard to beat" baseline in finetuning.py; that stuff might still move around. I wanted to get this PR going for the results and the new fix.
Have you tried running with tiny_bert_linear_lr_range_test? It would be good to confirm this runs end to end without issues, since the lr-range tests validate every epoch.