Closed benja-matic closed 3 years ago
Good find. The solution is simple and straightforward. Note, there could be other ways; for one, we could override `get_eval_dataloader`, but I think that would be a bit cumbersome. I think yours will work quite well for our needs.
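For reference, a minimal sketch of what the `get_eval_dataloader` alternative might look like. The `BaseTrainer` below is a stand-in for `transformers.Trainer` (only the pieces needed for illustration); the subclass name and structure are hypothetical, not the actual change in this PR.

```python
# Hypothetical sketch: subclass the Trainer and override get_eval_dataloader
# so evaluation never drops the last partial batch, while training still
# honors args.dataloader_drop_last. BaseTrainer is a minimal stand-in for
# transformers.Trainer, used here only so the example is self-contained.
class BaseTrainer:
    def __init__(self, args):
        self.args = args

    def get_eval_dataloader(self, eval_dataset=None):
        # The real Trainer builds a DataLoader with
        # drop_last=self.args.dataloader_drop_last; we return a dict so the
        # effect is visible without torch installed.
        return {"drop_last": self.args.dataloader_drop_last}


class NoDropEvalTrainer(BaseTrainer):
    def get_eval_dataloader(self, eval_dataset=None):
        # Temporarily force drop_last=False for evaluation only,
        # restoring the original setting afterwards.
        original = self.args.dataloader_drop_last
        self.args.dataloader_drop_last = False
        try:
            return super().get_eval_dataloader(eval_dataset)
        finally:
            self.args.dataloader_drop_last = original
```

The downside, as noted, is that it requires carrying a Trainer subclass just to change one loader flag.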
@benja-matic Do you know how this change affects the fine-tuning results? We should probably rerun those from the README (including `bert_1mi`, `bert_100k`, `sparse_80%_kd_onecycle_lr_rigl`, and `sparse_80%_kd_onecycle_lr`). @lucasosouza What do you think? Is this necessary? My concern is that it may make it harder to contextualize new results if we don't rerun previous fine-tuning experiments. Or at the very least, we should rerun one or two just to make sure the results are negligibly affected.
@mvacaporale Don't know just yet. I'm rerunning fine-tuning on the baseline models this afternoon.
Regarding RES-2190: it looks like the HuggingFace Trainer uses `args.dataloader_drop_last` for both the train and eval loaders. The workaround is to flip that flag to `False` during evaluation in `compute_metrics_task`, then flip it back to `True` at the end (if it was originally `True`). A few minor, ignorable formatting edits will merge in as well. Finally, there's a fine-tuning experiment for `tiny_bert50k`.
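The flip-and-restore pattern described above can be sketched as a context manager. This is an illustration of the idea, not the actual code in `compute_metrics_task`; `TrainingArgs` is a stand-in for HuggingFace's `TrainingArguments`, with only the `dataloader_drop_last` flag represented.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class TrainingArgs:
    # Stand-in for transformers.TrainingArguments; only the flag
    # relevant to this workaround is modeled here.
    dataloader_drop_last: bool = True


@contextmanager
def no_drop_last(args):
    """Temporarily force dataloader_drop_last=False (e.g. around
    evaluation), restoring the original value afterwards even if
    evaluation raises."""
    original = args.dataloader_drop_last
    args.dataloader_drop_last = False
    try:
        yield args
    finally:
        args.dataloader_drop_last = original


args = TrainingArgs(dataloader_drop_last=True)
with no_drop_last(args):
    # evaluation would run here with the full (undropped) eval set
    assert args.dataloader_drop_last is False
assert args.dataloader_drop_last is True  # restored for training
```

Wrapping the flip in a `try`/`finally` (here via a context manager) guards against the flag being left `False` if evaluation throws partway through.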