Closed · shark803 closed 4 years ago
Hi, thanks for bringing up this point. I tried fine-tuning the MTB-pretrained ALBERT on the SemEval task and compared it against training ALBERT directly on SemEval. Indeed, the F1 score is worse when using MTB pretraining with ALBERT, so I suspect ALBERT's architecture may not be suitable for MTB. I have updated the repo with the results for reference.
Thank you for your kind response. We find that the uncased BERT pretrained model performs better on SemEval. We pretrained MTB for only 2 epochs and got an F1 of 0.764. We are trying to pretrain for more epochs on another server, but the code seems to support only one GPU during pretraining. How can I modify the code to support multiple GPUs?
Generally, as presented in the paper, MTB has diminishing returns as the size of the fine-tuning dataset increases. So, practically, you might not need so many epochs (or even MTB at all) if you are fine-tuning on a lot of data.
I do not have multiple GPUs so I can't test this, but it should just be a matter of adding a few lines with torch multiprocessing, which you can look up.
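For reference, a minimal sketch of the simplest multi-GPU route, `torch.nn.DataParallel` (the model here is a placeholder, not the repo's actual pretraining class; `DistributedDataParallel` is the more scalable alternative but needs more setup):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the MTB pretraining network
# (the real one would come from the repo's model classes).
net = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))

# Wrap in DataParallel only when more than one GPU is visible;
# each batch is then split across the available devices.
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = net.to(device)

# Dummy batch of 8 pooled 768-dim embeddings.
batch = torch.randn(8, 768).to(device)
logits = net(batch)
print(tuple(logits.shape))  # (8, 2)
```

The training loop itself stays unchanged; DataParallel scatters the input batch, replicates the module, and gathers the outputs back onto the default device.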
Hi, please re-clone to get the revised MTB model, as the previous implementation was not correct (#9).
We used your ALBERT pretrained checkpoint files and then fine-tuned. The F1 is 0.5978749489170413. Why is the value so low? We only used the default values of the args.