richarddwang / electra_pytorch

Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)

Glue score during pre-training #6

Closed Albert-Ma closed 3 years ago

Albert-Ma commented 3 years ago

Great job AGAIN!

I have a question: did you test the GLUE score at different pre-training steps? What behavior did you observe during pretraining?

And how do you choose the checkpoint? Or do you just train to a certain step and use that one?

Thanks!

richarddwang commented 3 years ago

Hi @Albert-Ma, thank you for your kind words!

No, all pretrained models currently shown were pretrained for 1M steps (i.e. 100% trained, as described in the paper). I've noticed about a 0.7 score improvement between the 50% trained and 100% trained checkpoints. I am planning to release this kind of information, but it may take some time.

I follow the paper, and since it doesn't specify how to pick a checkpoint, I simply pretrain for 1M steps and use that final checkpoint, as the paper did.
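
For reference, here is a minimal sketch of how one could compare GLUE scores across pretraining checkpoints using huggingface `transformers` and `datasets`. The checkpoint paths, the choice of MRPC, and the hyperparameters below are assumptions for illustration only, not the finetuning setup actually used in this repo.

```python
# Hypothetical sketch: finetune ELECTRA checkpoints saved at different pretraining
# steps on one GLUE task and compare validation scores. Paths and hyperparameters
# are illustrative assumptions.
import numpy as np
from datasets import load_dataset
from transformers import (ElectraTokenizerFast, ElectraForSequenceClassification,
                          TrainingArguments, Trainer)

# Hypothetical checkpoints, assumed to be exported in huggingface format.
checkpoints = {
    "50%_trained": "checkpoints/electra-small-500k",
    "100%_trained": "checkpoints/electra-small-1M",
}

raw = load_dataset("glue", "mrpc")

def finetune_and_eval(path):
    tok = ElectraTokenizerFast.from_pretrained(path)
    model = ElectraForSequenceClassification.from_pretrained(path, num_labels=2)

    def tokenize(batch):
        return tok(batch["sentence1"], batch["sentence2"],
                   truncation=True, max_length=128)

    data = raw.map(tokenize, batched=True)

    def accuracy(eval_pred):
        logits, labels = eval_pred
        return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

    args = TrainingArguments(output_dir="runs/" + path.split("/")[-1],
                             num_train_epochs=3,
                             per_device_train_batch_size=32,
                             learning_rate=2e-5,
                             report_to="none")
    trainer = Trainer(model=model, args=args,
                      train_dataset=data["train"],
                      eval_dataset=data["validation"],
                      compute_metrics=accuracy)
    trainer.train()
    return trainer.evaluate()["eval_accuracy"]

for name, path in checkpoints.items():
    print(name, finetune_and_eval(path))
```

Running the same finetuning recipe on each checkpoint keeps the comparison fair, so any score difference reflects the amount of pretraining rather than the finetuning settings.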