Albert-Ma closed this issue 3 years ago
Hi, @Albert-Ma. Thank you for your kind words!
No, all the pretrained models currently shown were pretrained for 1M steps (i.e. 100% trained, as described in the paper). I've noticed an improvement of about 0.7 in score between the 50%-trained and 100%-trained models. I am planning to release this kind of information, but it may take some time.
I followed the paper, so unless otherwise specified, each model is pretrained for 1M steps, as in the paper.
Great job AGAIN!
I have a question: did you test the GLUE score at different pre-training steps? How did it behave, or what happened during pre-training?
And how do you choose the checkpoint? Or do you just train to a certain step and use that one?
Thanks!