Open jowagner opened 3 years ago
Thanks. The instructions on the official repo look pretty clear. It also uses the TFRecord format and TensorFlow 1.15 like BERT. I'd assume that once we have our training text file(s) it would be easy enough to generate the pre-training data format and launch it on TPU.
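For reference, the pre-training data is just tokenized text serialized as `tf.train.Example` records in TFRecord files, which is what the repo's `build_pretraining_dataset.py` script produces. A minimal sketch of that serialization step (the feature name `input_ids` and the example ids are illustrative assumptions; the real script also handles tokenization, document boundaries and sequence packing):

```python
import tensorflow as tf  # TensorFlow 1.15, as used by the ELECTRA repo


def write_examples(token_id_sequences, output_path):
    """Serialize tokenized sequences into a TFRecord file.

    `token_id_sequences` is assumed to be an iterable of lists of
    vocabulary ids, already truncated/padded to the max sequence
    length. The feature name "input_ids" is an assumption for
    illustration; check build_pretraining_dataset.py for the
    exact schema the repo expects.
    """
    with tf.io.TFRecordWriter(output_path) as writer:
        for input_ids in token_id_sequences:
            features = tf.train.Features(feature={
                "input_ids": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=input_ids)),
            })
            example = tf.train.Example(features=features)
            writer.write(example.SerializeToString())


# e.g. write_examples([[101, 2023, 2003, 102]],
#                     "pretrain_data/shard-0.tfrecord")
```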
We should also train an Electra Large model to understand the role of model size when data size is fixed; see the sketch below and also Sect. 7.1 here: https://arxiv.org/abs/2010.10906
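Switching to the large model should mostly be a matter of the hyperparameters passed to the repo's `run_pretraining.py`. A hedged sketch of such a launch; the hparam names below are modeled on the repo's `configure_pretraining.py` and the TPU name, bucket path and model name are placeholders, so everything should be verified against the official README before starting an expensive TPU run:

```python
import json
import subprocess

# Assumed hparam names (verify against configure_pretraining.py
# in the official ELECTRA repo before launching on TPU).
hparams = {
    "model_size": "large",   # "small" / "base" / "large"
    "use_tpu": True,
    "num_tpu_cores": 8,
    "tpu_name": "our-tpu",   # placeholder TPU name
}

# Invokes the official pre-training script with the flags its
# README documents; the bucket and model name are placeholders.
subprocess.run([
    "python3", "run_pretraining.py",
    "--data-dir", "gs://our-bucket/electra-data",
    "--model-name", "electra_large_experiment",
    "--hparams", json.dumps(hparams),
], check=True)
```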
You mentioned in the meeting that training the final electra model on TPU would take a lot longer than training the final BERT model on TPU. Why is this? My understanding from the paper is that electra is supposed to reach a given performance level more quickly than BERT.
It would also be a good idea to investigate issue #81 first before training electra on TPU, as whatever went wrong there may also apply to TPU.
Reading https://towardsdatascience.com/electra-is-bert-supercharged-b450246c4edb and the original ICLR 2020 paper by Clark et al., electra may be a good addition to the selection of models we train.