jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

Train an ELECTRA model #55

Open jowagner opened 3 years ago

jowagner commented 3 years ago

After reading https://towardsdatascience.com/electra-is-bert-supercharged-b450246c4edb and the original ICLR 2020 paper by Clark et al., I think ELECTRA may be a good addition to the selection of models we train.

jbrry commented 3 years ago

Thanks. The instructions in the official repo look pretty clear. It also uses the TFRecord format and TensorFlow 1.15, like BERT. I'd assume that once we have our training text file(s), it would be easy enough to generate the pre-training data format and launch training on a TPU.
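
As a rough sketch, the pipeline could look something like this. The script names `build_pretraining_dataset.py` and `run_pretraining.py` come from the official google-research/electra repo; all paths, the vocab file, and the run name below are placeholders for our setup, not files that exist in this repo:

```python
"""Hypothetical driver for ELECTRA pre-training on our Irish corpus.

Assumes a checkout of google-research/electra; every path and the
run name are placeholders.
"""
import json
import subprocess

DATA_DIR = "data/ga"       # placeholder: holds vocab.txt, receives tfrecords
CORPUS_DIR = "corpora/ga"  # placeholder: plain-text training file(s)

# Step 1: convert raw text into the TFRecord format ELECTRA expects.
subprocess.run([
    "python3", "build_pretraining_dataset.py",
    "--corpus-dir", CORPUS_DIR,
    "--vocab-file", f"{DATA_DIR}/vocab.txt",
    "--output-dir", f"{DATA_DIR}/pretrain_tfrecords",
    "--max-seq-length", "128",
    "--num-processes", "4",
    "--blanks-separate-docs", "True",  # assumes blank lines delimit documents
], check=True)

# Step 2: launch pre-training; hparams overrides go in as a JSON string.
hparams = {"model_size": "base", "use_tpu": True, "num_tpu_cores": 8}
subprocess.run([
    "python3", "run_pretraining.py",
    "--data-dir", DATA_DIR,
    "--model-name", "electra_ga_base",  # placeholder run name
    "--hparams", json.dumps(hparams),
], check=True)
```

The TPU-related hparams here (`use_tpu`, `num_tpu_cores`) are my reading of the repo's `configure_pretraining.py` defaults and would need to be checked against the version we pin.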

jbrry commented 3 years ago

We should also train an ELECTRA-Large model to understand the role of model size when the data size is fixed. See also Sect. 7.1 of https://arxiv.org/abs/2010.10906
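
If we do, relative to the base-model sketch above only the hparams and the run name should need to change. Again assuming the official `--hparams` JSON interface, and with placeholder names:

```python
import json
import subprocess

# Large-model run: same data, only model_size and the run name differ
# from the base-model sketch.
hparams = {"model_size": "large", "use_tpu": True, "num_tpu_cores": 8}
subprocess.run([
    "python3", "run_pretraining.py",
    "--data-dir", "data/ga",             # placeholder
    "--model-name", "electra_ga_large",  # placeholder run name
    "--hparams", json.dumps(hparams),
], check=True)
```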

jowagner commented 3 years ago

You mentioned in the meeting that training the final ELECTRA model on TPU would take a lot longer than training the final BERT model on TPU. Why is this? My understanding from the paper is that ELECTRA is supposed to reach a given performance level more quickly than BERT.

It would also be a good idea to investigate issue #81 before training ELECTRA on TPU, as whatever went wrong there may also apply to TPU.