richarddwang / electra_pytorch

Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)

Training time and ++ version #31

Closed Luuk99 closed 2 years ago

Luuk99 commented 2 years ago

First of all, thanks for the great repo; it's an absolute lifesaver for me. I have two questions, though:

  1. How long does it take to pretrain on a single GPU? I thought I read it somewhere but I can't find it anymore (maybe I'm remembering something that isn't there).

  2. You mention that the checkpoints on Huggingface are all the ++ version. Is the default configuration of pretrain.py already the ++ setup, or does something need to change? I want to use your repo to train a Dutch ELECTRA-small model, but I want it to be as comparable as possible to the English ELECTRA-small checkpoint on Huggingface.

richarddwang commented 2 years ago

Thanks for the comment! As to the questions:

  1. About 4 to 5 days on a single V100, as I recall.

  2. The default configuration is not the ++ setup. As far as I can remember, ++ increases the generator size and the number of training steps. So in pretrain.py you can set the size divisor (line 82) to 1 and the number of steps (line 80) according to the paper to get the same setting (see the sketch below). Note that this configuration will take much longer to train than the original and may not be suitable for your dataset.
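
For reference, a minimal sketch of the kind of override meant above. The key names (`generator_size_divisor`, `steps`) are illustrative stand-ins, not necessarily the repo's actual identifiers, so match them against the hyperparameter block around lines 80-82 of pretrain.py, and take the exact ++ step count from the ELECTRA paper:

```python
# Illustrative sketch only: these keys are hypothetical stand-ins for the
# hyperparameters defined around lines 80-82 of pretrain.py; check that file
# for the real names and current defaults.

plus_plus_overrides = {
    # ++ enlarges the generator: a size divisor of 1 makes the generator the
    # same size as the discriminator.
    "generator_size_divisor": 1,
    # ++ also trains longer: replace None with the step count the ELECTRA
    # paper reports for the ++ models.
    "steps": None,
}

# Assuming pretrain.py collects its hyperparameters in a dict-like config,
# the overrides could be merged in before training starts:
config = {"size": "small"}          # stand-in for the existing config
config.update(plus_plus_overrides)
```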

Please tag me to reopen the issue if you have other questions.