richarddwang / electra_pytorch

Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)

Training time and ++ version #31

Closed Luuk99 closed 2 years ago

Luuk99 commented 2 years ago

First of all, thanks for the great repo; it's an absolute lifesaver for me. I have two questions, though:

  1. How long does it take to pretrain on a single GPU? I thought I read it somewhere but I can't find it anymore (maybe I'm remembering something that isn't there).

  2. You mention that the checkpoints on Huggingface are all the ++ version. Is the default configuration of pretrain.py already the ++ setup, or does something need to change? I want to use your repo to train a Dutch ELECTRA-small model, but I want it to be as comparable as possible to the English ELECTRA-small checkpoint on Huggingface.

richarddwang commented 2 years ago

Thanks for the comment! As to the questions:

  1. About 4 to 5 days on a single V100, as I recall.

  2. The default configuration is not the ++ setup. As far as I can remember, ++ increases the generator size and the number of training steps. So in pretrain.py you can set the size divisor (line 82) to 1 and the number of steps (line 80) according to the paper to get the same setting (see the sketch below). Note that this configuration will take much longer to train than the original and may not be suitable for your dataset.
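
For reference, a minimal sketch of the kind of override meant above. The key names (`generator_size_divisor`, `steps`) are illustrative stand-ins, not necessarily the repo's actual identifiers, so match them against the hyperparameter block around lines 80-82 of pretrain.py, and take the exact ++ step count from the ELECTRA paper:

```python
# Illustrative sketch only: these keys are hypothetical stand-ins for the
# hyperparameters defined around lines 80-82 of pretrain.py; check that file
# for the real names and current defaults.

plus_plus_overrides = {
    # ++ enlarges the generator: a size divisor of 1 makes the generator the
    # same size as the discriminator.
    "generator_size_divisor": 1,
    # ++ also trains longer: replace None with the step count the ELECTRA
    # paper reports for the ++ models.
    "steps": None,
}

# Assuming pretrain.py collects its hyperparameters in a dict-like config,
# the overrides could be merged in before training starts:
config = {"size": "small"}          # stand-in for the existing config
config.update(plus_plus_overrides)
```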

Please tag me to reopen the issue if you have other questions.