Closed peregilk closed 4 years ago
@peregilk I haven't run pretraining, so I don't know how much time it will take. This codebase only supports GPU and CPU. You can use the original authors' implementation for TPU-based training. After training, you can convert the weights to TF 2.0 for further fine-tuning tasks.
I want to train ALBERT from scratch in a non-English language. I have access to a corpus of 1-2 billion words. Would that be sufficient?
Would training on a single Cloud TPU v3 with 128 GB of RAM be feasible? Can you give an estimated training time for the base, large, and xlarge models?