kamalkraj / ALBERT-TF2.0

ALBERT model Pretraining and Fine Tuning using TF2.0
Apache License 2.0

Training from scratch in another language #11

Closed peregilk closed 4 years ago

peregilk commented 4 years ago

I want to train ALBERT from scratch in a non-English language. I have access to a corpus of 1-2 billion words. Would that be sufficient?

Would training on a single Cloud TPU v3 with 128 GB of RAM be feasible? Can you give an estimated training time for the base, large, and xlarge models?

kamalkraj commented 4 years ago

@peregilk I haven't run pretraining, so I don't know how long it would take. This codebase only supports GPU and CPU. You can use the original authors' implementation for TPU-based training, then convert the resulting weights to TF 2.0 for further fine-tuning tasks.
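
As a rough illustration of the conversion step, the sketch below (not from this thread, and not the repo's actual conversion script) shows how one might inspect the variables in a TF1-style ALBERT checkpoint before mapping them onto a TF 2.0 Keras model; the checkpoint path is a hypothetical placeholder.

```python
# Minimal sketch: list the variables stored in a TF1-style ALBERT checkpoint
# so they can be matched to the weights of a TF 2.0 model.
import tensorflow as tf

CKPT_PATH = "albert_base/model.ckpt-best"  # hypothetical checkpoint path

reader = tf.train.load_checkpoint(CKPT_PATH)
shape_map = reader.get_variable_to_shape_map()

# Print every pretrained variable and its shape (embeddings, the shared
# transformer block, the pooler, etc.) to plan the name mapping.
for name in sorted(shape_map):
    print(name, shape_map[name])

# A conversion would then assign each checkpoint tensor to the matching
# Keras weight, roughly:
#   keras_weight.assign(reader.get_tensor(tf1_variable_name))
```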