guotong1988 / BERT-GPU

multi-gpu pre-training in one machine for BERT from scratch without horovod (Data Parallelism)
Apache License 2.0
173 stars 54 forks source link