agemagician opened this issue 5 years ago
BERT equivalent: https://github.com/google-research/bert/pull/568
@LifeIsStrange Thanks for the links.
I already know both of them, but as you know, they only support BERT and GPT, not XLNet.
For my use case, I am interested in XLNet. Hopefully we will have a distributed GPU version soon.
Actually you can; just set:
--num_core_per_host=3 --train_batch_size=30
# 3 GPUs; the batch of 30 is automatically divided among them
But the current implementation uses an old distribution technique, so you will find that your RAM leaks very badly.
I created a multi-GPU pretraining session for XLNet using MirroredStrategy.
Instructions on how to use it, and the source code. Just copy-paste that code after cloning this repository.
Please remove CUDA_VISIBLE; I put it there to limit my GPU usage.
Tested on 2x Tesla V100 with 32 GB VRAM.
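The gist boils down to wrapping the Estimator with MirroredStrategy. Below is a minimal, self-contained sketch of that pattern only, not the actual gist code; dummy_model_fn and dummy_input_fn are stand-ins for XLNet's real model_fn and input pipeline.

import tensorflow as tf

# TF 1.x Estimator style, as used in this repo.
# dummy_model_fn / dummy_input_fn are placeholders for XLNet's own functions.
def dummy_input_fn():
    ds = tf.data.Dataset.from_tensor_slices(({"x": [[1.0], [2.0]]}, [[1.0], [2.0]]))
    return ds.repeat().batch(2)

def dummy_model_fn(features, labels, mode):
    preds = tf.layers.dense(features["x"], 1)
    loss = tf.losses.mean_squared_error(labels, preds)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

# MirroredStrategy replicates the model on every visible GPU of one machine and
# all-reduces the gradients (tf.contrib.distribute.MirroredStrategy on TF 1.13).
strategy = tf.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)
estimator = tf.estimator.Estimator(model_fn=dummy_model_fn, config=config)
estimator.train(input_fn=dummy_input_fn, max_steps=10)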
@huseinzol05 That is multi-GPU training on a single node. I am asking about distributed GPU training across multiple nodes.
Actually you just add a tf_config, like this: https://lambdalabs.com/blog/tensorflow-2-0-tutorial-05-distributed-training-multi-node/
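For reference, a rough sketch of the tf_config each node needs (the host names and ports below are made up):

import json
import os

# Every node describes the whole cluster plus its own role; only "index" differs per node.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["node0.example.com:2222", "node1.example.com:2222"]},
    "task": {"type": "worker", "index": 0},  # use index 1 on the second node
})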
Both your code and the official code use "MirroredStrategy", which works for single-node multi-GPU; to make it work across multiple nodes, "MultiWorkerMirroredStrategy" should be used instead.
That is also stated in the blog post you linked here: "tf_config" works with "MultiWorkerMirroredStrategy".
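For illustration, the single-node Estimator sketch above would change roughly like this (same dummy stand-ins as before, not the repo's actual code); MultiWorkerMirroredStrategy reads the cluster layout from the TF_CONFIG set on each node:

import tensorflow as tf

# Swap the strategy; the rest of the Estimator wiring stays the same.
# (tf.distribute.experimental.MultiWorkerMirroredStrategy in TF >= 1.14.)
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)
estimator = tf.estimator.Estimator(model_fn=dummy_model_fn, config=config)

# Multi-worker Estimator training is normally driven through train_and_evaluate.
tf.estimator.train_and_evaluate(
    estimator,
    train_spec=tf.estimator.TrainSpec(input_fn=dummy_input_fn, max_steps=10),
    eval_spec=tf.estimator.EvalSpec(input_fn=dummy_input_fn),
)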
I believe you can change it after copy-pasting? lol
Thanks for the information, but I am looking for more advanced, large-scale distributed training, using Horovod for example.
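For context, the Horovod changes to an estimator-style training loop look roughly like the sketch below (the optimizer, learning rate, and hooks are illustrative placeholders, not XLNet's actual training code):

import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()  # one process per GPU, started by horovodrun/mpirun

# Pin each process to its own GPU.
session_config = tf.ConfigProto()
session_config.gpu_options.visible_device_list = str(hvd.local_rank())

# Inside model_fn: scale the learning rate by the worker count and wrap the
# optimizer so gradients are averaged across workers via ring all-reduce.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-5 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer)

# Rank 0 broadcasts its initial weights so every worker starts identically.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]

The processes would then be launched with something like horovodrun -np 8 -H node0:4,node1:4 python train_gpu.py ..., one process per GPU across the two nodes.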
Hello,
Any plans to have a script for training XLNet on distributed GPUs?
Maybe with Horovod or MultiWorkerMirroredStrategy?