Hi @ArrowLuo ,
Thanks for the great work. I read in your paper that the second pre-training stage takes around 12 days on 8 GPUs. I am wondering whether you tried multi-machine distributed training to accelerate it. Is your code base compatible with that? Thanks in advance.