MrSworder closed this issue 3 years ago.
Hi, I didn't explicitly add an option for multi-GPU training. However, fastai, the framework this repo is based on, does support multi-GPU training, so you should be able to train on multiple GPUs by following the fastai documentation (BTW, this code is compatible with fastai 2.1.10).
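For example, a rough sketch of the fastai distributed route could look like the following, assuming your fastai version provides Learner.distrib_ctx() from fastai.distributed (please check the docs for your version). The names dls, model, and loss_func are placeholders for the objects pretrain.py builds, not the repo's actual variable names.

```python
# Hypothetical standalone script (e.g. pretrain_distributed.py), not part of this repo.
# Launch with:  python -m fastai.launch pretrain_distributed.py
import torch
from fastai.learner import Learner
from fastai.distributed import *  # patches Learner with distrib_ctx()

# Placeholders for the DataLoaders, ELECTRA model, and loss constructed in pretrain.py.
learn = Learner(dls, model, loss_func=loss_func)

with learn.distrib_ctx():  # wraps the model in DistributedDataParallel for the duration of training
    learn.fit(1)
```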
Additionally, according to others who have tried multi-GPU training with this repo, adding learn.model = torch.nn.DataParallel(learn.model, device_ids=[0,1,2,3]) right after the Learner is initialized (https://github.com/richarddwang/electra_pytorch/blob/ab29d03e69c6fb37df238e653c8d1a81240e3dd6/pretrain.py#L388-L396) might be worth a try, as sketched below.
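A minimal sketch of where that line would go; again, dls, model, and loss_func are placeholders for the objects pretrain.py builds:

```python
import torch
from fastai.learner import Learner

# Placeholders for the DataLoaders, ELECTRA model, and loss built in pretrain.py.
learn = Learner(dls, model, loss_func=loss_func)  # Learner initialization (cf. pretrain.py L388-L396)

if torch.cuda.device_count() > 1:
    # Replicate the model on GPUs 0-3; each batch is split across them on forward/backward.
    learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1, 2, 3])

learn.fit(1)
```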
Please tag me to reopen this issue if there's anything else I can help with.
Hi, I am trying to train my own pretrained model with the ELECTRA method. I read elsewhere that your code implements ELECTRA's multi-GPU parallelism, but when I tried to run it, I found that only one GPU was being used. My num_workers is set to 4 and num_proc is set to 4, and the data used for the trial run is about 0.5k in size. What else do I need to change to get multi-GPU parallelism working?