richarddwang / electra_pytorch

Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)

How do I pretrain on multiple GPUs? #17

Closed · PhilipMay closed this issue 3 years ago

PhilipMay commented 3 years ago

Hi,

could you please provide information on how to pretrain on multiple GPUs? I tried sending the ELECTRAModel to CUDA and wrapping it with DataParallel, without success. See the screenshot below with my comments.

Sorry, I cannot copy the text out of my environment.

[screenshot: the DataParallel attempt with inline comments]
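In essence, the attempt looks like this (retyped as a minimal sketch, since I cannot copy out of my env; `nn.Linear` is a hypothetical stand-in for the ELECTRAModel instance):

```python
import torch
from torch import nn

# Minimal sketch of the DataParallel pattern (stand-in model, not the real ELECTRAModel).
model = nn.Linear(16, 2).to('cuda:0')    # send the model to the first GPU
model = nn.DataParallel(model)           # replicate it across all visible GPUs

x = torch.randn(8, 16, device='cuda:0')  # the batch lives on the primary device
y = model(x)                             # forward splits the batch across cuda:0, cuda:1, ...
print(y.device)                          # outputs are gathered back onto cuda:0
```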

I do not know why tensors also end up on cuda:1.

PS: `c.device` is still `'cuda:0'`.

Could you please help me?

Thanks, Philip

richarddwang commented 3 years ago

There is also a discussion about multi-GPU training in #5, and the current conclusion is that we haven't found a solution to the multi-GPU problem yet.

Besides this, I am working on a new repo that implements ELECTRA on top of PyTorch Lightning. I have succeeded in using sharded training (improved multi-GPU training), but I am still writing code and haven't gotten results to validate it. I guess the release might be several months away.
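For context, sharded training in PyTorch Lightning is enabled roughly like this (a sketch only, not code from the new repo; the flag names follow the Lightning 1.1-era API and may differ in other versions):

```python
import pytorch_lightning as pl

# Rough sketch: sharded multi-GPU training in PyTorch Lightning (1.1-era flags).
trainer = pl.Trainer(
    gpus=2,                 # number of GPUs to train on
    accelerator="ddp",      # one distributed data-parallel process per GPU
    plugins="ddp_sharded",  # shard optimizer state and gradients via fairscale
)
# trainer.fit(module, datamodule=dm)  # `module` and `dm` are placeholders
```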

Anyway, I'll close this issue to keep only one issue about multi-GPU training. Feel free to tag me if you get new findings.

PhilipMay commented 3 years ago

> feel free to tag me if you get new findings.

@richarddwang yes, this fixes it for me: https://github.com/richarddwang/electra_pytorch/issues/5#issuecomment-735989497