Closed PhilipMay closed 3 years ago
There is also a discussion about the multi-GPU problem in #5, and the current conclusion is that we haven't found a solution for it.
Besides this, I am working on a new repo which implements ELECTRA and is based on PyTorch Lightning. Although I have succeeded in using sharded training (improved multi-GPU training), I am still writing code and haven't gotten results to validate it. I guess the release might be several months away.
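For context, sharded training in PyTorch Lightning is enabled through the `Trainer`'s strategy flag. A minimal, hedged sketch (the exact flag names vary across Lightning versions; in the 1.5+ series the fairscale-backed strategy was selected roughly like this, and `electra_module` / `datamodule` below are hypothetical placeholders, not names from this repo):

```python
import pytorch_lightning as pl

# Hypothetical config sketch: sharded data-parallel training shards
# optimizer state and gradients across GPUs, reducing per-GPU memory.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp_sharded",  # fairscale-backed sharded DDP (PL 1.x naming)
)
# trainer.fit(electra_module, datamodule)  # placeholders for illustration
```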
Anyway, I'll close the issue to keep only one issue about multi-GPU, feel free to tag me if you get new findings.
@richarddwang yes - this fixes it for me: https://github.com/richarddwang/electra_pytorch/issues/5#issuecomment-735989497
Hi,
could you please provide information on how to pretrain on multiple GPUs? I tried to send `ELECTRAModel` to CUDA and wrap it with `DataParallel`, without success. See this screenshot with the comments. Sorry - I cannot copy the text out of my env.
I do not know why tensors are also on `cuda:1`.
PS: `c.device` is still `'cuda:0'`.
Could you please help me?
Thanks Philip
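As a side note on why tensors show up on `cuda:1`: `nn.DataParallel` replicates the module onto every visible GPU and scatters each batch along dimension 0, so intermediate tensors legitimately live on `cuda:1`, `cuda:2`, ... while the "main" device stays `cuda:0` and outputs are gathered back there. A minimal sketch with a stand-in model (not the actual `ELECTRAModel`), falling back to CPU when fewer than two GPUs are available:

```python
import torch
import torch.nn as nn

# Stand-in for the real model, just to illustrate the wrapping pattern.
model = nn.Linear(8, 2)

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = model.to("cuda:0")          # main device
    model = nn.DataParallel(model)      # replicas created on all GPUs
    x = torch.randn(16, 8, device="cuda:0")  # inputs go to the main device
else:
    x = torch.randn(16, 8)              # CPU fallback for illustration

# During forward, DataParallel scatters x across devices and gathers
# the outputs back onto the main device.
out = model(x)
print(out.shape)
```

Seeing activity on `cuda:1` is therefore expected with `DataParallel`; it does not by itself indicate a bug.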