Sorry for the bother; I solved this issue by exactly following the environment setup given in the README.
BTW, the environment that failed in the situation above was as follows:

```
>>> torch.__version__
'1.10.2'
>>> torch.version.cuda
'11.3'
```

while `nvcc -V` reports 11.3.
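For anyone hitting the same thing, a quick sanity check (just a sketch using standard torch introspection calls, not anything from this repo) is to compare the toolkit PyTorch was built against with what `nvcc -V` reports:

```python
import subprocess
import torch

# CUDA toolkit this PyTorch build was compiled against.
print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # NCCL bundled with PyTorch (used for multi-GPU collectives).
    print("NCCL:", torch.cuda.nccl.version())
    print("visible GPUs:", torch.cuda.device_count())

# System toolkit, for comparison with torch.version.cuda.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```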
Thanks for pointing it out! I'll close the issue.
Dear authors,
Hello! First of all, thank you for your inspiring work!
I encountered an issue with multi-GPU training on my 8 V100-16G GPUs. When distributing the models across GPUs, the process failed on the first module, `G_mapping`, reporting an `ncclUnhandledError`. The GPU memory consumption status at that point was as follows:

I am not very familiar with this, but it seems that `GPU_0` is running out of memory. I am wondering whether that is the reason behind the `ncclUnhandledError`. Could you please help me figure out what caused this error? Does your implementation work on 16GB V100 GPUs?
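For reference, here is roughly the kind of check I can add on my side to narrow this down. The `init_worker` hook and the torchrun-style launch are my own assumptions, not something from your code; the sketch just enables NCCL's logging and binds each rank to its own GPU before any allocation, since all ranks defaulting to GPU 0 would explain it running out of memory:

```python
import os
import torch
import torch.distributed as dist

# Surface the underlying NCCL failure instead of the opaque ncclUnhandledError.
os.environ.setdefault("NCCL_DEBUG", "INFO")

def init_worker(local_rank: int) -> None:
    """Hypothetical per-process setup, assuming a torchrun-style launcher
    that sets MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE."""
    # Bind this process to its own GPU *before* creating tensors or the
    # process group; otherwise every rank may end up allocating on cuda:0.
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Report how much memory this rank is actually holding on its device.
    device = torch.device(f"cuda:{local_rank}")
    allocated = torch.cuda.memory_allocated(device) / 2**30
    reserved = torch.cuda.memory_reserved(device) / 2**30
    print(f"rank {dist.get_rank()} on {device}: "
          f"{allocated:.2f} GiB allocated, {reserved:.2f} GiB reserved")

if __name__ == "__main__":
    init_worker(int(os.environ.get("LOCAL_RANK", 0)))
```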
Thank you very much.