Open vymao opened 3 years ago
That's weird and hard to reproduce without more information. It would be great if you could debug this error. In particular, what's the value of split.tolist() in line 80?
Is there a way I can find out? split.tolist() is in the module code.
You could use a debugger, or add print statements in /n/scratch3/users/v/vym1/nn/lib/python3.7/site-packages/torch_geometric/nn/data_parallel.py.
Ok. This error also seems to occur randomly, so it seems difficult to track down.
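Since the failure is intermittent, one option is to record the inputs only when a call actually raises, instead of printing on every iteration. Below is a minimal sketch in plain Python, not code from torch_geometric: `record_failures` and `slice_by_split` are hypothetical names, and `slice_by_split` is only a stand-in for whatever function contains the failing call.

```python
import functools
import traceback


def record_failures(failures):
    """Return a decorator that appends the call's inputs and traceback to
    `failures` whenever the wrapped function raises, then re-raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                failures.append({
                    "function": func.__name__,
                    "args": args,
                    "kwargs": kwargs,
                    "traceback": traceback.format_exc(),
                })
                raise  # keep the original error visible to the caller
        return wrapper
    return decorator


failures = []


@record_failures(failures)
def slice_by_split(items, split):
    # Hypothetical stand-in for the failing call: cuts `items` into chunks
    # at the boundaries listed in `split`.
    if split != sorted(split) or split[-1] > len(items):
        raise ValueError("invalid split boundaries")
    return [items[split[i]:split[i + 1]] for i in range(len(split) - 1)]
```

After a failed run, the `failures` list holds the exact arguments that triggered the error, which can then be replayed in isolation. The same wrapper could be applied to the real function by reassigning it before training starts, assuming the function can be patched that way.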
🐛 Bug
I have been having intermittent issues with the DataParallel module, which I use to parallelize GPU training (I use 2 GPUs here). I am getting the following error:
This problem occurs on random epochs when I rerun the training (here, it occurred on the 7th epoch), and I am not sure why. Because I can run some number of epochs without error, it seems more likely to be an error in the module than in the computation.
Do you know what might be causing this error?