Open Saltb0xApps opened 1 week ago
I use PyTorch-Lightning, which should be able to resolve the problem automatically. In the meantime, maybe you can check whether I hard-coded cuda:0 in any Python file.
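For anyone wanting to run that check quickly, here is a rough sketch that scans a directory for hard-coded device indices. The function name `find_hardcoded_devices` and the regex are my own assumptions, not part of this repo. It only catches quoted literals like "cuda:0" and calls like .cuda(0), so a clean result does not rule out device problems elsewhere.

```python
import re
from pathlib import Path

# Hypothetical pattern: quoted literals such as "cuda:0" or 'cuda:1',
# and indexed calls such as .cuda(0). A bare "cuda" is not flagged.
DEVICE_PATTERN = re.compile(r"""["']cuda:\d+["']|\.cuda\(\s*\d+\s*\)""")


def find_hardcoded_devices(root):
    """Return (path, line_number, stripped_line) for each match in *.py files under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if DEVICE_PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Scan the current directory and print grep-style results.
    for path, number, line in find_hardcoded_devices("."):
        print(f"{path}:{number}: {line}")
```

Run it from the repository root; any line it prints is a candidate for replacing the fixed index with the device of an existing tensor or module.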
@ldzhangyx No hardcoded cuda:0 calls in any Python code. I believe the issue is with how instructmusicgenadapter_module.py -> forward handles tensors, not the PyTorch-Lightning side of things.
Hey! I tried running the training with DDP over 2 GPUs but got this error -
The training does start, but this error comes up in the first 15-20 seconds. I'd imagine it's probably some minor issue with file handling across multiple devices?