Closed by mtar 1 year ago
This is causing the issue. Unlike our own implementation, it keeps the device between processes.
@mtar should I use 'Bcast' then, as mentioned here in #790?
Yes, that should fix it.
What happened?
convolve does not work if the kernel is distributed when more than one GPU is available.
Code snippet triggering the error
Error message or erroneous outcome
Version
main (development branch)
Python version
3.10
PyTorch version
1.12
MPI version