helmholtz-analytics / heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
https://heat.readthedocs.io/
MIT License

[Bug]: convolve with distributed kernel on multiple GPUs #1085

Closed mtar closed 1 year ago

mtar commented 1 year ago

What happened?

ht.convolve fails when the kernel is distributed and more than one GPU is available.

Code snippet triggering the error

import heat as ht

# both signal and kernel are split across MPI processes (split=0) and placed on the GPU
dis_signal = ht.arange(0, 16, split=0, device='gpu', dtype=ht.int)
dis_kernel_odd = ht.ones(3, split=0, dtype=ht.int, device='gpu')
conv = ht.convolve(dis_signal, dis_kernel_odd, mode='full')

Error message or erroneous outcome

$ CUDA_VISIBLE_DEVICES=0,1,2,3 srun --ntasks=2 -l python test.py 
1:Traceback (most recent call last):
1:   File ".../test.py", line 7, in <module>
1:     conv = ht.convolve(dis_signal, dis_kernel_odd, mode='full')
1:   File ".../heat-venv_2023/lib/python3.10/site-packages/heat/core/signal.py", line 161, in convolve
1:     local_signal_filtered = fc.conv1d(signal, t_v1)
1: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__cudnn_convolution)

Version

main (development branch)

Python version

3.10

PyTorch version

1.12

MPI version

OpenMPI 4.1.4
mtar commented 1 year ago

https://github.com/helmholtz-analytics/heat/blob/8597417f274af2cdc6cd522f2bcbe1b3e6a21a08/heat/core/signal.py#L159

This line is causing the issue. Unlike our own buffer-based communication, it keeps the tensor's original device when passing it between processes, so a rank running on cuda:1 can receive a kernel chunk that still lives on cuda:0.
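For illustration, a minimal standalone sketch of that behaviour using mpi4py and torch directly (not Heat's actual code; the one-GPU-per-rank mapping via `local_device` is an assumption, and all ranks see all GPUs as in the srun call above): the object-based `bcast` pickles the tensor, so the non-root rank unpickles a kernel that still reports cuda:0 while its signal chunk lives on cuda:1.

```python
# Standalone sketch, not Heat's implementation.
import torch
import torch.nn.functional as fc
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
local_device = torch.device(f"cuda:{rank}")              # this rank's GPU (assumption)

# root owns the kernel on its own GPU ...
kernel = torch.ones(1, 1, 3, device="cuda:0") if rank == 0 else None
# ... the object-based bcast pickles it, so rank 1 receives it still on cuda:0
kernel = comm.bcast(kernel, root=0)

signal = torch.arange(16, dtype=torch.float32, device=local_device).view(1, 1, -1)
# fc.conv1d(signal, kernel)        # RuntimeError on rank 1: cuda:1 vs cuda:0
kernel = kernel.to(local_device)   # moving to the local device avoids the error
out = fc.conv1d(signal, kernel)
```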

shahpratham commented 1 year ago

@mtar should I use 'Bcast' then, as mentioned here in #790?

mtar commented 1 year ago

> @mtar should I use 'Bcast' then, as mentioned here in #790?

Yes, that should fix it.
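For reference, a hedged sketch of what the Bcast-based direction from #790 could look like outside Heat (mpi4py and torch for illustration only, not Heat's actual communication wrapper; broadcasting a CPU buffer here sidesteps the need for CUDA-aware MPI): each rank materialises its copy of the kernel on its own device, so conv1d sees signal and weight on the same GPU.

```python
# Illustrative sketch, not Heat's implementation: broadcast the raw kernel
# values into a buffer every rank owns, then place the result on the
# rank's local GPU before calling conv1d.
import numpy as np
import torch
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
local_device = torch.device(f"cuda:{rank}")    # assumption: one GPU per rank

kernel_buf = np.empty(3, dtype=np.float32)
if rank == 0:
    kernel_buf[:] = 1.0                        # root fills the kernel values
comm.Bcast(kernel_buf, root=0)                 # buffer-based MPI broadcast

# every rank builds the kernel on its *own* device
kernel = torch.from_numpy(kernel_buf).to(local_device).view(1, 1, -1)
```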