helmholtz-analytics / heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
https://heat.readthedocs.io/
MIT License

[Bug]: Refactor `DNDarray.get_halo` #1570

Open ClaudiaComito opened 3 months ago

ClaudiaComito commented 3 months ago

What happened?

Notes from the PR meeting this morning (subject to change, feel free to edit):

See #1419

Code snippet triggering the error

No response

Error message or erroneous outcome

No response

Version

main (development branch)

Python version

None

PyTorch version

None

MPI version

No response

FOsterfeld commented 3 months ago

I also observed the following behavior of `get_halo()`:

If we call `get_halo()`, then do a `resplit_` that invalidates the saved halo, and then call `get_halo(0)`, the saved halo from the first call does not get reset, so `array_with_halos()` would throw an error due to invalid shapes. In general, calling `get_halo(0)` does not remove the saved halos as one might expect, and this might lead to errors later on.
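The sequence above can be sketched with a toy model. This is not Heat's code: `ToyLocalArray`, `halo_shape`, `array_with_halos_shape` and the `reset_on_zero` flag are hypothetical names used only for illustration, and the model tracks shapes only (no MPI, no tensors). It assumes a global `(8, 6)` array on 2 processes and shows why a stale cached halo no longer fits after a resplit, plus one possible remedy (clearing the cache when `halo_size == 0`).

```python
class ToyLocalArray:
    """Hypothetical stand-in for one process's local chunk; NOT Heat's DNDarray."""

    GLOBAL_SHAPE = (8, 6)  # assumed global shape for this illustration
    NPROCS = 2             # assumed number of processes

    def __init__(self, split=0):
        self.split = split
        self.halo_shape = None  # stands in for the cached halo tensor

    @property
    def local_shape(self):
        # The split axis is divided evenly among the processes.
        shape = list(self.GLOBAL_SHAPE)
        shape[self.split] //= self.NPROCS
        return tuple(shape)

    def get_halo(self, halo_size, reset_on_zero=False):
        if halo_size == 0:
            # Reported behavior: nothing happens, the old halo stays cached.
            # Possible remedy (assumption): drop the cache instead.
            if reset_on_zero:
                self.halo_shape = None
            return
        shape = list(self.local_shape)
        shape[self.split] = halo_size
        self.halo_shape = tuple(shape)

    def resplit_(self, axis):
        # Changes the split axis; in this model the cache is not invalidated.
        self.split = axis

    def array_with_halos_shape(self):
        # Shape of local data concatenated with the cached halo along the split axis.
        if self.halo_shape is None:
            return self.local_shape
        other = 1 - self.split
        if self.halo_shape[other] != self.local_shape[other]:
            raise ValueError(
                f"halo shape {self.halo_shape} does not fit local shape {self.local_shape}"
            )
        shape = list(self.local_shape)
        shape[self.split] += self.halo_shape[self.split]
        return tuple(shape)


def run_sequence(reset_on_zero):
    """get_halo(1) -> resplit_(1) -> get_halo(0) -> array_with_halos."""
    a = ToyLocalArray(split=0)
    a.get_halo(1)                          # caches a (1, 6) halo
    a.resplit_(1)                          # local shape becomes (8, 3)
    a.get_halo(0, reset_on_zero=reset_on_zero)
    return a.array_with_halos_shape()
```

In this model, `run_sequence(False)` raises a `ValueError` because the cached `(1, 6)` halo no longer matches the `(8, 3)` local chunk, while `run_sequence(True)` returns `(8, 3)`. Invalidating the cache inside `resplit_` would be another option; which one suits the refactor is left open here.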

mrfh92 commented 3 months ago

an idea brought up by @JuanPedroGHM: use Sendrecv

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 60 days with no activity.