Open ClaudiaComito opened 1 month ago
What happened?
Our tests on the AMD-ROCm runner have been failing at `test_random`, on the 2-process GPU tests. Failure corresponds to one of the many `dndarray.numpy()` calls, in turn calling `Allgather` or `Allgatherv`.
Code snippet triggering the error
No response
Error message or erroneous outcome
No response
Version
main (development branch)
Python version
None
PyTorch version
None
MPI version
No response