Closed by koomie 2 years ago
Is this specific to the 8-GPU case, or does it also fail with a different number of GPUs?
It is sensitive to the number of GPUs. I believe this case runs OK with 4 GPUs, but not with 16.
I haven't been able to reproduce this behavior.
I tried with the input file given above using the current HEAD on main (9dd74fa) as well as the version merged on June 9 (7ce3582), when the problem was reported, using both 8 GPUs and 16 GPUs. All combinations ran to completion (50 steps) without any problem. I have also run other multi-GPU (up to 128) + sponge zone cases recently without incident.
So... I'm going to close this. Of course, will reopen if it pops up again.
With the sponge zone enabled, I have encountered a situation where the current GPU code fails in certain cases.
Reproducer input file
@trevilo has a copy of the mesh file referenced above. When using the above input, a case using 8 MPI ranks (8 GPUs) will fail on Lassen. If you comment out the sponge-zone-related inputs, it will run fine.