lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Multi-GPU staggered Dslash hanging on 8/16 GPUs #100

Closed · jpfoley closed this issue 11 years ago

jpfoley commented 11 years ago

I just noticed this problem on Blue Waters yesterday while testing the MPI build. staggered_dslash_test and staggered_invert_test run fine on 4 GPUs, but hang on 8 and 16 GPUs. The bug was introduced in one of the commits from November 27 or 28; the code in the master branch worked fine before that. Blue Waters is down for maintenance today, but once it's back up I will check whether the same problem occurs in the QMP build.

maddyscientist commented 11 years ago

When going from 4 to 8 GPUs, was there anything else that changed? I am curious if the 4 to 8 GPU transition is when another dimension is partitioned.

jpfoley commented 11 years ago

I had partitioned the z and t directions: (1,1,2,2) runs but (1,1,2,4) doesn't. I did use a larger lattice for the 8-GPU run, however.
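
For reference, a minimal sketch of how such a process grid could be handed to QUDA in an MPI build, using the initCommsGridQuda() interface declared in quda.h. This is a simplification, not the actual test code: the NULL rank-mapping argument (assumed to select the default ordering), the device selection, and the omitted Dslash call are all placeholders for what the real test programs do through their own helpers.

```c
#include <mpi.h>
#include <quda.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  /* Process-grid dimensions in (x, y, z, t) order: z and t partitioned,
   * i.e. the failing 8-GPU case above.  Run with 8 MPI ranks. */
  const int grid[4] = {1, 1, 2, 4};

  /* Tell QUDA how the ranks are laid out over the 4-d grid.  A NULL
   * rank-mapping function is assumed to select the default ordering. */
  initCommsGridQuda(4, grid, NULL, NULL);

  /* One visible GPU per rank assumed; device selection is site-specific. */
  initQuda(0);

  /* ... load the gauge field and call the staggered Dslash here,
   *     as staggered_dslash_test does ... */

  endQuda();
  MPI_Finalize();
  return 0;
}
```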

maddyscientist commented 11 years ago

Ok, it looks like the partitioning isn't the issue then. I'll take a look later today.

alexstrel commented 11 years ago

Had the same problem for multi-node execution. Single node (i.e., 2 GPUs in my case) seemed to be ok.

maddyscientist commented 11 years ago

This problem has nothing to do with the number of nodes; rather, it only seems to occur when the grid size in some dimension is 4 or greater. E.g., (1,2,2,2) runs, but (1,1,1,4) does not. Continuing to investigate.
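
To make that distinction concrete, here is a small generic-MPI sketch (not QUDA code) that prints each rank's 4-d grid coordinates and its neighbours in the t direction. One possibly relevant difference, though not confirmed in this thread: with a grid extent of 2 a rank's forward and backward neighbours in that dimension are the same process, whereas with an extent of 4 they are distinct.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  /* Run with exactly 4 ranks for {1,1,1,4}, or 8 ranks for {1,2,2,2}. */
  int dims[4]    = {1, 1, 1, 4};   /* failing case; {1,2,2,2} works */
  int periods[4] = {1, 1, 1, 1};   /* lattice QCD uses periodic boundaries */
  MPI_Comm cart;
  MPI_Cart_create(MPI_COMM_WORLD, 4, dims, periods, 0, &cart);

  int rank, coords[4], t_minus, t_plus;
  MPI_Comm_rank(cart, &rank);
  MPI_Cart_coords(cart, rank, 4, coords);
  MPI_Cart_shift(cart, 3, 1, &t_minus, &t_plus);  /* neighbours along t */

  printf("rank %d at (%d,%d,%d,%d): t- neighbour %d, t+ neighbour %d\n",
         rank, coords[0], coords[1], coords[2], coords[3], t_minus, t_plus);

  MPI_Finalize();
  return 0;
}
```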