lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

FaceBuffer::exchangeCpuLink QMP_waits forever #53

Closed fwinter closed 12 years ago

fwinter commented 12 years ago

Hi,

Running on 2 hosts with 1 MPI node per host does not work for me, whereas running on 1 host with 2 MPI nodes works fine. I tracked it down to

face_qmp.cpp: void FaceBuffer::exchangeCpuLink(void* ghost_link, void* link_sendbuf)

where the statement

for (int i=0; i<4; i++) { QMP_wait(mh_send_fwd[i]); QMP_wait(mh_from_back[i]); }

never completes. It just waits forever.
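
One way to see which of the eight handles is stuck is to replace the blocking waits with a bounded poll. The sketch below is self-contained and does not call QMP: `first_stalled` is a hypothetical helper, and each predicate stands in for a non-blocking completion check on `mh_send_fwd[i]` or `mh_from_back[i]`. QMP's header declares `QMP_is_complete` for that purpose, but treat its use here as an assumption to verify against your QMP version.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of QUDA or QMP). Polls each pending
// operation in order until it reports completion. All operations share
// one deadline. Returns the index of the first operation that is still
// incomplete when the deadline passes, or -1 if every operation
// completed in time.
int first_stalled(const std::vector<std::function<bool()>>& is_complete,
                  std::chrono::milliseconds timeout)
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  for (std::size_t i = 0; i < is_complete.size(); ++i) {
    while (!is_complete[i]()) {
      if (std::chrono::steady_clock::now() >= deadline) {
        return static_cast<int>(i);  // this operation did not complete
      }
      // Back off briefly so the poll does not spin at full speed.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }
  return -1;
}
```

Called with one predicate per handle, in the same order as the loop above, the returned index shows whether a send or a receive is the one that stalls, and in which direction.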

I use a QMP-enabled build. Machine type: 4xGTX480 (but the hostfile has just 1 slot per host). The QUDA commit is 5c24b4556fc7f374bfd

As I said, running the same QMP build on 1 host with 2 slots per host works fine.

Any ideas?

gshi commented 12 years ago

Frank, can you try to narrow down which commit breaks it? The commit you tried has a bunch of MPI-related changes, but they should not affect the QMP code path.

-Guochun

fwinter commented 12 years ago

Okay, current master fixes this.