jchelly closed this issue 4 years ago.
Given that you were running on 48 nodes, do you expect 144 instances of the "now building ..." message? I.e., are there 144 MPI processes?
Sorry, that was a mistake. I'm using 48 MPI processes on 24 nodes, so 144 instances of the message are expected if velociraptor has run three times (48 ranks × 3 invocations = 144).
Can you point to the log file (on cosma, I presume)? The hang is presumably because an MPI send/recv didn't complete.
The log is in /cosma7/data/dp004/jch/EAGLE-XL/DMONLY/Cosma7/L0150N2256/tests/default/logs/L0150N2256.1879257.out.
I've updated velociraptor and resubmitted the job in case any of your recent changes help with this.
I've been able to reproduce this with the latest velociraptor master running in ddt. At the time of the hang, all of the MPI ranks are in MPIBuildParticleExportListUsingMesh(): some are waiting at MPI_Sendrecv calls and some at MPI_Recv. Unfortunately, I had a typo in my CMake configuration, so I don't have debugging symbols in this run. I'll have to restart it to get more information.
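For what it's worth, the standard CMake way to get the symbols back for the rerun (generic CMake usage, nothing VELOCIraptor-specific) is to choose a build type that includes them:

```sh
# RelWithDebInfo keeps optimisation but adds -g, so ddt can map the hung
# stacks back to source lines; Debug would also work.
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo /path/to/velociraptor
make -j
```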
I think the problem here is that different ranks disagree about whether communications need to be split up. Adding
MPI_Allreduce(MPI_IN_PLACE, &bufferFlag, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
just after bufferFlag is calculated might help.
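To make the failure mode concrete, here is a minimal sketch of the pattern being suggested (the variable names, chunk limit, and export counts are invented for illustration; this is not VELOCIraptor's actual code). A purely local decision about splitting the exchange can leave ranks performing different numbers of communication rounds; taking the MPI_MAX across ranks makes the decision collective:

```cpp
// Illustrative sketch only: invented names, not VELOCIraptor internals.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Pretend each rank has a rank-dependent amount of data to export.
    const long maxChunk = 1024;                   // assumed per-message limit
    long nExport = (rank % 2 == 0) ? 500 : 3000;  // differs across ranks on purpose

    // Purely local decision: does this rank think the exchange must be
    // split into multiple rounds?
    int bufferFlag = (nExport > maxChunk) ? 1 : 0;

    // Without the line below, even ranks would post a single exchange while
    // odd ranks loop over several, so some ranks end up blocked in MPI_Recv
    // waiting for a message that is never sent -- the symptom seen in ddt.
    MPI_Allreduce(MPI_IN_PLACE, &bufferFlag, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

    if (bufferFlag) {
        // ...chunked exchange: every rank now loops over the same number of rounds
    } else {
        // ...single exchange per pair of ranks
    }
    printf("rank %d: bufferFlag = %d\n", rank, bufferFlag);

    MPI_Finalize();
    return 0;
}
```

The key point is that the branch on bufferFlag determines how many send/receive rounds a rank performs, so every rank must take the same branch.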
Seems to be fixed by #75.
I'm trying to run a 2256^3 dark matter only simulation with Swift, using velociraptor on the fly. I'm finding that the code sometimes hangs in the function MPIBuildParticleExportListUsingMesh() when velociraptor is called. In my most recent run, the first two velociraptor calls completed, but the third got as far as reporting "now building exported particle list for FOF search" and then produced no further output for about 10 hours.
From the log, I think all of the processes had entered MPIBuildParticleExportListUsingMesh() before they got stuck: I'm running on 48 nodes, and there were 144 instances of the "now building exported particle" message in the log.
I'm running commit 8da1f94dae758a86d9ec0589c4cb2426c3dd23ec from the master branch, and it's configured with:
The config file is the same as examples/sample_swiftdm_3dfof_subhalo.cfg from the Velociraptor repository, except that I set MPI_number_of_tasks_per_write to a large value.
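For concreteness, that override is a single key=value line in the .cfg file; a sketch with a placeholder value (the actual value used isn't quoted in this thread):

```
# Placeholder: the report only says this was set "to a large value".
MPI_number_of_tasks_per_write=10000
```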