pelahi / VELOCIraptor-STF

Galaxy/(sub)Halo finder for N-body simulations
MIT License

Segmentation fault in GetSOMasses() in dmonly run #44

Closed: jchelly closed this issue 5 years ago

jchelly commented 5 years ago

I'm running velociraptor on the fly as part of a simulation using swift. The simulation is a 75 Mpc EAGLE-XL dark-matter-only box with 1128^3 particles. The first time swift invokes velociraptor, it crashes at line 3066 of substructureproperties.cxx:

                AddDataToRadialBinInclusive(opt, radii[indices[j]], masses[indices[j]],
#if defined(GASON) || defined(STARON) || defined(BHON)
                    sfrval, typeparts[indices[j]],
#endif
                    irnorm, ibin, pdata[i]);

I think the problem here might be the vector typeparts (I'm not certain, because this was an optimized build and I couldn't see most variable values in the debugger). The condition under which typeparts is initialized to a particular size is not the same as the condition under which it is accessed on line 3066, so some combinations of preprocessor macros and run-time options will cause an out-of-bounds array access.

In this case I have GASON defined because the swift interface won't compile without it. I'm using the sample_swiftdm_3dfof_subhalo.cfg config from the repository.
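The suspected mismatch can be sketched in isolation. This is an illustration only, not VELOCIraptor's code: `SketchOptions`, `AllocateTypes`, and `TypeOrDefault` are hypothetical names, and the assumption is that the array is sized only when baryons are actually present, while the read is compiled in whenever a macro such as GASON is defined. A guarded read avoids indexing an array that was never sized:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for run-time options; not VELOCIraptor's own Options type.
struct SketchOptions {
    bool compute_profiles = true;  // radial profiles requested in the config
    bool has_baryons = false;      // false for a dark-matter-only run
};

// The type array is sized only when profiles are requested AND baryons exist.
std::vector<int> AllocateTypes(const SketchOptions &opt, std::size_t npart) {
    std::vector<int> types;
    if (opt.compute_profiles && opt.has_baryons) types.assign(npart, 0);
    return types;
}

// Guarded read: returns -1 when the array was never sized, instead of
// indexing into an empty vector as an unguarded types[idx] would.
int TypeOrDefault(const std::vector<int> &types, std::size_t idx, std::size_t npart) {
    if (types.empty()) return -1;
    assert(types.size() == npart);  // if sized at all, one entry per particle
    return types.at(idx);           // throws std::out_of_range on a bad index
}
```

With the default `SketchOptions` (a dark-matter-only run), `AllocateTypes` returns an empty vector and `TypeOrDefault` falls back to -1 rather than reading out of bounds.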

jchelly commented 5 years ago

As a quick test and/or workaround I'll try switching off the profile calculation. I think that should avoid the problem.

jchelly commented 5 years ago

The original crash is now fixed in master but I'm getting a segfault in MPIGetHaloSearchExportNumUsingMesh. In a two MPI rank run it stops at lines 1765 and 1767 of mpiroutines.cxx. This might mean the value of cellnodeID is out of range.
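One way to test the out-of-range hypothesis is to validate each owner rank before using it as an index. The sketch below is not the code in mpiroutines.cxx: `CountExportsPerRank`, `cell_owner`, and `nexport` are hypothetical names, and it assumes the cell's node ID is an MPI rank used to index a per-rank counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tally how many cells would be exported to each rank, rejecting any owner
// ID outside [0, nranks). Returns the number of rejected (out-of-range) IDs.
std::size_t CountExportsPerRank(const std::vector<int> &cell_owner, int nranks,
                                std::vector<std::size_t> &nexport) {
    assert(nranks > 0);
    nexport.assign(static_cast<std::size_t>(nranks), 0);
    std::size_t nbad = 0;
    for (int owner : cell_owner) {
        if (owner < 0 || owner >= nranks) {
            ++nbad;  // would have been an out-of-bounds write
            continue;
        }
        ++nexport[static_cast<std::size_t>(owner)];
    }
    return nbad;
}
```

A non-zero return value in a debug build would point at corrupted or uninitialized owner IDs rather than at the counting loop itself.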

pelahi commented 5 years ago

Hi @jchelly, can you provide any more details?

jchelly commented 5 years ago

I'm trying to get some more information out of ddt now, but exactly what happens seems to depend on the optimization level. I should say this is not the same simulation - I've switched to the EAGLE 25Mpc z=0.1 ICs so that I can reproduce the crash more quickly. This one crashes within a few minutes.

jchelly commented 5 years ago

After a bit more investigation, it appears that the crash in MPIGetHaloSearchExportNumUsingMesh only happens if I enable interprocedural optimization (IPO). This is using the Intel 2018 compiler. If I build swift with IPO but build velociraptor without IPO, my small test run doesn't crash.

pelahi commented 5 years ago

Should I leave this issue open or close it? The -ipo flag is not going to be an often-invoked flag.

jchelly commented 5 years ago

From the tests you reported at the telecon it sounds like it's probably not a velociraptor bug, so let's close it.