Closed jchelly closed 4 years ago
Hi John, how are the mpi domains being set in swift? Is it using metis? I haven't encountered the same issue but have just used swift with a max of 16 mpi domains in current tests. If there is an issue with opt.cellnodeids it suggests that there is something amiss with how they are set in swift and passed to vr.
I'm using parmetis for these runs. I have it running in ddt now so I'll try to see where this bad value of cellnodeids is coming from.
I think I've found the problem. When I run with VR_NO_MASS the mass element is omitted from Velociraptor's struct swift_vel_part in SwiftParticle.h. But on the Swift side there's no #ifdef and we don't know about VR_NO_MASS anyway so Swift's struct swift_vel_part in swift_velociraptor_part.h still contains a mass entry.
Could we just always leave the mass entry in the struct whether it's used or not? Or should swift know about velociraptors configuration somehow?
Happy either way. We can discuss it with Pascal tomorrow morning.
I'll just comment out the #ifdef for now and see if that gets the 300Mpc box going again.
Looks like there are some fixes needed on the Swift side too. The siminfo struct doesn't match between swift and velociraptor in the latest master branches and I think the particle mass is not set anywhere with VR_NO_MASS=ON because the mass_uniform_box entry in siminfo doesn't exist in swift.
It seems to get further with the following changes:
I'll make a pull request with a tidied up version of these fixes. Probably no need to modify Swift after all.
I have pushed the branch VR_detection in SWIFT to fix the issue. The latest VR master exposes functions in the libary depending on how it was compiled that the SWIFT configure script uses to decide whether to compile the SWIFT-VR interface in NOMASS mode or not.
For Matthieu's fix to work we need this change too - #68.
This seems to be fixed now.
I'm trying to run a 300Mpc DMONLY box using Swift with on-the-fly Velociraptor. I'm using revision 5c22b31faa189beddd639d24a94998fa34e41cd0 of Velociraptor with the sample_swiftdm_3dfof_subhalo.cfg configuration from the repository. I've configured it with
It crashes with a segmentation fault after this message:
The problem occurs in the function MPIGetCellListInSearchUsingMesh() on this line:
The expression opt.cellnodeids[index] evaluates to an invalid address, which causes a segmentation fault. The variable index is zero when it crashes. This first happened on the 300Mpc box running on Irene but I can reproduce it on a 75Mpc box on Cosma.