Open karya0 opened 1 year ago
@gc00: This PR is insufficient as a fix. The problem lies in how the lower half (lh-proxy) distinguishes "core" regions from the rest. The current logic in the split process treats every region up to [heap] as a core region and refuses to munmap it. This includes the mtcp_restart region as well.
Further, the logic in the upper-half plugin, mpi_plugin.cpp, incorrectly labels the heap created by the new lh-proxy process as part of the upper half and saves it as part of the checkpoint. That is why the heap also sees a conflict on the second restart.
We need to come up with a proper fix that handles both cases. This PR can plaster over the mtcp_restart conflict, but not the heap conflict.
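To make the mtcp_restart half of the problem concrete, here is a minimal sketch of an "everything before [heap] is core" rule applied to a `/proc/<pid>/maps`-style listing. This is not MANA's actual code: the sample addresses and paths, and the helper names `parse_maps` and `core_until_heap`, are made up for illustration, and I am assuming the [heap] entry itself is not counted as core.

```python
# Hypothetical illustration only; the addresses and paths below are invented.
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/lh_proxy
00651000-00652000 rw-p 00051000 08:02 173521 /usr/bin/lh_proxy
00e00000-00e10000 r-xp 00000000 08:02 173600 /usr/bin/mtcp_restart
01000000-01021000 rw-p 00000000 00:00 0 [heap]
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
7ffd1000-7ffd3000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text):
    """Parse maps-style lines into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        # Anonymous mappings have no sixth field, so use an empty name.
        name = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, name))
    return regions


def core_until_heap(regions):
    """Assumed heuristic: every region listed before [heap] counts as core."""
    core = []
    for region in regions:
        if region[2] == "[heap]":
            break
        core.append(region)
    return core


core = core_until_heap(parse_maps(SAMPLE_MAPS))
core_names = [name for _, _, name in core]
print(core_names)
```

With this sample input, the mtcp_restart mapping lands in the core list simply because it sits below [heap], so a rule of this shape would never munmap it.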
@karya0 @gc00 I can try some experiments in my forked repo, based on this PR as well.
See PR #357 for the continuation of this analysis. We should probably close this PR without committing it.
@karya0, if this PR (#353) is now obsolete, can you close it?
@karya0, this issue is a blocker for MANA development. I hope you can get back to it soon.