mpickpt / mana

MANA for MPI

Don't skip munmap of mtcp_restart regions. #353

Open: karya0 opened this pull request 1 year ago

gc00 commented 1 year ago

@karya0, this issue is a blocker for MANA development. I hope you can get back to it soon. Best,

karya0 commented 1 year ago

@gc00: This PR alone does not fix the problem. The problem lies in how the lower half (lh-proxy) distinguishes "core" regions from the rest. The current logic in the split process treats every region up to [heap] as a core region and refuses to munmap it. That includes the mtcp_restart region.

Further, the logic in the upper-half plugin, mpi_plugin.cpp, incorrectly labels the heap created by the new lh-proxy process as part of the upper half and saves it in the checkpoint. That's why the heap also conflicts on the second restart.

We need a proper fix that handles both cases. This PR can plaster over the mtcp_restart conflict, but not the heap conflict.
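A minimal C sketch of the positional rule karya0 describes, assuming a simplified list of region names in address order rather than a real /proc/self/maps parser. The names `region_t`, `find_heap`, and `is_core_region` are hypothetical and do not come from the MANA source; the sketch also assumes "until [heap]" means strictly before the heap entry. It only illustrates why an mtcp_restart mapping below the heap ends up classified as core and is therefore never munmapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one /proc/self/maps entry:
 * only the pathname column matters for this sketch. */
typedef struct {
  const char *name; /* e.g. "[heap]", "[stack]", or a file path */
} region_t;

/* Returns the index of the first "[heap]" entry, or n if there is none. */
static size_t find_heap(const region_t *regions, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (regions[i].name != NULL && strcmp(regions[i].name, "[heap]") == 0) {
      return i;
    }
  }
  return n;
}

/* Assumed rule (not MANA's actual code): every region listed before
 * [heap] counts as "core" and is never munmapped. The check is purely
 * positional, so an mtcp_restart mapping that sits below the heap is
 * kept as well. If no [heap] entry exists, every region counts as core. */
static bool is_core_region(const region_t *regions, size_t n, size_t idx) {
  assert(idx < n);
  return idx < find_heap(regions, n);
}
```

For a hypothetical ordering such as lh_proxy, mtcp_restart, [heap], libc, [stack], `is_core_region` returns true for the mtcp_restart entry, which is the mapping this PR wants to munmap. Because the rule keys only on position, skipping mtcp_restart explicitly would address that case but, as noted above, would not touch the heap mislabeling in mpi_plugin.cpp.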

jiamingz9925 commented 1 year ago

@karya0 @gc00 I can try some experiments in my fork, also based on this PR.

gc00 commented 1 year ago

See PR #357 for the continuation of this analysis. We should probably close this PR without committing it.

gc00 commented 1 year ago

@karya0, if PR #353 is now obsolete, can you close it?