Closed karya0 closed 1 year ago
Also, can you try make clean and make?
I am ensuring that MANA_RestartDir is being set.
I'm looking right now at:
setenv(key.c_str(), ckptImages[i].c_str(), 1);
in dmtcp_restart_plugin.cpp
I think that the problem is there. But I'm still testing.
Yep. That's the root cause. In:
for (size_t i = 0; i < ckptImages.size(); i++)
I'm seeing: ckptImages.size() == 0
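For reference, here is a minimal sketch of the failure mode, not the actual code in dmtcp_restart_plugin.cpp. The helper name exportCkptImages and the key prefix passed to it are hypothetical and exist only for illustration. The sketch assumes that each checkpoint image path is exported under an indexed environment variable. The point is that when ckptImages is empty, the loop body never runs, so no variable is set and nothing fails visibly unless the empty case is checked explicitly.

```cpp
#include <stdio.h>
#include <stdlib.h>   // setenv (POSIX)
#include <string>
#include <vector>

// Hypothetical helper for illustration only; not part of DMTCP or MANA.
// Exports each checkpoint image path as <keyPrefix><index> and returns the
// number of variables that were set.
static size_t exportCkptImages(const std::vector<std::string>& ckptImages,
                               const std::string& keyPrefix)
{
  // With an empty vector the loop below is a silent no-op, so report it.
  if (ckptImages.empty()) {
    fprintf(stderr, "warning: no checkpoint images; nothing exported\n");
    return 0;
  }

  size_t numSet = 0;
  for (size_t i = 0; i < ckptImages.size(); i++) {
    std::string key = keyPrefix + std::to_string(i);
    // Third argument 1 means: overwrite the variable if it already exists.
    if (setenv(key.c_str(), ckptImages[i].c_str(), 1) == 0) {
      numSet++;
    }
  }
  return numSet;
}
```

A caller could treat a return value of 0 as a hard error before restart, instead of continuing with the variable unset.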
@karya0, okay. I fixed some more bugs in the original PR and pushed the fixes as a second commit.
On the DMTCP side, I've also reverted the commit 'Set "stale timeout" (secs), when no peer processes'.
I needed to revert that commit, or else we don't get a checkpoint image after launching with mana_launch -i5 ...
(And if you revert it, don't run git submodule update after that, or it will undo the revert.)
On that basis, the code now seems to work. But I will revert the dev/gdc0/simplifyCopyBits branch that we're using for testing. The code in this PR is still too new.
Jenkins reports:
15:15:17 + git submodule update --init
15:15:17 Submodule 'dmtcp' (https://github.com/dmtcp/dmtcp) registered for path 'dmtcp'
15:15:17 Cloning into 'dmtcp'...
15:15:22 fatal: reference is not a tree: afc5b3c78594f0f12ece4b65d5e6eeb65f8591a0
15:15:22 Unable to checkout 'afc5b3c78594f0f12ece4b65d5e6eeb65f8591a0' in submodule path 'dmtcp'
15:15:22 + ./configure
Which branch is this using for DMTCP? Could it be that it's using a branch that's not part of the origin repo for DMTCP?
This branch and PR are irrelevant now. Closing.
This uses the newly proposed environment variable-based mechanism.