Closed eirrgang closed 6 years ago
There does appear to be a race condition due to vague semantics about closing a session. A pull request is in preparation that may resolve the immediate workflow issue. It involves clarifying that closing a gmxapi Session object semantically maps to destroying any associated gmx::Mdrunner object(s). Pending further testing, the pull request will be submitted against the gromacs-gmxapi repository. If that's all fine, we can remove the "bug" label from this issue and leave it as a collection of tasks for an enhancement.
The tasks described above should be reconciled with and merged into #79
It also seems like a combination of resolutions to #124 and #90 may have preempted this issue. This issue probably needs to be broken up into small tasks that are individually resolved in order to support https://github.com/kassonlab/gmxapi/pull/126 , which needs an issue at this point I think.
The immediate bugs appear resolved. The remaining discussion has been moved to #79
There may be a race condition somewhere that prevents us from using the checkpointed working directory to fork a simulation trajectory. To start from a checkpoint requires all output files to be intact and to match a checksum in the checkpoint file, but it has been reported that copying the md.log programmatically can miss the last few lines when using the Python shutil shell wrapper module to perform a file copy immediately after a simulation ends. (Note, the script that reproducibly exhibits the error still needs closer examination to confirm that the simulation has been completely shut down.)
There are not complete and comprehensive GROMACS tools to deal with this situation, so I probably need to write them, but with enough of the original files we should be able to generate a new input file for the forked run. Note that we should confirm that the checkpoint file used matches the step number that we think it should.
Several related issues to consider:
GNU filesystem utilities indicate that the process's current working directory is used to resolve paths to produce a file descriptor for
fopen()
, but it is unclear whether the semantics are universally well-defined for what happens if the current working directory is changed while a file descriptor is held for a file opened by a relative path.We should specify all input and output files rather than rely on libgromacs default behavior.
We should avoid ambiguity by making sure that we pass absolute paths to libgromacs.
We should cease the practice of changing working directory during Session launch (with the possible exception of dispatching to another Context, which should be done in a separate process).
The Context implementation should handle shuffling of filesystem artifacts for such use cases as forking trajectories.
Independently of further discussions about input and output paradigms, we can achieve predictable behavior in the short term by distinguishing between a trajectory that is a continuation and a trajectory that is initialized as a fork of another trajectory.
As a further simplification of the last point, we can accept that our trajectory forking operation will be a freshly initialized simulation whose zeroth MD microstate is not exactly equal to that from which it was forked. It's time to write some utilities to extract / manipulate input components because right now I don't think there is a way to get a topology that
grompp
can use back out of a TPR file. For a proof-of-concept trajectory forking, we can wrap the following.where
old.tpr
is available from the originalload_tpr
operation,temp.mdp
is a temporary file managed by the Context implementation,topol.top
can be provided as a parameter to thefork_trajectory()
operation,state.cpt
is already managed by the C++ Session, andnew.tpr
is an output that becomes the input for the forkedmd
operation.fork_trajectory()
operation (proof-of-concept wraps command line)