binary reproducible trajectories

Whitford commented 3 years ago

Since OpenMM doesn't rely on domain decomposition, which can introduce randomness when a thread is running slowly, it seems like OpenMM trajectories might be binary reproducible. That is, if you run on the same hardware with the same random seed, coordinates, and velocities, you should get an identical trajectory. A quick Google search didn't turn up anything about this for OpenMM, but I'll keep looking. As an initial test, we should just launch the same job twice with the same random seed and see if it yields the same trajectory. If it does, then we should be able to write out velocities (at machine precision) and random seeds alongside the trajectory, so that we can regenerate the frames in the gaps between saved ones.

My expectation is that this will not work, but I figured I would put it out there for discussion.
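One way to check the result of such a test is to compare the two output files byte for byte. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and not part of OpenMM or OpenSMOG. It assumes the output format does not embed run-specific metadata: some trajectory formats store a creation time in the header, in which case the files would differ even when the coordinates match.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def trajectories_identical(path_a, path_b):
    """Return True only if the two files match byte for byte."""
    # A size mismatch settles the question without hashing.
    if Path(path_a).stat().st_size != Path(path_b).stat().st_size:
        return False
    return file_digest(path_a) == file_digest(path_b)
```

Running the same job twice and calling trajectories_identical on the two outputs would give a yes/no answer to the question above.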

Whitford commented 3 years ago

Ok, I found my answer.

http://docs.openmm.org/7.1.0/api-python/generated/simtk.openmm.app.simulation.Simulation.html

simulation.saveCheckpoint - hardware specific - binary reproducible
simulation.saveState - more versatile - not binary reproducible

A checkpoint could be used to save a snapshot every N minutes. For large systems, it might be useful to write only one checkpoint per hour, or something like that. Then it won't require much additional data to be saved, but the intermediate frames can be regenerated. We would need a reasonable naming convention.

I am leaving this question open, since I think it would be good to have checkpointing options included in OpenSMOG. One way would be to have a checkpoint file directory. Then one could have many checkpoints written, and they wouldn't clutter anything. We could also have an option for finding the newest checkpoint. For example, if the checkpoint dir is $curdir/checkpoint, and the files are called check..xml, with the current timestep in the file name, then SBM.startfromcheckpoint would take $curdir/checkpoint as an argument. It would then look to see which xml file is for the latest timestep and use it.
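The lookup described above could be sketched roughly as follows. This is only an illustration: the function name find_latest_checkpoint is hypothetical, and the file name pattern (check.<step>.xml with an integer step) is an assumption about how the naming convention might look. Steps are compared as integers, so check.10000.xml correctly ranks above check.9000.xml.

```python
import re
from pathlib import Path

# Assumed naming scheme (hypothetical): check.<step>.xml, e.g. check.5000.xml
CHECKPOINT_PATTERN = re.compile(r"^check\.(\d+)\.xml$")


def find_latest_checkpoint(checkpoint_dir):
    """Return (step, path) for the highest-numbered checkpoint, or None."""
    latest = None
    for path in Path(checkpoint_dir).iterdir():
        match = CHECKPOINT_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        step = int(match.group(1))
        if latest is None or step > latest[0]:
            latest = (step, path)
    return latest
```

A routine like SBM.startfromcheckpoint could call this on $curdir/checkpoint and load whichever file it returns.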

I think this could all be wrapped within the reporter routines that we already have implemented. Since the reporter writes data every N steps, at each write it can check whether S minutes have passed since the last checkpoint write. If they have, then the checkpoint file is written and the checkpoint timer is reset. While this will not save data exactly every S minutes, it would be close enough. It would also add less overhead, since it would only check the time every N steps.
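A rough sketch of that idea, written as a standalone reporter-style class rather than as a change to the existing OpenSMOG reporters. The class name and constructor arguments are hypothetical. It relies on the OpenMM reporter interface (describeNextReport and report) and on Simulation.saveCheckpoint and Simulation.currentStep. The file name follows the naming proposed above, with the step number inserted, which is an assumption; since saveCheckpoint writes binary data, a different suffix may be preferable in practice.

```python
import os
import time


class TimedCheckpointReporter:
    """Called every N steps; writes a checkpoint only if at least
    interval_seconds have elapsed since the previous checkpoint."""

    def __init__(self, checkpoint_dir, report_interval, interval_seconds,
                 clock=time.monotonic):
        self._dir = checkpoint_dir
        self._report_interval = report_interval      # N steps
        self._interval_seconds = interval_seconds    # S, in seconds
        self._clock = clock                          # injectable for testing
        self._last_write = clock()
        os.makedirs(checkpoint_dir, exist_ok=True)

    def describeNextReport(self, simulation):
        # Steps until the next multiple of report_interval; the State passed
        # to report() needs no positions, velocities, forces, or energies.
        steps = self._report_interval - simulation.currentStep % self._report_interval
        return (steps, False, False, False, False)

    def report(self, simulation, state):
        now = self._clock()
        if now - self._last_write < self._interval_seconds:
            return  # not enough wall-clock time has passed yet
        # Hypothetical naming: check.<step>.xml inside the checkpoint directory.
        path = os.path.join(self._dir, f"check.{simulation.currentStep}.xml")
        simulation.saveCheckpoint(path)
        self._last_write = now
```

With this layout the clock is read once per report call, so the overhead is tied to N rather than to the integration step.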

Whitford commented 2 years ago

We have updated SBM.help() to describe how to do this. That is enough.

smog-server / OpenSMOG

binary reproducible trajectories #50