insilichem / ommprotocol

A command line application to launch molecular dynamics simulations with OpenMM
http://ommprotocol.readthedocs.io
GNU Lesser General Public License v3.0

Automatic restart of simulation #10

Open ajasja opened 6 years ago

ajasja commented 6 years ago

Hi!

I like the idea of the package very much (I do something similar currently with a bunch of bash files and NAMD, which is a nightmare to maintain/extend).

In the case where trajectory_new_every is set (for example, every 1 ns) and the simulation gets interrupted after 15 ns, is there an option to continue from the 15th ns? In other words, does ommprotocol know where it has to restart from?

jaimergp commented 6 years ago

Hi! Thanks for your nice comments :)

trajectory_new_every only splits the DCD files every ns (or whatever value you choose), so you don't end up with huge trajectory files that can get corrupted more easily. What you are looking for are the restart, restart_every and save_state_at_end parameters, which can be configured like this:

restart: rs
restart_every: 1e6
save_state_at_end: True

This will create a *.restart file every nanosecond and, additionally, a *.state.xml file after every stage. *.restart files use the binary NetCDF format, while the XML ones are dumped directly from the OpenMM objects in plain text. The ParmEd wiki has more info on how to set up restarts in OpenMM, and I tried to provide those options in ommprotocol as well.
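As a quick sanity check on the interval: assuming restart_every counts integration steps, 1e6 steps only add up to one nanosecond when the timestep is 1 fs. The helpers below are hypothetical (not part of ommprotocol) and just spell out that arithmetic; the timestep_fs default is an assumption, so use whatever your input file sets.

```python
def steps_to_ns(steps, timestep_fs=1.0):
    """Convert a number of MD steps into simulated nanoseconds.

    timestep_fs is an assumption: replace it with the timestep of your run.
    """
    return float(steps) * timestep_fs / 1e6  # 1 ns = 1e6 fs


def ns_to_steps(ns, timestep_fs=1.0):
    """Convert simulated nanoseconds into a whole number of MD steps."""
    return int(round(ns * 1e6 / timestep_fs))
```

With a 2 fs timestep, for instance, the same restart_every: 1e6 would write a restart file every 2 ns instead.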

You can check all the builtin reporters in io.py.

ajasja commented 6 years ago

Looking forward to testing ommprotocol ;) I guess my question was, if I do

ommprotocol <path/to/yaml> for an interrupted simulation, will it automatically restart and run from the restart file, or will it run from the beginning?

jaimergp commented 6 years ago

> I guess my question was, if I do ommprotocol <path/to/yaml> for an interrupted simulation, will it automatically restart and run from the restart file, or will it run from the beginning?

It will run from the beginning but won't overwrite any file. Copies will be created with numeric suffixes (output_production.1.dcd, and so on).
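The no-overwrite naming can be pictured with a short sketch. This is not the code ommprotocol uses; available_name is a hypothetical helper that only mimics the suffix pattern described above (output_production.dcd, then output_production.1.dcd, and so on).

```python
import os


def available_name(path):
    """Return `path` if it is unused, else insert .1, .2, ... before the extension."""
    if not os.path.exists(path):
        return path
    root, ext = os.path.splitext(path)
    n = 1
    # Keep counting until we find a name that is not taken yet.
    while os.path.exists(f"{root}.{n}{ext}"):
        n += 1
    return f"{root}.{n}{ext}"
```

Calling available_name("output_production.dcd") in a directory that already holds that file would return "output_production.1.dcd".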

My intention is to be explicit about what is running and why, so I chose to have the input file describe a reproducible environment at all times. That means that magic behaviour like that, while useful, should be avoided for the sake of reproducibility. Anyway, that's just the philosophy behind the protocol thing.

In practice, all this means is that, to restart a trajectory, you have to use the output of the interrupted run as the source of topology, coordinates, or whatever else is needed, and edit the stages to continue where you left off. For example, if the MD crashed at 12 ns because of a power outage, you should take the latest restart file available and set it as the checkpoint key. Then, remove all the stages that completed successfully and edit the steps value of the production stage so that you reach the timescale you originally intended.
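The manual procedure above can be sketched in a few lines. Everything here is illustrative: remaining_protocol is a hypothetical helper, the stage layout (dicts with name and steps keys) is an assumption about the input file, and the 20 ns target and file name in the usage note are made up for the example.

```python
import copy


def remaining_protocol(stages, completed, interrupted, steps_done, checkpoint):
    """Return the pieces of an edited input: checkpoint plus the stages left to run.

    stages      : list of dicts with 'name' and 'steps' keys (assumed layout)
    completed   : names of the stages that finished successfully
    interrupted : name of the stage that was running when the job died
    steps_done  : steps of that stage already covered by the latest restart
    checkpoint  : path to the latest restart file
    """
    todo = []
    for stage in stages:
        if stage["name"] in completed:
            continue  # drop stages that already finished
        stage = copy.deepcopy(stage)  # never modify the caller's protocol
        if stage["name"] == interrupted:
            # Only run what is missing to reach the original target.
            stage["steps"] = int(stage["steps"]) - int(steps_done)
            if stage["steps"] <= 0:
                continue
        todo.append(stage)
    return {"checkpoint": checkpoint, "stages": todo}
```

For a production stage that was meant to run 20e6 steps and whose latest restart was written at 12e6 steps, this returns a single production stage with steps set to 8000000 and the restart file as checkpoint.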

I know it sounds like a hassle, so as I am writing these lines I am thinking of trying to combine the best of both worlds, like automating the restart but with a big notice at the beginning of the logfile. After all, we should have all the information we need in that directory. I'll look into it, but it will take a few weeks, because currently I am focusing on other tools that will be released soon.
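A minimal sketch of what such an automated restart could look for, under stated assumptions: restart files end in .restart (as mentioned earlier in this thread), the newest one is picked by modification time, and the banner text is invented. Both functions are hypothetical and do not exist in ommprotocol.

```python
import glob
import os


def latest_restart(directory, pattern="*.restart"):
    """Return the most recently modified restart file in `directory`, or None."""
    candidates = glob.glob(os.path.join(directory, pattern))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def restart_notice(path):
    """Build a conspicuous banner to write at the top of the logfile."""
    line = "!" * 60
    return f"{line}\nNOTICE: automatic restart from {path}\n{line}"
```

Writing the banner first keeps the behaviour explicit: anyone reading the log can tell that the run did not start from the original input.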

Thanks for the feedback!