When an experiment fails, e.g. due to a full disk or unavailable SSH materials, there is no easy way to restart it.
Several steps toward reproducibility are already in place: the exploration of variable values is seeded, and the experiment and cluster files are copied to the experiment results directory.
`mpf.run_experiment` should be modified to yield partial DataFrames containing intermediate results. The last saved DataFrame could then be passed back to `mpf.run_experiment`, along with the experiment ID, to resume the experiment.
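A minimal sketch of the proposed behavior, assuming a generator-based design. It is not mpf's actual API. `run_experiment`, `explore_values`, the `partial` parameter and the variable names are hypothetical, and a plain list of row dicts stands in for the pandas DataFrame so the example runs with the standard library only. Because the exploration is seeded, a resumed run regenerates the same sequence of configurations and can skip the rows already present in the partial results.

```python
import random
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical stand-in for one DataFrame row: variable values plus the measured result.
Row = Dict[str, object]


def explore_values(variables: Dict[str, List[int]], n_runs: int, seed: int) -> List[Dict[str, int]]:
    """Seeded exploration: the same seed always produces the same configurations."""
    rng = random.Random(seed)
    return [{name: rng.choice(variables[name]) for name in sorted(variables)} for _ in range(n_runs)]


def run_experiment(
    experiment_id: str,
    variables: Dict[str, List[int]],
    n_runs: int,
    seed: int,
    run_one: Callable[[Dict[str, int]], float],
    partial: Optional[List[Row]] = None,
) -> Iterator[List[Row]]:
    """Hypothetical generator: yields the accumulated results after every run."""
    results: List[Row] = list(partial or [])
    if any(row["experiment_id"] != experiment_id for row in results):
        raise ValueError("partial results belong to a different experiment")
    configs = explore_values(variables, n_runs, seed)
    # Runs already present in the partial results are skipped, not re-executed.
    for index in range(len(results), n_runs):
        config = configs[index]
        row: Row = {"experiment_id": experiment_id, "run": index, **config, "result": run_one(config)}
        results.append(row)
        yield list(results)  # a copy, so the caller can persist it safely


# Usage: simulate a failure on the 4th run (e.g. a full disk), then resume.
VARIABLES = {"threads": [1, 2, 4], "size": [64, 1500]}  # hypothetical variables


def measure(config: Dict[str, int]) -> int:
    return config["threads"] * config["size"]


calls = {"n": 0}


def flaky(config: Dict[str, int]) -> int:
    calls["n"] += 1
    if calls["n"] == 4:
        raise OSError("No space left on device")
    return measure(config)


last_saved: List[Row] = []
try:
    for last_saved in run_experiment("exp-1", VARIABLES, 6, 42, flaky):
        pass  # a real implementation would write last_saved to disk here
except OSError:
    pass  # last_saved now holds the results of the 3 completed runs

final: List[Row] = list(last_saved)
for final in run_experiment("exp-1", VARIABLES, 6, 42, measure, partial=last_saved):
    pass

full = list(run_experiment("exp-1", VARIABLES, 6, 42, measure))[-1]
```

Yielding a copy after each run lets the caller decide how often to persist intermediate results. Checking the experiment ID guards against resuming with results from a different experiment.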