pwollstadt / IDTxl

The Information Dynamics Toolkit xl (IDTxl) is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory.
http://pwollstadt.github.io/IDTxl/
GNU General Public License v3.0
249 stars 76 forks

Checkpointing #23

Closed mwibral closed 3 years ago

mwibral commented 5 years ago

HPC clusters with queuing systems often set time limits that are too tight for longer-running mTE-IDTxl analyses. If these analyses occasionally wrote checkpoint files containing the information necessary to resume the computation and to finally store the results where they belong, i.e., under the desired results file name ('DRFN'), then just about any queue limit would do.

Information necessary for successfully resuming and finalizing the computation

Handling of checkpoint files: Problem: One does not want 50 versioned checkpoint files for a single analysis cluttering the disk. So once a new checkpoint file has been successfully written, the old one should be deleted. However, if writing the new checkpoint file fails (because the process is killed by the queue-management system), there should be a way to recover from this situation.

I would suggest the following: (1) move the old checkpoint file (if any) to .ckp.old, (2) start writing the new checkpoint file, (3) if (2) was successful, remove the .ckp.old file. If the process is killed during (2), the .ckp.old file is still intact, so we'll always have a way to recover.
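
The three steps could be sketched roughly as follows. This is only an illustration of the rotation scheme, not IDTxl code: the function names `write_checkpoint` and `load_checkpoint`, the JSON format, and the example file name are all assumptions made for the sketch.

```python
import json
import os


def write_checkpoint(state, path):
    """Write `state` to `path`, keeping the old checkpoint until the write succeeds.

    Hypothetical helper: name, signature and JSON format are illustrative only.
    """
    backup = path + ".old"  # e.g. analysis.ckp -> analysis.ckp.old
    if os.path.exists(backup):
        # A leftover backup means an earlier write was interrupted, so the
        # file at `path` may be incomplete. Discard it and keep the backup.
        if os.path.exists(path):
            os.remove(path)
    elif os.path.exists(path):
        # (1) Move the old checkpoint out of the way.
        os.replace(path, backup)
    # (2) Write the new checkpoint and force it to disk.
    with open(path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # (3) The new file is complete, so the backup can go.
    if os.path.exists(backup):
        os.remove(backup)


def load_checkpoint(path):
    """Return the most recent trustworthy state, or None if none exists."""
    backup = path + ".old"
    # If a backup survives, the last write did not finish cleanly, and the
    # backup is the only file known to be complete, so it is tried first.
    for candidate in (backup, path):
        if os.path.exists(candidate):
            with open(candidate) as f:
                return json.load(f)
    return None
```

On resume, a job would call `load_checkpoint("analysis.ckp")` and continue from the returned state. Preferring the `.old` file whenever it exists is deliberately conservative: if the process is killed between steps (2) and (3), one checkpoint interval of progress is repeated, but a half-written file is never read.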

pwollstadt commented 3 years ago

Added in dc2dd9a5b12695c1754aa0e496aeac65df1c7da0