Automatic "canceled" experiment status detection in protocol db

As of now, experiments which were aborted/terminated before completion are still listed as running in the status variable of the pickledb instance. Since the DB does not directly receive any information about the termination, it would be good to have some form of check/mechanism that sets the status to aborted or canceled in this case.

One potential way of doing so would be via time constraints on experiments. For example, if the user provides a time limit per job, we could calculate the maximum time for the entire experiment. If this time is exceeded and the status of the experiment is still running, we should change it. This could be checked every time the db instance is loaded. The steps would look as follows:

1. Start up the experiment with time_per_job: "dd:hh:mm" given in single_job_args.
2. Calculate the total time for the experiment as total_experiment_time = num_search_batches * time_per_job.
3. Add the latest expected completion time to the db as max_completion_time = start_time + total_experiment_time.
4. Whenever the db is loaded, check if current_time > max_completion_time and status == "running". If so, set status to aborted.
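
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: a plain dict stands in for a single protocol entry, and the helper names (parse_time_per_job, max_completion_time, refresh_status) are hypothetical and not part of the existing toolbox or of pickledb.

```python
from datetime import datetime, timedelta


def parse_time_per_job(time_per_job: str) -> timedelta:
    """Parse a "dd:hh:mm" string into a timedelta."""
    days, hours, minutes = (int(part) for part in time_per_job.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes)


def max_completion_time(
    start_time: datetime, time_per_job: str, num_search_batches: int
) -> datetime:
    """Latest time by which the experiment should have finished."""
    total_experiment_time = num_search_batches * parse_time_per_job(time_per_job)
    return start_time + total_experiment_time


def refresh_status(entry: dict, current_time: datetime) -> dict:
    """Mark an entry as aborted if it is still 'running' past its deadline.

    `entry` is a hypothetical stand-in for one experiment record in the db.
    """
    if entry["status"] == "running" and current_time > entry["max_completion_time"]:
        entry["status"] = "aborted"
    return entry


# Usage: 4 batches of 2h30 each, started at midnight -> deadline at 10:00.
start = datetime(2021, 1, 1, 0, 0)
entry = {
    "status": "running",
    "max_completion_time": max_completion_time(start, "00:02:30", 4),
}
refresh_status(entry, current_time=datetime(2021, 1, 1, 10, 1))
```

The check would be called for every entry right after loading the db. Depending on how the db serializes values, max_completion_time may need to be stored as a string (e.g. via datetime.isoformat()) and parsed back on load.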

mle-infrastructure / mle-toolbox
