lanl / PyBNF

An application for parameterization of biological models available in SBML and BNGL formats. Features include parallelization, metaheuristic optimization algorithms, and an adaptive Markov chain Monte Carlo (MCMC) sampling algorithm.

Catch job errors version 2.1 #240

Closed emitra17 closed 5 years ago

emitra17 commented 5 years ago

Latest attempt at a job submission workflow that can recover if the job throws an exception.
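To illustrate the general idea (this is not the code in this PR), here is a minimal sketch of a submission loop that resubmits a job when it raises an exception. It uses the standard library's `concurrent.futures` as a stand-in for the real scheduler, and `run_jobs_with_retry`, `job`, `params_list`, and `max_retries` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_jobs_with_retry(job, params_list, max_retries=2, workers=4):
    """Run job(params) for each entry; resubmit a job if it raises.

    Hypothetical helper for illustration only, not PyBNF's API.
    Returns (results, failures), both keyed by the index into params_list.
    """
    results = {}
    failures = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future to (job index, number of retries used so far).
        pending = {pool.submit(job, p): (i, 0) for i, p in enumerate(params_list)}
        while pending:
            # Iterate over a snapshot; resubmitted futures are picked up
            # on the next pass of the while loop.
            for fut in as_completed(list(pending)):
                idx, attempts = pending.pop(fut)
                try:
                    results[idx] = fut.result()
                except Exception as exc:
                    if attempts < max_retries:
                        # Recover: resubmit the same job instead of aborting the run.
                        retry = pool.submit(job, params_list[idx])
                        pending[retry] = (idx, attempts + 1)
                    else:
                        # Give up on this job but keep the rest of the run alive.
                        failures[idx] = exc
    return results, failures
```

The point of the design is that one failing job is retried or recorded as a failure without taking down the whole run.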

I am currently testing stability with a long-running bootstrap run on ulk.

emitra17 commented 5 years ago

This has survived a run on ulk much longer than my attempt with the old version did, but I can't really tell why. The log shows no instance where the new error catch saved the run.

The ulk run eventually died after about 365 replicates, apparently from garbage collection warnings. I confirmed that the garbage collection warning is not unique to this branch.

emitra17 commented 5 years ago

Although I haven't yet seen this save us in a non-synthetic situation in testing, I'm in favor of merging as is because it doesn't seem to break anything, and in principle it should help if some unexpected exception comes up.