metrumresearchgroup / bbr

R interface for model and project management
https://metrumresearchgroup.github.io/bbr/
Other
23 stars 2 forks source link

Consider fail-safe for checking on terminated models #688

Open seth127 opened 4 months ago

seth127 commented 4 months ago

In https://github.com/metrumresearchgroup/bbr/issues/685#issuecomment-2101744139 @kyleam pointed out numerous situations where a model process can be terminated, but not write out either bbi_config.json or write "Stop Time:" to the .lst file.

To address these, we could potentially build in a fail-safe where we hash OUTPUT files every 10 iterations or so, and “consider dead” if a process hasn’t written to it in <N> iterations. This might be kinda hacky and extreme, but we may want to add it if we’re worried about enough cases where a model stops without writing a bbi_config.json (which could lead to endlessly checking back, thinking it's still running). Related thoughts:

kyleam commented 4 months ago

My initial thought is that I see the motivation, but I'm worried that "every N iterations" may not be calibrated right to account for complicated models that take more time than expected to move (perhaps not as much of a worry in the context of bootstrap?). I suppose conceptually it's just inherently racy, which makes me nervous, but I know you're saying it might be good enough and do more good than harm in practice.

Anyway, my main comment is that...

fail-safe where we hash OUTPUT files every 10 iterations or so, and “consider dead” if a process hasn’t written to it in <N> iterations

... for this use case (a file that's just being appended to), I think you could just stat the file and look at whether the size has changed (i.e. file.size() from R). That'll be sufficient for the check and more efficient than hashing the content.

kyleam commented 4 months ago

and then bbi tried to write its own bbi_config.json later… what would happen?

Based on this test, bbi would just overwrite the file:

bbi nonmem run local 1.ctl &
sleep 0.5
echo alien >1/bbi_config.json
echo 'alien placed; exiting'
seth127 commented 4 months ago

Based on this test, bbi would just overwrite the file

And... do we think this is fine? I'm leaning towards yes, because it would basically be

Do you agree with that assessment?

kyleam commented 4 months ago

@seth127 Yes, that sounds reasonable to me.