materialsproject / fireworks

The Fireworks Workflow Management Repo.
https://materialsproject.github.io/fireworks

Recover offline revisited #338

Closed jotelha closed 5 years ago

jotelha commented 5 years ago

This pull request addresses two related issues discussed on the Google group:

  1. When fizzling a "lost run", check whether it was an offline run. If so, forget the offline run (i.e. mark it as "deprecated").
  2. When recovering offline runs, always use the ping time found in "FW_ping.json" as the "updated_on" time in the launch's state history. Previously, subsequent touch_history calls and database updates always overwrote the ping time with the current date. As a result, "detect_lostruns" could not identify lost offline runs.
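The effect of both points can be illustrated with a minimal, self-contained sketch. It does not use the FireWorks API: the dictionaries, the helper names (`is_lost`, `recover_offline`, `fizzle_lost`) and the four-hour expiration window are assumptions made up for this illustration only.

```python
from datetime import datetime, timedelta

EXPIRATION_SECS = 4 * 3600  # assumed expiration window, for illustration only


def is_lost(launch, now, expiration_secs=EXPIRATION_SECS):
    """A running launch counts as lost once its last update is too old."""
    age = (now - launch["updated_on"]).total_seconds()
    return launch["state"] == "RUNNING" and age > expiration_secs


def recover_offline(launch, ping_time, now, keep_ping_time):
    """Record 'updated_on' during recovery (hypothetical helper).

    keep_ping_time=False mimics the previous behaviour (stamp with the current time);
    keep_ping_time=True mimics the proposed behaviour (keep the ping time).
    """
    launch["updated_on"] = ping_time if keep_ping_time else now
    return launch


def fizzle_lost(launch, offline_runs):
    """Fizzle a lost launch; if it was an offline run, mark that run as deprecated."""
    launch["state"] = "FIZZLED"
    offline = offline_runs.get(launch["launch_id"])
    if offline is not None:
        offline["deprecated"] = True
    return launch


now = datetime(2019, 6, 1, 12, 0, 0)
ping_time = now - timedelta(hours=10)  # last ping written by the offline job
offline_runs = {1: {"launch_id": 1, "deprecated": False}}

old = recover_offline({"launch_id": 1, "state": "RUNNING"}, ping_time, now, keep_ping_time=False)
new = recover_offline({"launch_id": 1, "state": "RUNNING"}, ping_time, now, keep_ping_time=True)

lost_before = is_lost(old, now)  # current time was stamped, so the run looks alive
lost_after = is_lost(new, now)   # ping time was kept, so the stale run is detected
if lost_after:
    fizzle_lost(new, offline_runs)
```

With the current time stamped on recovery, the launch never ages past the expiration window, no matter how long ago the job last pinged. Keeping the ping time lets the age check work, and the fizzle step can then deprecate the matching offline run.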

This is a suggestion that solves those two particular issues. However, the "checkpoint" update has been moved up and now occurs either

a) via the ping_launch(...) call, which already accepts a checkpoint, if a ping file exists, or
b) directly via touch_history, just as before, but only if no ping file is present.

Afterwards, the launch is queried again so that any subsequent modification of the launches collection does not overwrite the possibly updated state history.
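The control flow described above might look roughly like the following sketch. It is not the code from this pull request: `LAUNCHES` is an in-memory stand-in for the launches collection, `get_launch`, `save_history_update` and `recover_checkpoint` are hypothetical helpers, and the `ping_time` key inside "FW_ping.json" is an assumption about the file layout.

```python
import copy
import json
import os
import tempfile
from datetime import datetime

LAUNCHES = {}  # in-memory stand-in for the launches collection


def get_launch(launch_id):
    """Query a fresh copy of the stored launch document."""
    return copy.deepcopy(LAUNCHES[launch_id])


def save_history_update(launch_id, updated_on, checkpoint):
    """Write 'updated_on' (and an optional checkpoint) to the latest history entry."""
    entry = LAUNCHES[launch_id]["state_history"][-1]
    entry["updated_on"] = updated_on.isoformat()
    if checkpoint is not None:
        entry["checkpoint"] = checkpoint


def recover_checkpoint(launch_id, launch_dir, checkpoint, now):
    """Update the checkpoint via the ping time if a ping file exists, else via 'now'."""
    ping_path = os.path.join(launch_dir, "FW_ping.json")
    if os.path.exists(ping_path):
        # Case a): a ping file exists, so its time becomes 'updated_on'.
        with open(ping_path) as f:
            updated_on = datetime.fromisoformat(json.load(f)["ping_time"])
    else:
        # Case b): no ping file, so fall back to the current time as before.
        updated_on = now
    save_history_update(launch_id, updated_on, checkpoint)
    # Query the launch again so later writes start from the updated history.
    return get_launch(launch_id)


now = datetime(2019, 6, 1, 12, 0, 0)
LAUNCHES[1] = {"state_history": [{"state": "RUNNING", "updated_on": None}]}
LAUNCHES[2] = {"state_history": [{"state": "RUNNING", "updated_on": None}]}

with tempfile.TemporaryDirectory() as launch_dir:
    with open(os.path.join(launch_dir, "FW_ping.json"), "w") as f:
        json.dump({"ping_time": "2019-06-01T02:00:00"}, f)
    with_ping = recover_checkpoint(1, launch_dir, {"step": 3}, now)

with tempfile.TemporaryDirectory() as launch_dir:
    without_ping = recover_checkpoint(2, launch_dir, None, now)
```

Returning a freshly queried copy matters because a launch object loaded before the update would still carry the old state history, and writing that stale object back would undo the ping-time fix.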

This is not a beautiful solution, but rather a quick restructuring that solves the two issues addressed while touching only as much code as necessary. However, other issues might arise.

In the code, I left some open questions as comments.

Best regards,

Johannes

coveralls commented 5 years ago


Coverage decreased (-0.04%) to 60.542% when pulling 1f524c44306d395c2b2f51b7fed113c6bc380b0d on jotelha:recoverOfflineRevisited into d970893acb9f1178cdc025d912fa363b56738cc9 on materialsproject:master.

computron commented 5 years ago

I am closing this PR because the requested changes have been incorporated via commits 51e7d62c434238728e7686245d39b50d06ede717 and f38987867781602da0bee5a05f601774fe4a3dd4.

Thanks for the help!