lanl / BEE

Fix issues with leaking gdb processes #738

Closed aquan9 closed 7 months ago

aquan9 commented 8 months ago

Currently we are aware of two pathways to a leaked gdb process: a failed workflow and a cancelled workflow.

Previously, failed workflows were detected in the workflow manager but left unhandled, and cancelled workflows were simply deleted from the database.

Now, cancelled workflows are marked as cancelled in the db, so evidence of the workflow's existence remains when you query beeflow list. Processing a workflow cancellation now also cleanly stops the gdb.

The failed-workflow case is now handled as well: on failure, the workflow is cleanly archived and the gdb is stopped.
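
For reviewers who want the shape of the change without reading the diff, here is a minimal sketch of the cleanup pattern described above. It is not BEE's actual code: the names `workflow_states`, `start_gdb_stand_in`, `stop_gdb`, and `finish_workflow` are hypothetical, the dictionary stands in for the workflow database, and a sleeping Python child process stands in for the gdb.

```python
import subprocess
import sys

# Hypothetical stand-in for the workflow database: workflow ID -> state.
workflow_states = {}


def start_gdb_stand_in():
    """Start a long-running child process that stands in for the gdb."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )


def stop_gdb(proc, timeout=5.0):
    """Stop the child process, escalating to kill if terminate is ignored."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def finish_workflow(wf_id, proc, final_state):
    """Record the final state (rather than deleting the entry), then stop the gdb."""
    workflow_states[wf_id] = final_state
    stop_gdb(proc)
```

Both the cancellation path and the failure path would call something like `finish_workflow`, passing "Cancelled" or "Archived" respectively, so that neither path can return without the child process being reaped.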

One thing to note for a future issue is that both completed and failed workflows end in a final "archived" state. This can be misleading for the user. I'll go ahead and make an issue for this.

Should fix #730. I also went ahead and added an example workflow that guarantees a workflow failure (via an invalid command), for the purpose of testing.

pagrubel commented 8 months ago

Please fix the failing unit test.