lanl / BEE

Task Manager Resiliency: Rebuild the submit and job queues when the task manager comes back up. #674

Closed aquan9 closed 8 months ago

aquan9 commented 1 year ago

One of the pieces split out from #614.

jtronge commented 8 months ago

Does this already happen on restart? It looks like the task manager is just loading up the old database file, which should still have the submit and job queue data, unless the whole database gets deleted.

jtronge commented 8 months ago

I did some testing with examples/clamr-ffmpeg-build and it looks like the task manager is able to recover the submit/job queues after failures and successfully update task states. I'm going to close this for now.
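
For reference, here is a minimal sketch of the recovery pattern described above: queue state lives in a database file on disk, so a restarted process that reopens the same file sees the same submit and job queues. This is not BEE's actual code. The `QueueStore` class, the table names, and the `task-a`/`task-b` IDs are all hypothetical, and SQLite is assumed only for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


class QueueStore:
    """Hypothetical store keeping a submit queue and a job queue in one SQLite file."""

    def __init__(self, db_path):
        self.db_path = db_path
        # IF NOT EXISTS means reopening an existing file keeps its old rows.
        self._run([
            ("CREATE TABLE IF NOT EXISTS submit_queue (task_id TEXT PRIMARY KEY)", ()),
            ("CREATE TABLE IF NOT EXISTS job_queue "
             "(task_id TEXT PRIMARY KEY, job_state TEXT)", ()),
        ])

    def _run(self, statements):
        """Run statements in one transaction; return the rows of the last one."""
        rows = []
        with closing(sqlite3.connect(self.db_path)) as conn:
            with conn:  # commits on success, rolls back on an exception
                for sql, params in statements:
                    rows = conn.execute(sql, params).fetchall()
        return rows

    def enqueue(self, task_id):
        """Add a task that is waiting to be submitted."""
        self._run([("INSERT INTO submit_queue (task_id) VALUES (?)", (task_id,))])

    def start_job(self, task_id, job_state):
        """Move a task from the submit queue to the job queue atomically."""
        self._run([
            ("DELETE FROM submit_queue WHERE task_id = ?", (task_id,)),
            ("INSERT INTO job_queue (task_id, job_state) VALUES (?, ?)",
             (task_id, job_state)),
        ])

    def load_queues(self):
        """Rebuild both in-memory queues from whatever is on disk."""
        submit = self._run([("SELECT task_id FROM submit_queue ORDER BY rowid", ())])
        jobs = self._run([("SELECT task_id, job_state FROM job_queue", ())])
        return [row[0] for row in submit], dict(jobs)


def demo():
    """Simulate a crash and restart by discarding the store and reopening the file."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "tm.db")
        store = QueueStore(db_path)
        store.enqueue("task-a")
        store.enqueue("task-b")
        store.start_job("task-a", "RUNNING")
        del store  # the process "dies"; only the file on disk survives

        restarted = QueueStore(db_path)
        return restarted.load_queues()
```

In this sketch the queues only disappear if the database file itself is deleted, which matches the caveat raised earlier in the thread.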