Open ayushr2 opened 4 years ago
There is one scenario in the websocket version of the protocol that's currently problematic:
When the API restarts, graders are interrupted with a disconnection exception. If a grader has a job in progress during the restart, the API will assume the grader is still working on the original job (since in the HTTP protocol graders would continue the job and submit the result in this case). The "running" flag for such a grader is not cleared in the API once the restart is done.
I think we should probably:
Nice idea. We would need to drain the queue before we can shut down the API.
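To make the drain step concrete, here is a rough sketch of what "drain, then shut down" could look like. It assumes an in-memory queue; `JobQueue`, `submit`, and `drain` are hypothetical names for illustration, not the project's actual classes.

```python
import queue
import threading


class JobQueue:
    """Hypothetical in-memory job queue (illustration only)."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._accepting = True
        self._lock = threading.Lock()

    def submit(self, job):
        """Enqueue a job; returns False once draining has started."""
        with self._lock:
            if not self._accepting:
                return False
            self._jobs.put(job)
            return True

    def drain(self, handler):
        """Stop accepting new jobs, then hand every queued job to handler.

        Returns the number of jobs handled. After this returns, the queue
        is empty and the API process could be stopped.
        """
        with self._lock:
            self._accepting = False
        handled = 0
        while True:
            try:
                job = self._jobs.get_nowait()
            except queue.Empty:
                break
            handler(job)
            handled += 1
        return handled
```

The important ordering is that new submissions are refused before the queue is emptied; otherwise a job could slip in after the last check and be lost on shutdown.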
We should test that when a node reconnects, its info is preserved and it can continue from where it left off.
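Something like the following could serve as a starting point for that test. It is only a sketch: `GraderRegistry`, `GraderState`, and the method names are made up for illustration, and it assumes the API identifies a grader by a stable ID across connections.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GraderState:
    """Hypothetical per-grader record kept by the API."""
    connected: bool = False
    running: bool = False
    job_id: Optional[str] = None


class GraderRegistry:
    """Hypothetical registry keyed by a stable grader ID."""

    def __init__(self):
        self._graders: Dict[str, GraderState] = {}

    def connect(self, grader_id: str) -> GraderState:
        # Reuse the existing record so a reconnect keeps the job info.
        state = self._graders.setdefault(grader_id, GraderState())
        state.connected = True
        return state

    def disconnect(self, grader_id: str) -> None:
        # Mark the grader offline but keep its running job on record.
        if grader_id in self._graders:
            self._graders[grader_id].connected = False

    def start_job(self, grader_id: str, job_id: str) -> None:
        state = self._graders[grader_id]
        state.running = True
        state.job_id = job_id

    def finish_job(self, grader_id: str) -> None:
        # Clear the "running" flag once the grader submits its result.
        state = self._graders[grader_id]
        state.running = False
        state.job_id = None


def test_reconnect_keeps_job():
    registry = GraderRegistry()
    registry.connect("grader-1")
    registry.start_job("grader-1", "job-42")
    registry.disconnect("grader-1")

    state = registry.connect("grader-1")
    assert state.connected
    assert state.running
    assert state.job_id == "job-42"

    registry.finish_job("grader-1")
    assert not state.running
```

Note that this registry lives in memory, so it only covers a grader-side reconnect. Surviving an API restart would need the same state to be persisted somewhere outside the process.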