In a recent deployment, we observed that some (but not all) runners were lost when all Nomad agents restarted.
Within this issue, we should identify the Nomad event that notifies Poseidon that a job is lost and will be neither restarted nor rescheduled, and handle it by requesting a new runner. [Jobs] [Allocations].
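
As a starting point for discussion, the check could look roughly like the sketch below. It is only an assumption about which signal might fit: it treats an allocation as lost for good when its client status is `lost` or `failed` and Nomad has recorded no replacement allocation. The `allocation` type, `isRunnerLost`, `handleAllocation`, and the `requestNewRunner` callback are hypothetical names for illustration, not existing Poseidon code. The actual event still has to be identified, since Nomad may reschedule an allocation some time after it has been marked lost.

```go
package main

import (
	"errors"
	"fmt"
)

// allocation is a hypothetical, trimmed-down stand-in for the fields of a
// Nomad allocation that this sketch inspects.
type allocation struct {
	JobID          string
	ClientStatus   string // e.g. "running", "failed", "lost"
	NextAllocation string // ID of the replacement allocation; empty if none
}

// isRunnerLost reports whether the allocation has stopped abnormally and has
// no replacement. Assumption: this combination means the runner will not come
// back on its own.
func isRunnerLost(a allocation) bool {
	stopped := a.ClientStatus == "lost" || a.ClientStatus == "failed"
	return stopped && a.NextAllocation == ""
}

// handleAllocation calls requestNewRunner (a hypothetical callback) for a lost
// runner. It returns true if a replacement was requested.
func handleAllocation(a allocation, requestNewRunner func(jobID string) error) (bool, error) {
	if !isRunnerLost(a) {
		return false, nil
	}
	if requestNewRunner == nil {
		return false, errors.New("no callback for requesting a new runner")
	}
	return true, requestNewRunner(a.JobID)
}

func main() {
	lost := allocation{JobID: "runner-1", ClientStatus: "lost"}
	requested, err := handleAllocation(lost, func(jobID string) error {
		fmt.Println("requesting replacement for", jobID)
		return nil
	})
	fmt.Println(requested, err)
}
```

Keeping the decision in a pure function like `isRunnerLost` would make it easy to swap in the correct condition once the relevant Nomad event has been confirmed.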
Related to #587
This should be fixed together with #602