sentry-io[bot] opened this issue 6 months ago.
The prewarming pool size for environment 10 was increased from 5 to 15 at 2024-05-02T08:13:18.776145Z.
Why were runners lost after the Nomad agent restarts?
Here, three runners are lost:

```log
,,0,2024-04-26T11:35:25.279073568Z,4,deletion,10-064a2c21-03c1-11ef-b832-fa163e7afdf8 -> Nomad agent restart. Never replaced
,,0,2024-04-26T11:35:25.279073568Z,3,deletion,10-c1b06151-03c0-11ef-b832-fa163e7afdf8 -> Nomad agent restart. Never replaced
,,0,2024-04-26T11:35:26.285209424Z,2,deletion,10-b4a430fa-03c0-11ef-b832-fa163e7afdf8 -> being used #1
,,0,2024-04-26T11:35:26.285209424Z,3,creation,10-0331c7d8-03c1-11ef-b832-fa163e7afdf8 -> RACE CONDITION. Creation before deletion. Why is the element count not 2?
,,0,2024-04-26T11:35:26.285209424Z,1,deletion,10-0331c7d8-03c1-11ef-b832-fa163e7afdf8 -> Deletion falsely after the creation.
```
One runner is lost due to bug #602. The other two runners are lost during the Nomad restart/deployment.
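The runner 10-0331c7d8-03c1-11ef-b832-fa163e7afdf8 disappears because its deletion is processed after its creation. The Go sketch below illustrates why that ordering loses a runner and shows one possible guard. Everything in it is hypothetical: the `runnerEvent`, `naiveStorage` and `guardedStorage` types, the allocation IDs `alloc-old`/`alloc-new`, and especially the assumption that each event carries the ID of the Nomad allocation it refers to (the log above shows no such field) and that the late deletion belongs to an earlier allocation of the same runner. It is not Poseidon's runner storage and not the fix for #602.

```go
package main

import "fmt"

// runnerEvent is a hypothetical, simplified form of one line of the runner log.
// allocID names the Nomad allocation the event refers to; this field is an
// assumption and does not appear in the log.
type runnerEvent struct {
	action  string // "creation" or "deletion"
	runner  string // runner ID, e.g. "10-0331c7d8-03c1-11ef-b832-fa163e7afdf8"
	allocID string
}

// storage is the minimal behaviour shared by both variants below.
type storage interface {
	apply(e runnerEvent)
	count() int
}

// naiveStorage maps runner IDs to allocation IDs and deletes by runner ID only.
type naiveStorage map[string]string

func (s naiveStorage) apply(e runnerEvent) {
	switch e.action {
	case "creation":
		s[e.runner] = e.allocID
	case "deletion":
		delete(s, e.runner) // removes the runner regardless of which allocation the event meant
	}
}

func (s naiveStorage) count() int { return len(s) }

// guardedStorage only deletes a runner if the deletion refers to the
// allocation currently stored for it; stale deletions are ignored.
type guardedStorage map[string]string

func (s guardedStorage) apply(e runnerEvent) {
	switch e.action {
	case "creation":
		s[e.runner] = e.allocID
	case "deletion":
		if current, ok := s[e.runner]; ok && current == e.allocID {
			delete(s, e.runner)
		}
	}
}

func (s guardedStorage) count() int { return len(s) }

// replay applies the events in order and returns the resulting idle count.
func replay(s storage, events []runnerEvent) int {
	for _, e := range events {
		s.apply(e)
	}
	return s.count()
}

func main() {
	const id = "10-0331c7d8-03c1-11ef-b832-fa163e7afdf8"
	// Order as logged: the creation (new allocation) is processed before the
	// deletion that belongs to the old allocation.
	events := []runnerEvent{
		{action: "creation", runner: id, allocID: "alloc-new"},
		{action: "deletion", runner: id, allocID: "alloc-old"},
	}
	fmt.Println("naive:", replay(naiveStorage{id: "alloc-old"}, events))     // naive: 0
	fmt.Println("guarded:", replay(guardedStorage{id: "alloc-old"}, events)) // guarded: 1
}
```

Comparing an allocation (or any generation) identifier makes a deletion harmless when it arrives late: a deletion for a superseded allocation becomes a no-op, so the outcome no longer depends on whether creation or deletion is processed first.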
The job is started correctly once. After the Nomad restart, it is retried two more times, but both attempts fail for an unknown reason. At that point, our configured limit of 3 attempts is reached.
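The attempt arithmetic described here (one successful start plus two failed retries exhausting a limit of 3) can be modeled with a small counter. This Go sketch is purely illustrative: `attemptBudget` and `try` are hypothetical names, and it assumes, as the description suggests, that the initial successful start counts toward the limit. It does not reproduce how Nomad's restart or reschedule policies, or Poseidon, actually count attempts.

```go
package main

import "fmt"

// attemptBudget is a hypothetical toy model of the "limit of 3 attempts":
// every start attempt, successful or not, consumes one unit of the budget.
// It does not reproduce Nomad's restart or reschedule policy semantics.
type attemptBudget struct {
	limit int
	used  int
}

// try records an attempt and reports whether it was still within the limit.
func (b *attemptBudget) try() bool {
	if b.used >= b.limit {
		return false
	}
	b.used++
	return true
}

func main() {
	budget := &attemptBudget{limit: 3}
	// Attempt 1 starts the job correctly; attempts 2 and 3 are the retries
	// after the Nomad agent restart, both of which fail.
	for i, succeeded := range []bool{true, false, false} {
		allowed := budget.try()
		fmt.Printf("attempt %d: allowed=%v succeeded=%v\n", i+1, allowed, succeeded)
	}
	// The budget is exhausted, so a fourth attempt is refused.
	fmt.Println("fourth attempt allowed:", budget.try())
}
```

Under this model, the successful initial start already consumes one unit, so the restart leaves only two retries; if both fail, no further attempt is made.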
`topic: Job`
```log
# Starting
,,0,map[Job:map[Affinities:
```
Sentry Issue: POSEIDON-5H
In this event, environment 10 (java-8) was reloaded during a deployment.
Assumed issues: