Open slonka opened 3 months ago
Triage: this assumes that the exit code is different for leader election; needs checking.
I think the best we can do here, if we want to be safe, is this: if `restartCount: 1` and `lastState.terminated` has `exitCode: <leader election lost code>` (after my change to exit with 0 on leader lost), then we can ignore it. Unfortunately, because only the last termination is kept, if `restartCount > 1` then we don't know whether an earlier restart exited with an error.
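A minimal sketch of that check in Go, in case it helps the discussion. The types below are hypothetical stand-ins that mirror only the `restartCount` and `lastState.terminated.exitCode` fields of a Kubernetes container status, and `leaderLostExitCode` assumes the CP really does exit with 0 on leader lost, which still needs checking. The helper name `restartsAreAcceptable` is made up for illustration and is not taken from the existing test framework.

```go
package main

// Terminated is a hypothetical stand-in for lastState.terminated.
type Terminated struct {
	ExitCode int32
}

// ContainerStatus is a hypothetical stand-in holding only the fields
// this check needs from a Kubernetes container status.
type ContainerStatus struct {
	RestartCount   int32
	LastTerminated *Terminated // nil when no previous termination is recorded
}

// leaderLostExitCode is an assumption: the exit code used when the CP
// stops because it lost leader election (0 after the proposed change).
const leaderLostExitCode int32 = 0

// restartsAreAcceptable reports whether the restarts recorded in s can be
// safely ignored by the E2E restart check.
func restartsAreAcceptable(s ContainerStatus) bool {
	switch {
	case s.RestartCount == 0:
		// No restart happened, so there is nothing to flag.
		return true
	case s.RestartCount > 1:
		// Only the last termination is kept, so an earlier restart
		// may have exited with an error. Stay on the safe side.
		return false
	case s.LastTerminated == nil:
		// One restart but no termination details: cannot classify it.
		return false
	default:
		// Exactly one restart: ignore it only if it matches the
		// leader-election exit code.
		return s.LastTerminated.ExitCode == leaderLostExitCode
	}
}
```

With this shape, a single restart with exit code 137 (an OOM kill) is still reported, while a single restart with the leader-lost code is filtered out. Any status with more than one restart is reported regardless of the last exit code.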
Description
Some time ago we introduced a restart counter in our E2E tests to check that the CP does not restart. In theory, if everything is fine it shouldn't, and a restart could indicate a problem with the CP, such as an OOM. This works fine in general, but leader election on k8s kills the CP, and that causes the CI to "fail".
One idea (from @michaelbeaumont) is to distinguish these restarts by exit code and filter them out.