@xeniape ran into an issue (sble employees: see Slack) where pods were left running with expired certificates after a while, rather than being evicted by commons-op as expected. Restarting commons-op evicted the pods as it should.
Our current working hypothesis is that commons-op's re-reconciliation timer did not advance while the computer was suspended, which delayed the eviction by the length of the suspend.
Possible solution
One of:
- Change the timer to use wall time instead of monotonic/CPU time
- Cap the re-reconciliation timer, which causes spurious reconciles but at least bounds the delay
- Make the timer expire automatically when resuming from suspend
In any case, we should probably also raise this upstream with kube-rs, and either fix it there or at least get the problem documented.
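A rough sketch of how the first two options could be combined: keep the certificate deadline as a wall-clock timestamp, recompute the remaining time on every reconcile, and never sleep longer than a fixed cap. The names `requeue_delay` and `MAX_REQUEUE` and the five-minute value are made up for illustration and are not taken from commons-op or kube-rs.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical upper bound on a single wait between reconciles.
pub const MAX_REQUEUE: Duration = Duration::from_secs(300);

/// Returns how long to wait before the next reconcile, given a wall-clock deadline.
///
/// Because the result is recomputed from `now` on every call, time that passed
/// while the machine was suspended is accounted for on the next wake-up.
pub fn requeue_delay(expires_at: SystemTime, now: SystemTime, cap: Duration) -> Duration {
    match expires_at.duration_since(now) {
        // Deadline still in the future: wait until then, but never longer than `cap`.
        Ok(remaining) => remaining.min(cap),
        // Deadline already passed (for example after a long suspend): act immediately.
        Err(_) => Duration::ZERO,
    }
}
```

With this approach the worst-case delay after a resume is bounded by the cap rather than by the length of the suspend. In a kube-rs controller the returned duration could presumably be handed to `Action::requeue`, at the cost of the extra reconciles mentioned above.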
Affected Stackable version
dev (24.11 prerelease)