Open floreks opened 4 days ago
What might be problematic with this approach is detecting whether the controller is still running. The heartbeat in this case is the last poll time. Since we know how often polling should run, we can compare the time elapsed since the last poll with the poll interval to see whether the controller could be dead.
Recovering from a panic technically does not help us much: if the controller panics, the app should crash and the pod will be restarted anyway.
We should try to avoid a situation where there is no panic but the controller has stopped polling/reconciling for some unknown reason.
I reviewed it as well, and then we talked about it with @floreks and @zreigz. It looks good to me; issues with pollers getting stuck for any reason should not happen anymore. One thing that could be added is validation of the args, to avoid situations such as the poll interval or jitter being too short.
Changed `PollUntilContextCancel` usage in the console controller manager so that it does not rely on our internal method implementation when deciding when to stop polling. The internal method now only returns an error, which can be logged, while the poll function always returns `false, nil` (never stops).