Closed kieran-fan closed 2 years ago
Hey, I'm encountering a high CPU edge case for katalog-sync. By high CPU I mean it uses all of the CPU available to it under its resource limit.
Scenario:
If MyPod starts with a passing readiness check, then later fails - it does not trigger high CPU on katalog-sync
After some digging, it looks like the daemon method waitPod is being called (https://github.com/wish/katalog-sync/blob/master/pkg/daemon/daemon.go#L329) continuously, expecting the Service to be ready - which in my scenario it is not.
Workaround that we've been using: wrap the waitPod call at daemon.go L329 in an if statement.
if p.OutstandingReadinessGate { go d.waitPod(p) }
The if condition is used a couple of times in a method of the pod struct already (e.g. https://github.com/wish/katalog-sync/blob/master/pkg/daemon/struct.go#L313)
Is this a viable way of solving the high CPU? Also, I'm wondering about adding a backoff to the for loop, or a limit on the number of times it runs?
Thanks