wish / katalog-sync

A reliable node-local mechanism for syncing k8s pods to consul services
MIT License

High CPU edge case #53

Closed kieran-fan closed 2 years ago

kieran-fan commented 2 years ago

Hey, I'm encountering a high CPU edge case for katalog-sync. By high CPU I mean it uses all of the CPU available to it under its resource limit.

Scenario:

If MyPod starts with a passing readiness check and then later fails, it does not trigger high CPU on katalog-sync.

After some digging, it looks like the daemon method waitPod is being called (https://github.com/wish/katalog-sync/blob/master/pkg/daemon/daemon.go#L329) continuously, expecting the Service to be ready - which in my scenario it is not.

Workaround that we've been using: At L329 - add in an if statement.

                if p.OutstandingReadinessGate {
                    go d.waitPod(p)
                }

The if condition is used a couple of times in a method of the pod struct already (e.g. https://github.com/wish/katalog-sync/blob/master/pkg/daemon/struct.go#L313)

Is this a viable way of solving the high CPU? Also, I'm wondering about adding a backoff in the for loop, or a limit on the number of times it runs?

Thanks