target / pod-reaper

Rule-based pod-killing Kubernetes controller
MIT License

added a max-pods option #70

Closed skoef closed 3 years ago

skoef commented 3 years ago

to limit the number of pods being reaped in a single run

Example log output when more pods were selected for reaping than the limit allows:

{"level":"debug","msg":"starting reap cycle","time":"2021-08-24T08:57:33Z"}
{"level":"info","msg":"reaping pod","pod":"my-deploymeny-69bf6ff5f6-nv7gx","reasons":["has been running for 21h24m6.158295489s"],"time":"2021-08-24T08:57:33Z"}
{"level":"info","maxPods":1,"msg":"not reaping pod","pod":"my-deploymeny-69bf6ff5f6-mmvk2","reaped":1,"reason":["has been running for 21h27m43.158365657s"],"time":"2021-08-24T08:57:33Z"}
{"level":"info","maxPods":1,"msg":"not reaping pod","pod":"my-deploymeny-69bf6ff5f6-ns84z","reaped":1,"reason":["has been running for 21h33m49.783874147s"],"time":"2021-08-24T08:57:33Z"}
brianberzins commented 3 years ago

Sorry for the slow response, I was off in the wilderness on a hiking vacation!

Quick first thought I had: right now, the order of pods returned from the API is (probably?) not randomized, so this might favor killing pods in whatever order (alphabetical?) the Kubernetes API uses. Is that going to be a problem for this use case, or is that totally cool?
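
Purely as a hypothetical illustration of that point (nothing like this is part of the PR): if API ordering ever turned out to bias which pods get capped, the list could be shuffled before reaping.

```go
// Hypothetical illustration, not part of this change: shuffle the pod list so a
// per-run cap does not always hit the pods the API happens to return first.
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	podNames := []string{"my-deployment-aaa", "my-deployment-bbb", "my-deployment-ccc"}
	rand.Shuffle(len(podNames), func(i, j int) {
		podNames[i], podNames[j] = podNames[j], podNames[i]
	})
	fmt.Println("reap order:", podNames)
}
```
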

skoef commented 3 years ago

Hey @brianberzins, no problem! The randomness doesn't necessarily affect my use case: I'm running pod-reaper with the following (relevant) settings:

This way, pod-reaper makes sure that no pod lives longer than 12h. The randomness would only become a problem when max_duration divided by the schedule interval is lower than the number of running pods that match the requirements. Before this, I ran with a schedule of every 4h and a max_duration of 18h, which basically meant that by the time pod-reaper came around, it would probably kill a pod it had already killed in one of the previous 3 cycles and leave other pods to live too long.

BTW: these pods come from a StatefulSet and I didn't really notice any randomness there, perhaps due to the ordinal index of those pods. But this might very well be the case for pods from a Deployment indeed.
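
A back-of-the-envelope sketch of that ratio, using the 4h schedule and 18h max_duration mentioned above; the pod count is an assumed example value, not taken from the thread.

```go
// Illustration of the ratio described above: compare the number of reap cycles
// that fit inside one max_duration window against the number of matching pods.
package main

import (
	"fmt"
	"time"
)

func main() {
	schedule := 4 * time.Hour     // how often a reap cycle runs (from the comment)
	maxDuration := 18 * time.Hour // reap pods older than this (from the comment)
	matchingPods := 6             // assumed example count of pods matching the rule

	// Reap cycles that fit into one pod's allowed lifetime: 18h / 4h = 4.
	cycles := int(maxDuration / schedule)

	fmt.Printf("reap cycles per max_duration window: %d, matching pods: %d\n", cycles, matchingPods)
	if matchingPods > cycles {
		fmt.Println("more matching pods than cycles: with a per-run cap and unlucky ordering, some pods can outlive max_duration")
	}
}
```
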

skoef commented 3 years ago

Thanks @brianberzins!

brianberzins commented 3 years ago

Merged and triggered a release (2.11.0). Thank you for the contribution! :-)