keylimetoolbox / resque-kubernetes

Run Resque (and ActiveJob) workers as Kubernetes Jobs and autoscale from 0!
MIT License

Clean up finished pods that have successfully completed #21

Closed jeremywadsack closed 5 years ago

jeremywadsack commented 5 years ago

This addresses #20

In production we spin up and delete thousands of new pods a day. Over time, the overhead of carrying the completed pods leads to timeouts connecting to the API server (specifically DNS timeouts). Keeping the completed pods cleaned up keeps the system more responsive.

In #4 we removed the original code to reap completed pods because it was removing pods that were killed for exceeding memory. This adjusts the original code so that it only reaps pods in which all containers report a "Completed" status. Any pods that list "OOMKilled" as their status are retained for later inspection.
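
As a rough illustration of the rule described above, the check could be sketched as follows. This is not the code in this PR: the `reapable?` helper and the sample pod names are hypothetical, and the pods are plain hashes shaped like the Kubernetes pod status (`status.containerStatuses[].state.terminated.reason`) rather than objects returned by a real API client.

```ruby
# Hypothetical helper (not the gem's API): a pod is reapable only when
# every one of its containers terminated with the reason "Completed".
def reapable?(pod)
  statuses = pod.dig("status", "containerStatuses") || []
  # A pod with no reported container statuses is left alone.
  return false if statuses.empty?

  statuses.all? do |container|
    container.dig("state", "terminated", "reason") == "Completed"
  end
end

# Illustrative sample data only; the names are made up.
pods = [
  { "metadata" => { "name" => "worker-completed" },
    "status" => { "containerStatuses" => [
      { "state" => { "terminated" => { "reason" => "Completed", "exitCode" => 0 } } }
    ] } },
  { "metadata" => { "name" => "worker-oomkilled" },
    "status" => { "containerStatuses" => [
      { "state" => { "terminated" => { "reason" => "OOMKilled", "exitCode" => 137 } } }
    ] } },
  { "metadata" => { "name" => "worker-running" },
    "status" => { "containerStatuses" => [
      { "state" => { "running" => {} } }
    ] } }
]

# Only the fully completed pod is selected for deletion.
reapable_names = pods.select { |pod| reapable?(pod) }
                     .map { |pod| pod.dig("metadata", "name") }
puts reapable_names.inspect
```

Requiring every container to report "Completed" means that a pod with even one "OOMKilled" container is kept for inspection.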