loft-sh / loft

Namespace & Virtual Cluster Manager for Kubernetes - Lightweight Virtual Clusters, Self-Service Provisioning for Engineers and 70% Cost Savings with Sleep Mode
https://loft.sh/docs/introduction
Other
738 stars 65 forks source link

Loft quota validation webhook can deadlock cluster with Cilium networking #239

Closed withinboredom closed 1 year ago

withinboredom commented 1 year ago

Steps to reproduce:

  1. Have DNS shutdown the same time as networking pods (such as Cilium).
  2. With DNS down, when the networking pod comes back up, networking pod cannot find Loft
  3. Because networking is down, DNS cannot come back up.

I notice the quota endpoint is validating '*', including CRDs such as the ones Cilium uses to configure itself -- especially in cluster-pool IPAM.

Workaround:

Delete the validation hooks CRD and turn off Loft until recovery.

Possible fix?

Have Loft only validate things it has settings for.

FabianKramm commented 1 year ago

@withinboredom thanks for creating this issue! Thats weird, the loft-agent webhook should only apply to namespaces with the label loft.sh/owned=true. Is it maybe possible that this label is present on your networking namespace?

withinboredom commented 1 year ago

According to my notes, it was failing to update endpoint CRDs in namespaces which were loft projects. I have node-local DNS services, which is stored in a loft project.