hjacobs / kubernetes-failure-stories

Compilation of public failure/horror stories related to Kubernetes
https://k8s.af

Add dex/CR/bad defaults failure story #40

Closed by pieterlange 5 years ago

pieterlange commented 5 years ago

Slides for my failure story about the default dex configuration, which stores authrequests as CustomResources and can thereby nuke your Kubernetes control plane.

Link: https://pieterlange.github.io/failure-stories/2019-06.dex.html
Ref: https://github.com/dexidp/dex/issues/1292
Shared at: https://www.meetup.com/Dutch-Kubernetes-Meetup/events/262313920/

zerkms commented 5 years ago

Spoiler warning:

> NO BUSINESS APPLICATIONS WERE HARMED DURING THIS OUTAGE.

^ that's truly impressive!

pieterlange commented 5 years ago

It was merely a (very scary!) control plane outage. The monitoring systems kept running during the outage but were inaccessible because dex (the auth system) was down. So once the storm was over, I did have the data to prove that the apps had stayed up.

The biggest challenge in recovering from this failure was doing so without access to the monitoring systems (at some point I actually set up an SSH port forward directly to the machines running the grafana/kibana pods). Fun times.

hjacobs commented 5 years ago

Thanks!

hjacobs commented 5 years ago

@pieterlange can you sort it at the right place (newest on top)?

githubrotem commented 4 years ago

Is there any real solution for this issue? Right now anyone can write a curl loop and bring the cluster down.

szuecs commented 4 years ago

We use skipper as a kube-apiserver sidecar to do auth, and we can easily add client rate limits: https://opensource.zalando.com/skipper/reference/filters/#clientratelimit

Auth is done by tokens and validation is done by a tokeninfo sidecar that is fast enough.

To protect your dex endpoint, you can either put skipper in front of it too, or bind dex to localhost in the apiserver pod and use skipper to integrate with it. The localhost setup might need some special routes, but it is achievable.
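
For readers unfamiliar with the idea, here is a minimal sketch of what a per-client rate limit does in general. It is not skipper's implementation or configuration: the class name `ClientRateLimiter`, its parameters, and the sliding-window approach are all assumptions made for illustration. The point is that each client (keyed by, say, source IP or token) is capped independently, so one curl loop cannot keep creating auth requests without bound.

```python
import time
from collections import defaultdict, deque


class ClientRateLimiter:
    """Hypothetical sliding-window limiter (illustration only).

    Allows at most `max_hits` requests per `window` seconds for each client key.
    """

    def __init__(self, max_hits, window, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self.hits = defaultdict(deque)  # client key -> timestamps of accepted requests

    def allow(self, client):
        now = self.clock()
        accepted = self.hits[client]
        # Forget accepted requests that have fallen out of the window.
        while accepted and now - accepted[0] >= self.window:
            accepted.popleft()
        if len(accepted) >= self.max_hits:
            return False  # a proxy would typically answer HTTP 429 here
        accepted.append(now)
        return True


if __name__ == "__main__":
    limiter = ClientRateLimiter(max_hits=3, window=60)
    # The fourth request from the same client inside one window is rejected.
    print([limiter.allow("10.0.0.7") for _ in range(4)])
```

A real deployment would also need to evict idle clients from the table and decide what identifies a client; those choices are left out of this sketch.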

hjacobs commented 4 years ago

@githubrotem see also https://twitter.com/pst418/status/1216739457400999938