jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Our proxy pod is a pet, but should be cattle (HA: Highly Available) #1364

Open consideRatio opened 5 years ago

consideRatio commented 5 years ago

In Kubernetes, there is an analogy that a pod should be treated as cattle rather than as a pet. The idea is that a pet isn't interchangeable, while cattle are.

Our hub pod communicates with the proxy-api Kubernetes service, which redirects traffic to one of the available proxy pods. But when the hub pod does so, it actively configures only that one pod, while it really should configure all proxy pods.

Example issue scenario

Assume that, for some reason, there is more than one proxy pod during a time interval. It could be that we want high availability (HA) and keep two running at all times, or that a Helm chart upgrade is rolling out a new proxy pod, or that the proxy pod crashed for some reason and a new one started up.

The hub pod will speak with the proxy-api network service, which delegates traffic to one proxy pod, but not all of them. The hub will tell that proxy pod things like "Hey, when someone requests to access /user/erik they should go to 10.68.3.5!". The hub will also ask "Hey, what routes are you already configured with?", and if the hub concludes a route should be added or removed, it will speak up about that. But but but... the hub doesn't really know who it is speaking with: it thinks it speaks with its single pet, while in reality it speaks with one of its cattle, and it makes no attempt to ensure all cattle behave the same way but instead focuses on a single pet.
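To make the mechanics concrete, here is a rough sketch (an illustration, not code from the issue) of the kind of REST calls JupyterHub makes against configurable-http-proxy's API; the service name, token placeholder, and addresses are stand-ins. Whichever single proxy pod the proxy-api Service happens to pick receives them:

```python
# Illustration: the kind of calls the hub makes against configurable-http-proxy's
# REST API. Behind a Kubernetes Service, only the one pod that answers each call
# actually gets its routing table updated.
import requests

PROXY_API = "http://proxy-api:8001"  # the Service in front of the proxy pod(s)
HEADERS = {"Authorization": "token <CONFIGPROXY_AUTH_TOKEN>"}

# "What routes are you already configured with?"
routes = requests.get(f"{PROXY_API}/api/routes", headers=HEADERS).json()

# "When someone requests /user/erik they should go to 10.68.3.5!"
requests.post(
    f"{PROXY_API}/api/routes/user/erik",
    headers=HEADERS,
    json={"target": "http://10.68.3.5:8888"},
)

# A second proxy pod behind the same Service never hears about this route,
# so requests for /user/erik that land on it won't reach 10.68.3.5.
```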

Goals

Pain points

Related, I think

#1226 - I think the proxy pod restarted, and the hub was clueless and didn't update the proxy's state. Periodic automatic updates of the proxy's routes may have been implemented since then, so the issue would resolve itself after a while, but it would still occur briefly.

manics commented 5 years ago

Have you had any thoughts on how to implement this? If it's not a simple change, it'd be worth designing in the option for the proxy to talk to multiple hubs, even if that's not currently supported.

consideRatio commented 5 years ago

I'm not sure. One option would be to use a k8s ConfigMap to keep the state, but I figure this may be a low-end solution with various downsides. For example, I imagine that propagation of updates to the ConfigMap, before they are read by the container in a pod, will be slow, and things would probably get out of sync easily. I figure the typical high-end solution is to use HA Redis or similar, but that is probably a bit overkill. GitLab does this. But hmmm, is there an in-between option?
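For concreteness, here is a minimal sketch of what the low-end ConfigMap idea could look like from the hub's side, using the official Kubernetes Python client; the ConfigMap name, namespace, and data layout are made up for illustration.

```python
# Sketch of publishing the routing table to a ConfigMap that every proxy pod
# could read. Names and layout are hypothetical.
import json
from kubernetes import client, config

config.load_incluster_config()
core_v1 = client.CoreV1Api()

def publish_routes(routes: dict, namespace: str = "jhub") -> None:
    """Write the desired routing table where all proxy pods can read it."""
    body = {"data": {"routes.json": json.dumps(routes)}}
    core_v1.patch_namespaced_config_map("proxy-routes", namespace, body)

publish_routes({"/user/erik": {"target": "http://10.68.3.5:8888"}})

# Each proxy pod would then have to watch the mounted file (or the API) for
# changes; kubelet only refreshes mounted ConfigMaps periodically, which is
# the slow propagation / out-of-sync risk mentioned above.
```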

Perhaps one step would be to allow the use of an existing Redis deployment, assuming custom configuration is provided. GitLab's Helm chart gives you a lot of things by default, and then lets you configure the use of an external database, external nginx-ingress, external cert-manager, etc.

betatim commented 5 years ago

I think the thing to look at for this is using Traefik as a replacement for configurable-http-proxy. @GeorgianaElena has been working on this over the summer. Not sure anyone has tried running JupyterHub with the Traefik setup in Kubernetes yet.

The traefik based proxy is in this repository: https://github.com/jupyterhub/traefik-proxy

manics commented 5 years ago

@minrk has a PR to add Traefik but it needs more work: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/1162

minrk commented 5 years ago

#1162 needs to be updated to switch to Consul from etcd (Traefik is very slow with etcd and lots of routes, which @GeorgianaElena discovered). But yes, #1162 completely solves this issue.
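For context, switching JupyterHub to the Traefik-based proxy is a configuration change on the hub side. Below is a rough sketch of what that could look like in jupyterhub_config.py, assuming the Consul-backed proxy class from jupyterhub-traefik-proxy; the service addresses, credentials, and exact trait names are assumptions, not taken from #1162.

```python
# Rough sketch: pointing JupyterHub at a Traefik proxy backed by Consul instead
# of configurable-http-proxy. Trait names and addresses are assumptions; check
# the jupyterhub-traefik-proxy docs for the real ones.
# (`c` is the config object JupyterHub provides when loading jupyterhub_config.py.)

c.JupyterHub.proxy_class = "traefik_consul"

# Where the routing table lives: an external KV store that every Traefik
# replica reads, so the proxy pods become interchangeable (cattle, not pets).
c.TraefikConsulProxy.consul_url = "http://consul:8500"

# Traefik's own API endpoint and credentials, used by the hub to check on the proxy.
c.TraefikConsulProxy.traefik_api_url = "http://proxy-api:8099"
c.TraefikConsulProxy.traefik_api_username = "jupyterhub"
c.TraefikConsulProxy.traefik_api_password = "change-me"
```

The key design point is that the routes live in the external key-value store rather than in any one proxy pod's memory, so restarting, replacing, or scaling out proxy pods no longer loses state.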