jupyterhub / traefik-proxy

JupyterHub proxy implementation with traefik
https://jupyterhub-traefik-proxy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

IngressRoute provider proxy #85

Open jcrist opened 4 years ago

jcrist commented 4 years ago

I'm working on a new proxy implementation for JupyterHub (and dask-gateway) that uses Traefik 2.0's IngressRoute provider for managing the routing table. This allows storing all routing table state as CRD objects in kubernetes, so there's no need to manage an external etcd/consul/etc... Another benefit of the CRD objects is that it allows exposing non-http services (e.g. dask clusters), since it doesn't have to conform to the ingress api.

I'm wondering where to put this. I see 3 options:

Happy to go either way, just want a 2nd opinion before I start integrating it with the codebase and CI.

consideRatio commented 4 years ago

Wohoooo, I love to see the use of a small CRD to determine routing for traefik! I've hoped for that solution before, as it can piggyback on the HA key/value store already present in the k8s cluster. I'm mainly working on Z2JH, though, and this would only be relevant in k8s.

I need to learn and think more about this, but right now I don't think kubespawner should be modified or used for this. I think it makes sense to make a JupyterHub proxy class, like the several this library already contains, for a traefik 2.0 deployment configured by CRDs, which a k8s-based JupyterHub could interact with.

@GeorgianaElena / @minrk / @yuvipanda - what do you think?

jcrist commented 4 years ago

> but right now I don't think kubespawner should be modified or used for this.

Sorry, to be clear: I think the implementation will be the same regardless of location (a new proxy class). I only suggested putting it in the kubespawner library since it's kubernetes-only, and kubespawner already contains one proxy implementation (KubeIngressProxy). We'd also need to create a service for each pod, which could be done in the proxy or as part of the spawner.

GeorgianaElena commented 4 years ago

This is awesome news @jcrist :heart:

I know Traefik 2.0 introduces some breaking changes, so I don't think this new proxy can subclass TraefikProxy like all the other proxies here. Even so, I believe it would be nice to have all the JupyterHub Traefik proxies in the same place. When Traefik 2.0 supports more providers, we could potentially have them all use traefik 2.0 (I believe this is already possible for TraefikTomlProxy) and build a common base class.

So I'm +1 on having the new proxy here even if it's kubernetes only :+1:

consideRatio commented 4 years ago

> We'd also need to create a service for each pod, which could be done in the proxy or as part of the spawner.

Is this a general need when using a traefik 2.0 configured by CRDs via a new class together with a z2jh deployment, or a need specific to your use case?

I'd like to better understand that need no matter what. Could you elaborate, @jcrist?

If we want that, and we already have CRDs representing routing, I'd btw consider creating a dedicated controller that acts on those CRDs and manages the related services, instead of integrating that logic into this proxy class. I'm not at all sure what the most sensible approach is, but it is one option; what makes the most sense probably depends on the details.

@jcrist, if you would like to write up a specification of your goals and your implementation strategy to reach them, I'd be very excited to read and discuss it!

jcrist commented 4 years ago

> Is this a general need associated with using a traefik 2.0 configured by CRDs by a new class together with a z2jh deployment, or a need for your use case?

For traefik to add a route based on CRDs, it needs 2 things:

A service can be created without mapping to a deployment (you can manually specify `cluster_ip` in the `V1ServiceSpec`; see https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1ServiceSpec.md). As such, adding a route would create a service for the given address, then attach that service to an IngressRoute object. Deleting a route would delete both objects. This isn't terribly tricky: both specs are fairly small and self-contained, covering just the address being proxied and the route path. It just takes two objects (a Service and an IngressRoute) to define a new route.

See https://docs.traefik.io/routing/providers/kubernetes-crd/ for more information; the docs are fairly thorough.
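The pair of objects described above might look roughly like this. This is only a sketch of the approach, not code from the proposed implementation; the helper name, namespace, addresses, and the `web` entrypoint are illustrative assumptions (manifests are shown as plain dicts rather than client objects so the example is self-contained):

```python
# Sketch: per-route objects for Traefik's kubernetes-crd provider, as
# described above -- a selector-less Service with a manually specified
# clusterIP, plus an IngressRoute CRD pointing at that Service.
# All names/addresses below are invented for illustration.

def make_route_objects(name, namespace, target_ip, target_port, path_prefix):
    """Build manifests for one proxied route (hypothetical helper)."""
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # No selector: we already know the address, so we specify it
            # directly instead of letting Kubernetes match pod labels.
            "clusterIP": target_ip,
            "ports": [{"port": target_port}],
        },
    }
    ingress_route = {
        # apiVersion used by Traefik 2.0-era CRDs
        "apiVersion": "traefik.containo.us/v1alpha1",
        "kind": "IngressRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entryPoints": ["web"],
            "routes": [{
                "kind": "Rule",
                "match": f"PathPrefix(`{path_prefix}`)",
                # Reference the Service created above by name
                "services": [{"name": name, "port": target_port}],
            }],
        },
    }
    return service, ingress_route

svc, route = make_route_objects(
    "jupyter-user1", "jhub", "10.0.12.34", 8888, "/user/user1"
)
```

Deleting the route would then just mean deleting both objects by name.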

> I'd btw consider the creation of a controller that acts based on those CRDs instead of integrating that into logic in this proxy class.

I think it'd be cleaner to handle everything in the proxy class, adding a controller seems like an unnecessary extra component IMO.

consideRatio commented 4 years ago

Ah, I see! If it is generally needed for all use of traefik 2.0 with CRD configuration, as it seems to be from your description, then putting the service-creation logic separately in a controller seems pointless.

Relevant considerations:

jcrist commented 4 years ago

> Services can link to pods with matching labels

Since, by the time the proxy is asked to add a route, we already know the address, and JupyterHub doesn't let Kubernetes handle restarts, directly specifying the `cluster_ip` on the service seems cleaner than having Kubernetes find it for itself.

> ownerReferences on a service to the route CRD resource could make the service be garbage collected on deletion of the CRD.

Sidetrack: I've thought about using ownerReferences in the past for other problems, but haven't seen the pattern used much in the wild. I wasn't sure if relying on Kubernetes GC is an antipattern or not (see https://twitter.com/jiminy_crist/status/1139288332448489472). Do you have thoughts on this? Here I think we'd manually delete both objects, but I'm curious about use cases I have elsewhere.

consideRatio commented 4 years ago

I recently saw the ownerReference / garbage-collection solution used by Argo Workflow resources: their workflow controller creates a set of pods for a workflow, and deleting the workflow deletes the pods. I haven't heard the pattern discussed much, though; in my mind it is currently a nice thing.
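The ownerReference idea discussed above can be sketched as follows. The metadata fields (`apiVersion`, `kind`, `name`, `uid`, `blockOwnerDeletion`, `controller`) are the standard Kubernetes ownerReference schema; the helper, resource names, and UID are invented for illustration (in practice the UID comes back from the API server when the IngressRoute is created):

```python
# Sketch: attach an ownerReference on the Service pointing at the
# IngressRoute, so that deleting the IngressRoute lets the Kubernetes
# garbage collector delete the Service as well.

def add_owner_reference(child, owner):
    """Make `child` owned by `owner` (hypothetical helper)."""
    ref = {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": owner["metadata"]["name"],
        # The UID must be the live object's UID from the API server;
        # this example uses a made-up value.
        "uid": owner["metadata"]["uid"],
        # Foreground deletion of the owner waits for this child.
        "blockOwnerDeletion": True,
        "controller": True,
    }
    child["metadata"].setdefault("ownerReferences", []).append(ref)
    return child

ingress_route = {
    "apiVersion": "traefik.containo.us/v1alpha1",
    "kind": "IngressRoute",
    "metadata": {"name": "jupyter-user1", "uid": "1234-abcd"},
}
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "jupyter-user1"},
}
service = add_owner_reference(service, ingress_route)
```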

yuvipanda commented 4 years ago

This would be awesome :)

Some notes from the implementation in kubespawner:

  1. It scales pretty well
  2. I don't think it should be in kubespawner, it should be separate
  3. Eventual consistency will always get you, so you have to be very careful! It is hard to tell when a proxy route has been 'added', and JupyterHub will sometimes freak out if you (the proxy implementation) say that a route has been added when it has not.
  4. I vastly prefer creating our own Endpoints objects associated with a service, rather than using label selectors on the pods. This is how services work: they use the label selector to create Endpoints objects, each of which can point to an IP. Since each route in our case will always point to a single IP (referencing a single pod), creating our own Endpoints objects is cleaner, easier to reason about, and faster. This is what the implementation in kubespawner does.
  5. ownerReferences are awesome! They are actually used a lot in kubernetes internally - almost every pod you create has an ownerReference. Using that here seems great. I actually opened an issue to add those to dask-kubernetes (https://github.com/dask/dask-kubernetes/issues/210)
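The Endpoints approach from point 4 can be sketched like this. It is only an illustration of the pattern, not the kubespawner code; the helper name, namespace, and addresses are invented (dicts stand in for client objects so the sketch is self-contained):

```python
# Sketch of point 4 above: a selector-less Service plus a hand-made
# Endpoints object pointing at exactly one pod IP, instead of relying
# on label selectors (or a manual clusterIP). Names/IPs are invented.

def make_service_with_endpoints(name, namespace, pod_ip, port):
    """Build a Service and its manual Endpoints (hypothetical helper)."""
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        # No spec.selector: Kubernetes will not manage Endpoints for
        # this Service, so we must supply them ourselves.
        "spec": {"ports": [{"port": port, "targetPort": port}]},
    }
    # An Endpoints object with the same name as the Service backs it;
    # here it holds the single pod's IP.
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": name, "namespace": namespace},
        "subsets": [{
            "addresses": [{"ip": pod_ip}],
            "ports": [{"port": port}],
        }],
    }
    return service, endpoints

svc, eps = make_service_with_endpoints("jupyter-user1", "jhub", "10.1.2.3", 8888)
```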

Hope this is all useful information! Looking forward to this :)

jcrist commented 4 years ago

Update: I'm still interested in this, but priorities have been shifted and work on this has been put on hold (😞 ). If anyone else is interested in taking this on please feel free - if I pick back up on this work I'll post here again.

jpweng commented 10 months ago

Is there any update on this? It would be very useful for us. Maybe I can take it over; is there already a git repo?