traefik proxy take 2

Proposed change

I'd like to take another stab at switching to traefik for the proxy implementation, but I have a different plan from #1162, which I hope is more likely to succeed. Specifically:

Replace only CHP with traefik and leave autohttps in place (multi-replica traefik can't do ACME for some silly reasons).
Add it as an option rather than making it the only option. This lowers the barrier to success, since deployments can pick CHP if there's a problem. For example, anonymous BinderHub deployments (such as mybinder.org) cannot use the traefik proxy implementation until https://github.com/jupyterhub/traefik-proxy/issues/151 is addressed or another approach to activity monitoring is used for the culler.
Select the implementation with proxy.kind. We could also have proxy.kind = 'external' to disable the proxy deployment entirely, for #3481.
Deploy valkey-server (an open source fork of redis) as the storage backend.
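
As a rough sketch of how the selection could look in values.yaml: only the proxy.kind key and the 'external' value come from the plan above. The value names chp and traefik, and chp being the default, are assumptions for illustration. Config for the valkey storage backend is left out because its shape is not decided here.

```yaml
# Hypothetical values.yaml fragment; proxy.kind does not exist in the chart today.
proxy:
  # Assumed values:
  #   chp      - keep configurable-http-proxy (assumed default)
  #   traefik  - deploy traefik, backed by valkey, instead of CHP
  #   external - deploy no proxy at all (see #3481)
  kind: traefik
```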

The first frustrating hurdle (because it's such a minor thing) is that there's already a .Values.proxy.traefik config, which is the obvious place to put traefik proxy config. But that's currently not traefik proxy config, it's specifically autohttps pod config. What should config look like when there's a 'traefik' alternative to CHP and a traefik autohttps pod? My first inclination is to move the current proxy.traefik config to proxy.autohttps.traefik and use proxy.traefik for the peer to proxy.chp.
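
To make the proposed move concrete, here is a minimal before/after sketch. The resources key is only a placeholder for whatever pod config lives under each section, and the final layout is an assumption until the naming question is settled.

```yaml
# Today: proxy.traefik configures the autohttps pod, not a CHP alternative.
proxy:
  chp:
    resources: {}
  traefik:
    resources: {}
---
# Proposed (hypothetical): autohttps config moves down one level,
# and proxy.traefik becomes the peer of proxy.chp.
proxy:
  chp:
    resources: {}
  traefik:
    resources: {}
  autohttps:
    traefik:
      resources: {}
```

A move like this changes the meaning of an existing key, so existing proxy.traefik config would need to be relocated by anyone currently customizing the autohttps pod.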

Alternatives:

Rename proxy.chp to something generic like proxy.pod and use the same config for both implementations, because the only CHP-specific thing I can see in .Values.proxy.chp is the default value for image. Upside: simpler and more precisely descriptive, since almost no proxy.chp config is implementation-specific. Downside: it's an unnecessary config change for folks still using CHP; different config for each implementation (if anyone wants that) can't be in place at the same time for easier switching; and if there is any implementation-specific config, it's less clear where it should go (maybe keep proxy.chp/traefik for that?).
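
A sketch of what this alternative might look like, assuming the proxy.pod name suggested above and using resources as a placeholder for the shared pod config. How the image default would follow the selected implementation is an open question, not something this sketch answers.

```yaml
# Hypothetical values.yaml fragment for the generic-section alternative.
proxy:
  kind: traefik      # or chp; selects the implementation
  pod:               # shared pod config, replacing proxy.chp
    resources: {}
    # image: the default would have to depend on proxy.kind
```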

Advantages of traefik proxy:

scalable, highly available, faster throughput (multiple replicas means proxy can be upgraded/restarted with less disruption, throughput can be scaled on demand by increasing replica count)
actively maintained and widely used, unlike node-http-proxy, which underpins CHP
routing table can be persisted, so routes are not lost when proxy restarts

Disadvantages:

slower to change routes: in benchmarks, CHP is far faster than traefik at adding/removing routes, especially when many transactions are concurrent (many users starting/stopping at once)
No proxy activity metrics (https://github.com/jupyterhub/traefik-proxy/issues/151)

Alternative options

Keep CHP, add redis persistence which also gives us multiple replicas
unconditional switch (less complexity in chart at the expense of needing to make sure all use cases are met - this is what killed #1162, I think)
merge autohttps traefik and proxy traefik (tempting because they are both traefik, but autohttps doesn't work with multiple replicas). This can still be considered later in its own PR.

Who would use this feature?

All deployments, but especially those with lots of users who don't need network-activity-based culling (jupyterhub-singleuser's internal activity tracking should work for the vast majority of deployments).

Related issues:

#3481
#1951
#1496
#1364
#1162

jupyterhub / zero-to-jupyterhub-k8s

traefik proxy take 2 #3497
