pangeo-data / jupyter-earth

Jupyter meets the Earth: combining research use cases in geosciences with technical developments within the Jupyter and Pangeo ecosystems.
https://jupytearth.org
Creative Commons Zero v1.0 Universal

Automatic HTTPS setup for BinderHub #3

Open consideRatio opened 4 years ago

consideRatio commented 4 years ago

Goal

Make it easy to set up BinderHub's Helm chart to use HTTPS, so that network traffic between the user and BinderHub is encrypted.

History

We have used kube-lego to acquire certificates, but it was deprecated and could not comply with a new requirement from Let's Encrypt, the CA we interact with. We considered kube-lego's successor, cert-manager, but concluded that it came with a bit too much overhead. This is why we now want a more lightweight solution.

Here is a related issue about not being able to use kube-lego: https://github.com/pangeo-data/pangeo-binder/issues/127

Theory 101

For a BinderHub user to establish a secure communication (HTTPS) with a BinderHub server at binder.example.com, some things need to happen first.

  1. Choose a CA (Let's Encrypt): We need a common trusted party, a Certificate Authority (CA). Let's Encrypt is such a CA; it is widely trusted and free to use.

  2. Prove domain ownership -> acquire a signed domain certificate: The CA will issue a signed domain certificate, which the domain owner later needs as proof, but only to someone who can prove to the CA that they control the domain. This is where the ACME protocol is useful: BinderHub can ask the CA for an http01 challenge. During the http01 challenge, the CA sends a request to binder.example.com and BinderHub must respond in a specific way, which convinces the CA that BinderHub controls the server responding to binder.example.com.

  3. Encrypt HTTP / decrypt HTTPS: BinderHub needs to encrypt all outgoing and decrypt all incoming traffic. This is called TLS termination, and it can be done either by a standalone TLS termination proxy or by the web server serving the BinderHub content. This step requires the certificate acquired in the previous step.
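A minimal sketch of the http01 response in step 2, assuming the ACME client has already registered an account key (given here as a JWK dict) and received a challenge token from the CA. Per RFC 8555, the CA fetches `http://binder.example.com/.well-known/acme-challenge/<token>` and expects the body to be the key authorization: the token, a dot, and the base64url SHA-256 thumbprint of the account key (RFC 7638). The `CHALLENGES` dict, `ChallengeHandler` and the helper names are hypothetical illustrations, not code from BinderHub or Traefik.

```python
import base64
import hashlib
import json
from http.server import BaseHTTPRequestHandler
from typing import Dict

# Hypothetical in-memory store: challenge token -> key authorization to serve.
CHALLENGES: Dict[str, str] = {}

ACME_PREFIX = "/.well-known/acme-challenge/"


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as used throughout ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: Dict[str, str]) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required JWK members, sorted, no whitespace."""
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    canonical = json.dumps({k: jwk[k] for k in required}, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: Dict[str, str]) -> str:
    """RFC 8555 key authorization: the token, a dot, and the account key's thumbprint."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


class ChallengeHandler(BaseHTTPRequestHandler):
    """Answers the CA's request for http://<domain>/.well-known/acme-challenge/<token>."""

    def do_GET(self) -> None:
        token = self.path[len(ACME_PREFIX):] if self.path.startswith(ACME_PREFIX) else ""
        body = CHALLENGES.get(token)
        if body is None:
            self.send_error(404)
            return
        data = body.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

Usage: store `CHALLENGES[token] = key_authorization(token, account_jwk)`, serve `ChallengeHandler` on port 80 (for example with `http.server.HTTPServer`), then tell the CA the challenge is ready; the CA fetches the URL and compares the body with the key authorization it expects.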
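Step 3, TLS termination, can be sketched as a tiny proxy built only on Python's standard library: it accepts TLS connections using the signed certificate and its private key, and forwards the decrypted byte stream to a plain-HTTP backend. This illustrates the concept only and is not how Traefik or BinderHub implement it; `start_tls_terminator`, `make_tls_context` and any file paths are hypothetical, and passing `tls=None` disables TLS, which is handy for local testing.

```python
import asyncio
import ssl
from typing import Optional


def make_tls_context(certfile: str, keyfile: str) -> ssl.SSLContext:
    """Server-side TLS context holding the signed domain certificate and its private key."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


async def _pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            chunk = await reader.read(65536)
            if not chunk:
                break
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()


async def start_tls_terminator(
    listen_port: int,
    backend_host: str,
    backend_port: int,
    tls: Optional[ssl.SSLContext],
    listen_host: str = "127.0.0.1",
) -> asyncio.AbstractServer:
    """Accept (TLS) connections and forward the decrypted byte stream to a plain-HTTP backend."""

    async def handle(client_reader: asyncio.StreamReader, client_writer: asyncio.StreamWriter) -> None:
        # When tls is set, asyncio has already done the TLS handshake and transparently
        # decrypts incoming data (and encrypts outgoing data) on this stream.
        try:
            backend_reader, backend_writer = await asyncio.open_connection(backend_host, backend_port)
        except OSError:
            client_writer.close()
            return
        # Copy bytes in both directions; closing one side ends both copies.
        await asyncio.gather(
            _pipe(client_reader, backend_writer),
            _pipe(backend_reader, client_writer),
        )

    return await asyncio.start_server(handle, listen_host, listen_port, ssl=tls)
```

In a real deployment this would listen on port 443 with `tls=make_tls_context(...)` pointing at the certificate and key acquired in step 2.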

Additional reading

Z2JH's solution

I want to reuse as much as possible of the Z2JH solution implemented in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/1539 by @yuvipanda. Assuming Z2JH was configured to use this solution, the following would happen:

  1. Z2JH would route incoming traffic to a Traefik (v2) proxy instead of directly to its original HTTP-only entrypoint.
  2. A sidecar container running alongside the Traefik container in the same pod would load a certificate from a Kubernetes Secret, if one existed, so that Traefik could use it.
  3. If no signed domain certificate was available to Traefik, Traefik would acquire one from Let's Encrypt through an http01 challenge. If a new signed domain certificate was acquired, the sidecar container would store it in a Secret.
  4. The Traefik proxy would use the signed domain certificate to terminate TLS and forward the decrypted traffic to the HTTP entrypoint.
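The sidecar's job in steps 2 and 3 above boils down to a small reconciliation loop. The sketch below assumes Traefik keeps its ACME state in a local file (commonly named `acme.json`) and uses a plain dict in place of the Kubernetes Secret; `sync_once`, the key name and the returned status strings are hypothetical and are not the actual Z2JH sidecar code. A real sidecar would read and write the Secret through the Kubernetes API and repeat this pass periodically.

```python
import os
from typing import MutableMapping

# Hypothetical stand-in for the data of a Kubernetes Secret (key -> raw bytes).
SecretData = MutableMapping[str, bytes]


def sync_once(secret: SecretData, acme_path: str, key: str = "acme.json") -> str:
    """Run one reconciliation pass between Traefik's ACME file and the Secret."""
    if not os.path.exists(acme_path):
        if key in secret:
            # Step 2: a certificate was stored earlier, so hand it back to Traefik.
            with open(acme_path, "wb") as f:
                f.write(secret[key])
            return "restored"
        # Nothing stored yet: Traefik will run the http01 challenge itself (step 3).
        return "waiting"
    with open(acme_path, "rb") as f:
        current = f.read()
    if secret.get(key) != current:
        # Step 3: Traefik wrote a new certificate, so persist it beyond the pod's lifetime.
        secret[key] = current
        return "saved"
    return "unchanged"
```

Persisting the certificate this way means a restarted pod does not need to repeat the http01 challenge, which also helps stay within Let's Encrypt's rate limits.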

Z2JH's Kubernetes Service: proxy-public

BinderHub's planned solution

I plan to either reference or duplicate this code, with minor changes specific to BinderHub.

BinderHub's Kubernetes Service: binder

The Auto HTTPS part is new; the rest is left unchanged.

choldgraf commented 4 years ago

cc @betatim, who was just on an issue where I mentioned Erik working on security stuff. Tim, this is a "meta" repository for the "Jupyter Meets the Earth" project, and some conversation about JupyterHub/Binder-related things might happen here, though in general we will try to keep issues etc. in the proper JupyterHub repositories. Just wanted to let you know about this repo's existence :-)

yuvipanda commented 4 years ago

This is awesome content, @consideRatio!

I want to suggest an alternate approach. If you look at https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/templates/proxy/autohttps/configmap.yaml#L104, we are defining just one router + backend, pointing to CHP. However, it is trivial to add some templating there for extra routers and extra backends. So the z2jh config could take a list of domain names + their backends (as service names or just DNS entries) and add them to the Traefik config. This will make Traefik acquire HTTPS certs for all of them, and do some routing as well. This should mean less work, and make it really easy for z2jh users to cover other external services they might have.
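Yuvi's suggestion could look roughly like this: given a list of `(domain, backend URL)` pairs from the chart's config, emit one Traefik v2 router and one service per pair. The keys follow Traefik v2's dynamic configuration schema (`http.routers`, `http.services`, `loadBalancer.servers`, `tls.certResolver`); the function name, the `route-N` naming, and the entry point and resolver names are hypothetical placeholders, not the values used in Z2JH's `configmap.yaml`.

```python
from typing import Any, Dict, List, Tuple


def traefik_dynamic_config(
    routes: List[Tuple[str, str]],
    entry_point: str = "https",
    cert_resolver: str = "le",
) -> Dict[str, Any]:
    """Build a Traefik v2 dynamic config with one router and one service per (domain, backend URL)."""
    routers: Dict[str, Any] = {}
    services: Dict[str, Any] = {}
    for i, (domain, backend_url) in enumerate(routes):
        name = f"route-{i}"
        routers[name] = {
            "rule": f"Host(`{domain}`)",  # match requests by their Host header
            "service": name,
            "entryPoints": [entry_point],
            # Traefik acquires and renews a certificate for the domain via this resolver.
            "tls": {"certResolver": cert_resolver},
        }
        services[name] = {"loadBalancer": {"servers": [{"url": backend_url}]}}
    return {"http": {"routers": routers, "services": services}}
```

For example, `traefik_dynamic_config([("hub.example.com", "http://proxy-http:8000"), ("binder.example.com", "http://binder:80")])` (hostnames and service URLs hypothetical) yields two routers; serialized to YAML, the result could be fed to Traefik's file provider. In the Helm chart itself the same idea would be a template loop over a values list.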

What do you think of this?

betatim commented 4 years ago

:wave:

Two points:

  1. cert-manager on mybinder.org (finally) just worked (for staging, haven't had time to enable it for prod)
  2. I support Chris' suggestion that if you want to discuss architecture/plan changes for BinderHub, it is probably best to move the conversation there, with enough time for the people who hang out there to chime in and give input.

A question: is it possible to use the Traefik that does LE for JupyterHub to also cover other Ingress objects in the k8s namespace? Thinking of mybinder.org, where we have several Ingress objects that need certs but aren't part of the binderhub or z2jh chart. The reason for moving forward with cert-manager there was that I think we need something like it to cover all these "extra" Ingress objects.

consideRatio commented 4 years ago

@yuvipanda @betatim @choldgraf this topic was to a large extent a rubber-ducking exercise that got out of hand and ended up being technically relevant for repos like z2jh and binderhub. I'll make sure to bring the discussion there going forward.


> I want to suggest an alternate approach.

@yuvipanda sounds good to me! I'm going for it!

> 1. cert-manager on mybinder.org (finally) just worked (for staging, haven't had time to enable it for prod)

@betatim :tada: yepp! nginx-ingress + cert-manager is the more reliable and scalable solution: for example, it can avoid disruptions because the ingress proxy pods can run in HA while certificate acquisition still works. The Traefik-based auto HTTPS setup would instead be the more lightweight default solution that helps a new BinderHub admin get up and running quickly.

> A question: is it possible to use the Traefik that does LE for JupyterHub to also cover other Ingress objects in the k8s namespace? Thinking of mybinder.org, where we have several Ingress objects that need certs but aren't part of the binderhub or z2jh chart. The reason for moving forward with cert-manager there was that I think we need something like it to cover all these "extra" Ingress objects.

@betatim yepp, I think Traefik can do this: it can configure itself to route incoming traffic based on Ingress resources in Kubernetes, and I think it can also acquire certificates for them. But unlike an nginx-ingress + cert-manager setup, Traefik's open source version cannot run in HA while also acquiring certificates for the ingress routes.
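For comparison, with nginx-ingress + cert-manager each "extra" Ingress object, like the ones on mybinder.org, only needs an annotation and a `tls` section to get a certificate. A sketch that builds such a manifest as a Python dict, assuming a cert-manager `ClusterIssuer` named `letsencrypt-prod` and an ingress class named `nginx` exist in the cluster (both names, and the helper `ingress_with_tls`, are hypothetical):

```python
from typing import Any, Dict


def ingress_with_tls(name: str, host: str, service: str, port: int,
                     cluster_issuer: str = "letsencrypt-prod") -> Dict[str, Any]:
    """Build a networking.k8s.io/v1 Ingress that asks cert-manager for a certificate."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            # cert-manager's ingress-shim sees this annotation and creates a Certificate
            # for the hosts in spec.tls, storing it in the named Secret.
            "annotations": {"cert-manager.io/cluster-issuer": cluster_issuer},
        },
        "spec": {
            "ingressClassName": "nginx",  # served by nginx-ingress, which can run several replicas
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }
```

Serialized to YAML, the dict can be applied with `kubectl apply -f`; certificate acquisition then happens in cert-manager, independently of how many ingress controller replicas serve traffic.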

consideRatio commented 4 years ago

Branched out work and discussion

Z2JH PR: Configuration sustainable from a maintainer perspective

Helm charts are hard to maintain because they require planning for everything someone may want to modify, which is impossible. This PR sets us up to avoid that, in preparation for the upcoming related PRs.

Z2JH PR: Testing infrastructure for autohttps

This infrastructure doesn't modify our current tests, but it lets us mimic Let's Encrypt in our CI system, including the process of acquiring HTTPS certificates and using them.

Future: Z2JH PR: Make CI system test with HTTPS

Future: BinderHub PR: Use Z2JH's infra to get HTTPS

consideRatio commented 4 years ago

Status update

This is still ongoing. The effort so far...

What remains is to update https://github.com/jupyterhub/binderhub/pull/1101 now that the pieces are in place, which also requires some updates to BinderHub's CI infrastructure.

consideRatio commented 4 years ago

Status update

I have not dropped the ball on this, but my attention isn't fully focused on it. Here is the current status of things. Since last status update...

Future

consideRatio commented 3 years ago

Status update

I lost momentum waiting for intermediate PRs to be merged and didn't get back to it. I still have https://github.com/jupyterhub/binderhub/pull/1179 open, but I don't think it is a roadblock.

This work can continue at this point.