jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Using the Ingress resource with `proxy.service.disableHttpPort` doesn't make sense - but should (?) #2652

Open geoffo-dev opened 2 years ago

geoffo-dev commented 2 years ago

Hello,

I have been trying to use the ingress configuration in the helm chart rather than using the loadbalancer service. I would like to use secure https throughout, but appear to be blocked here:

https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/17034aa67e6bee013211265217527151576ee7bf/jupyterhub/templates/ingress.yaml#L26

It looks like the http port is hard-coded there, so when the HTTP port is disabled on the service (proxy.service.disableHttpPort), the ingress has no port left to point to...

Would it be possible to make this a flag so it can be changed to https? I have made the change manually in my config - but it would be good to be supported formally.
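For illustration, the rendered Ingress presumably looks roughly like the sketch below. This is an assumed shape rather than a copy of the chart's template: the Ingress name, the host, and the path are placeholders, and the only point is the named backend port.

```yaml
# Sketch of the rendered Ingress (assumed shape, not copied from the chart).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jupyterhub              # placeholder name
spec:
  rules:
    - host: jupyterhub.example.com   # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: proxy-public
                port:
                  name: http    # hard-coded; the manual change is to use "https" here
```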

Thanks!

geoffo-dev commented 2 years ago

Opened PR https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/2653 to address the issue

consideRatio commented 2 years ago

I think this touches on a specific configuration where you have something in proxy.https together with proxy.service.disableHttpPort and ingress.enabled=true. Can you share an example of the configuration you are using and explain a bit what you wish to accomplish?

I'm cautious about adding behavior tied to the flag proxy.service.disableHttpPort within the Ingress template, as it's changing behavior outside the Service resource.

geoffo-dev commented 2 years ago

yeah so basically I am looking to enable full path TLS encryption - this would focus on encrypting the traffic between the load balancer and service. The idea of using proxy.service.disableHttpPort as the selector is that if you are disabling http traffic, the only route is https - so this seemed to be a 'nice' way of doing it.

The configuration I am using is as follows:

  ingress:
    enabled: true
    annotations:
      external-dns.alpha.kubernetes.io/hostname: 'jupyterhub.example.com'
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internal
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
      alb.ingress.kubernetes.io/load-balancer-name: 'jupyterhub'
      alb.ingress.kubernetes.io/backend-protocol: HTTPS
      alb.ingress.kubernetes.io/healthcheck-port: '8443'
      alb.ingress.kubernetes.io/ssl-redirect: '443'
      alb.ingress.kubernetes.io/healthcheck-path: /hub/health
    hosts: 
      - 'jupyterhub.example.com'
    pathType: Prefix
    tls:
      - secretName: jupyterhub-tls
        hosts:
          - 'jupyterhub.example.com'

However this configuration fails as whilst the intention is to push traffic to the https port (8443), the ingress controller is set to only push to the http port.

https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/a28bb23737f8d66ec3b13db902c262f6b20e4c48/jupyterhub/templates/ingress.yaml#L26

Therefore when the ALB initialises, it fails the healthcheck as it is trying to send https traffic to an http port.

Essentially this issue was raised as there is no https ingress 'option' on the ingress controller.

consideRatio commented 2 years ago

@geoffo-dev hmmm, I still don't clearly understand you. What is the Helm chart configuration you have set up for proxy.service and proxy.https, along with the configuration for ingress you provided above?

Confusion points are:

  1. A k8s Service is just networking configuration, and how it translates into something that actually influences networking depends on the k8s implementation. A k8s Service won't be able to respond as a webserver etc. With this in mind, "the traffic between the load balancer and service" becomes a vague statement to me. Are you referring to the traffic between the LoadBalancer implementation and the Pod behind the service? This pod would be either the autohttps pod or the proxy pod depending on your configuration.
  2. "looking to enable full path TLS encryption" is too vague for me to understand as well.

My understanding is that:

  1. TLS termination will be done only once. When it's done, it won't be HTTPS encrypted communication any more from that point onwards. If you have configured tls in the ingress, the ingress controller will be responsible for doing it.

    NOTE: You can have internal TLS encryption between pods using Istio or similar, but that is a far bigger topic and can't be considered as part of this issue.

  2. If you use an Ingress resource, you should not also use proxy.service.type=LoadBalancer but a ClusterIP type, and use the ingress controller to route decrypted traffic to your k8s pod (as reached by pointing to the k8s Service that is just a pointer to Pods).
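As a rough sketch of the setup described in these two points (TLS terminated once by the ingress controller, with a ClusterIP Service behind it), the Helm values could look something like the following. The hostname and secret name are reused from the configuration earlier in this thread and are only placeholders.

```yaml
# values.yaml (sketch): the ingress controller terminates TLS,
# and plain HTTP is routed to the proxy pod via a ClusterIP Service.
proxy:
  service:
    type: ClusterIP        # not LoadBalancer when an Ingress is used
  # proxy.https is left at its default (disabled) in this sketch

ingress:
  enabled: true
  hosts:
    - jupyterhub.example.com
  tls:
    - secretName: jupyterhub-tls   # certificate used by the ingress controller
      hosts:
        - jupyterhub.example.com
```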

Overall, this is tricky to overview and one of the most complicated topics managed by the Helm chart, so unless we can think really clearly about what we are doing, I don't want to make any change.

geoffo-dev commented 2 years ago

Perhaps it is my lack of understanding. My proxy config is:

proxy:
  https: 
    enabled: true
    type: secret
    secret:
      name: jupyterhub-tls
    hosts: 
      - jupyterhub.example.com
proxy: 
    service:
      type: ClusterIP

In essence this is what I am trying to achieve:

https://aws.amazon.com/blogs/containers/setting-up-end-to-end-tls-encryption-on-amazon-eks-with-the-new-aws-load-balancer-controller/

From my understanding and from my testing/deployment, if I use the above configuration using ingress with the http service it will not work... If I manually change the config to https it will. I believe this is because under this configuration, it will still enable https on the pods (using the tls certs I get from cert manager), which is what (in this case) the AWS Load Balancer is expecting.

When I say full path TLS - under the configuration I am trying to achieve I want to do this from the Client to the Pod, which does mean TLS is terminated twice - once at the AWS ALB (in this case) and once at the Pod. The reason why I am doing it this way is that the ALB does not support https pass through and I want to encrypt the traffic right up to the pod.

Again apologies if I am not explaining myself correctly - hopefully the AWS Blog will do a better job - but in essence it does work, there just needs to be the option within the ingress template to route traffic over the https port of the proxy service.

consideRatio commented 2 years ago

Thank you for iterating with me about this issue @geoffo-dev!


Note that if you provide this YAML, it is very problematic because proxy is a key defined twice.

proxy:
  https: 
    enabled: true
    type: secret
    secret:
      name: jupyterhub-tls
    hosts: 
      - jupyterhub.example.com
proxy: 
    service:
      type: ClusterIP

Did you mean this? If you meant the other, I think helm has silently ignored the content of the first part declaring proxy.https.

proxy:
  https: 
    enabled: true
    type: secret
    secret:
      name: jupyterhub-tls
    hosts: 
      - jupyterhub.example.com
  service:
    type: ClusterIP

I have a far better understanding of what you wish to accomplish now, but it is still too vague for me to feel comfortable making any change based on it.

Relevant understanding:

  1. A k8s Service of type LoadBalancer will lead to the creation of some networking by the cloud provider, outside the k8s cluster.
  2. An Ingress resource has no meaning by itself, only an ingress controller will give it meaning. What ingress controller is used in your k8s cluster?

The network traffic will go:

  1. To the cloud provider's LoadBalancer, as provided automatically by some load-balancer-controller by declaring a k8s Service of type: LoadBalancer
  2. To the k8s ingress-controller which is configured by the Ingress resource to perform TLS termination
  3. To the proxy or autohttps pod of this Helm chart depending on your proxy.https configuration, now without TLS encryption, because nothing has encrypted this again after the decryption by the ingress controller.

I think this topic is a bit too complicated for the both of us right now and I'd like to not increase the complexity by adding a workaround unless it's accompanied by a very crisp description of when and why it would make sense to use.

Note that for you to have TLS encrypted traffic also within your k8s cluster, this is just taking care of a small part. You have network traffic between hub/proxy/user pods and optionally between the autohttps pod and proxy pod. With that in mind, I think you should either accept some unencrypted network traffic in the k8s cluster, or work on using something that handles it properly, using for example Istio's mTLS feature to encrypt traffic between pods.
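As a minimal sketch of the Istio route mentioned above: assuming Istio is installed and sidecar injection is enabled for the namespace, a PeerAuthentication resource like the one below would require mTLS between the pods in it. The namespace name is an assumption, and whether every component of this chart works cleanly behind Istio sidecars is not something this sketch verifies.

```yaml
# Sketch: require mutual TLS between all sidecar-injected pods in one namespace.
# Assumes Istio is installed; "jupyterhub" is a placeholder namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: jupyterhub
spec:
  mtls:
    mode: STRICT   # reject plaintext traffic between workloads in this namespace
```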


I think at this point I'd like to close and label this issue as wontfix, motivated by it seeming like a workaround that introduces complexity and only addresses a small part of a problem, which may also turn out to be specific to one cloud provider.

I'm open to changing my mind about this, but I'm very cautious about making a change to the networking unless it is very clearly motivated and documented. Networking is the part of this Helm chart that has been the most complicated to maintain, I think.

geoffo-dev commented 2 years ago

Good Morning @consideRatio

Apologies for the delayed reply, has been a bit crazy these last few weeks and I wanted to make sure I replied properly!

Sorry - my code was incorrect as I use two values.yaml files (long story) so I copied and pasted. You are correct it should be:

proxy:
  https: 
    enabled: true
    type: secret
    secret:
      name: jupyterhub-tls
    hosts: 
      - jupyterhub.example.com
  service:
    type: ClusterIP

So in the above example I am not creating a LoadBalancer as this would create a Network Load Balancer within AWS using the AWS Load Balancer Controller - rather I use an ALB. There are certain reasons why I don't want to use a Network Load Balancer, in particular there is no ability to provide redirection on http requests (through annotations).

If I use an ALB, I cannot pass-through SSL termination and this would need to be terminated at the ALB. Therefore I was using the TLS certs within the ingress to terminate traffic between the ingress and the ALB.


With that in mind - I completely appreciate what you say about there being limited value in this approach. We will be using Istio at some stage so that will go some way to doing the full path encryption. However there might be some value in this still as this would provide encryption across 'AWS' - i.e. between their ALB service and the EKS node and the unencrypted elements would be within the node only. Again appreciate if the pods are across nodes, this traffic will still be unencrypted!


I guess the only thing I can say is that using the option proxy.service.disableHttpPort stops the ingress from working, as there is no logic within the ingress template to route to the https port.

If you were to include the logic in the PR, then I don't think that would change the functionality (or expectation) in any bad or hacky way, rather it (at least to me) would be expected behaviour: if you disable http, you are doing this to force https. This does work! I just need to change it manually...
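To make the mismatch concrete, here is a sketch of the values involved, with the assumed effect on the rendered resources noted in comments. The Service port layout shown is an assumption about the chart's output, not copied from it.

```yaml
# values.yaml (sketch)
proxy:
  https:
    enabled: true
    type: secret
    secret:
      name: jupyterhub-tls
  service:
    type: ClusterIP
    disableHttpPort: true   # the Service no longer exposes a port named "http"

ingress:
  enabled: true

# Assumed result:
#   Service proxy-public   -> ports: [{name: https, port: 443}]
#   Ingress backend        -> service proxy-public, port name "http"
# The Ingress then points at a port name that does not exist on the Service.
```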


Either way - I agree this is very complex (and I am not sure I fully understand the intricacies of K8s networking) and there might be little benefit in this approach.

Thank you for your detailed responses though, maintaining the project and apologies if I have been annoying!

fdrab commented 3 months ago

Stumbled upon this and switched to https from http to make this work. thanks @geoffo-dev for the tip. There are situations where we may need to decrypt / encrypt the traffic several times for security compliance reasons. In my company we use AWS and AWS provided certificates. However, usage of these certificates is only applicable to the AWS load balancer resources, because AWS doesn't provide the private keys to downstream services.

My situation is similar to yours (exact same in fact). Setting a service to LoadBalancer tells the aws lb controller to create a network load balancer, which terminates the TLS there and routes the decrypted traffic to the EKS nodes.

Creating an ingress resource tells the aws lb controller to provision an Application Load Balancer, which uses AWS certificates and then passes encrypted traffic to the EKS nodes, where the z2jh proxy listens on HTTPS, using self-signed certificates.

This way, the traffic is encrypted from the client to the Load Balancer (outside the cluster using AWS certs) and also from the LB to the EKS nodes.

If the proxy.service.disableHttpPort does not switch the ingress to https, it would be nice if the path mapping was at least configurable, so it'd be part of the helm chart.
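Purely as an illustration of that suggestion, a configurable backend port might be exposed along these lines. The `ingress.servicePort` key is hypothetical: it is not an existing chart option and is only meant to show the shape such a setting could take.

```yaml
# HYPOTHETICAL sketch: ingress.servicePort does not exist in the chart.
ingress:
  enabled: true
  hosts:
    - jupyterhub.example.com
  servicePort: https   # hypothetical key; "http" would be the implied default
```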