istio / istio

Connect, secure, control, and observe services.
https://istio.io
Apache License 2.0

TLS: Session resumption IDs assigned but not accepted #26009

Closed gertvdijk closed 3 years ago

gertvdijk commented 4 years ago

Bug description

Recently I upgraded Istio from 1.4.7 to 1.6.6. An SSL Labs server test now shows a warning at the bottom of the page in the "Protocol Details" section: "Session resumption (caching): No (IDs assigned but not accepted)".

*(Screenshot: SSL Labs "Protocol Details" section showing the session resumption warning.)*

- [ ] Configuration Infrastructure
- [ ] Docs
- [ ] Installation
- [ ] Networking
- [x] Performance and Scalability
- [ ] Policies and Telemetry
- [x] Security
- [ ] Test and Release
- [ ] User Experience
- [ ] Developer Infrastructure

Expected behavior

I'd expect it to be all green on SSL Labs with a modern TLS configuration. Either assign session IDs or don't, and if they are assigned, accept them later for resumption from the session cache.

Steps to reproduce the bug

  1. Deploy Istio with a Gateway that has TLS enabled (e.g. mode SIMPLE; see below for my slightly customized TLS config, but nothing that should break anything, right?).
  2. Run SSL Labs' server test: https://www.ssllabs.com/ssltest/
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  labels:
    release: istio
  name: ingressgateway-mygateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - mynamespace/sub.domain.tld
    port:
      name: http
      number: 80
      protocol: HTTP
    tls:
      httpsRedirect: true
  - hosts:
    - mynamespace/sub.domain.tld
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      cipherSuites:
      - '[ECDHE-ECDSA-AES128-GCM-SHA256|ECDHE-ECDSA-CHACHA20-POLY1305]'
      - '[ECDHE-RSA-AES128-GCM-SHA256|ECDHE-RSA-CHACHA20-POLY1305]'
      - ECDHE-ECDSA-AES256-GCM-SHA384
      - ECDHE-RSA-AES256-GCM-SHA384
      credentialName: my-credential
      minProtocolVersion: TLSV1_2
      mode: SIMPLE
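Besides SSL Labs, the same check can be done locally with `openssl`. The sketch below spins up a throwaway `s_server` to show what a healthy resumption looks like; the port and file paths are illustrative, and to probe the gateway itself you would point `s_client` at its real address instead:

```shell
# Local illustration of the resumption check SSL Labs performs, using a
# throwaway `openssl s_server` instead of the real gateway.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=localhost" -days 1 \
  -keyout /tmp/tls-key.pem -out /tmp/tls-cert.pem 2>/dev/null
openssl s_server -accept 18443 -key /tmp/tls-key.pem -cert /tmp/tls-cert.pem -quiet &
SERVER_PID=$!
sleep 1
# First handshake: save the negotiated session (ID and/or ticket).
openssl s_client -connect localhost:18443 -tls1_2 -sess_out /tmp/tls-sess \
  </dev/null >/dev/null 2>&1
# Second handshake: offer the saved session. A server with a working session
# cache reports "Reused"; the symptom reported here shows "New" instead.
RESULT=$(openssl s_client -connect localhost:18443 -tls1_2 -sess_in /tmp/tls-sess \
  </dev/null 2>/dev/null | grep -Eo '^(New|Reused)' | head -n1)
kill "$SERVER_PID" 2>/dev/null
echo "$RESULT"
```

Against a multi-replica gateway, repeating the second command several times makes the load-balancing effect visible: some attempts resume, others don't.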

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)

$ istioctl --kubeconfig ~/.kube/config-mycluster version --remote
client version: 1.6.6
control plane version: 1.6.6
data plane version: 1.6.6 (18 proxies)
$ kubectl --kubeconfig ~/.kube/config-mycluster version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5", GitCommit:"e0fccafd69541e3750d460ba0f9743b90336f24f", GitTreeState:"clean", BuildDate:"2020-04-16T11:44:03Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-04-16T11:36:15Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?

$ istioctl manifest generate -f myprofile.yaml | kubectl --kubeconfig ~/.kube/config-mycluster apply -f -

With myprofile.yaml:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    # Trusted networking cluster; save resources by disabling mTLS.
    enableAutoMtls: false
  values:
    global:
      proxy:
        # Do not auto-inject sidecar, but require:
        # - the namespace of the Deployment/Pod to have a label istio-injection = enabled.
        # - an annotation to be set on the Deployment/Pod (sidecar.istio.io/inject: "true")
        # See for more info: https://istio.io/latest/docs/ops/configuration/mesh/injection-concepts/
        autoInject: disabled
  addonComponents:
    prometheus:
      enabled: false
  components:
    ingressGateways:
      # A default but small scale Ingress gateway, with HTTP & HTTPS on NodePorts 31002 & 31005 respectively.
      - name: istio-ingressgateway
        enabled: true
        k8s:
          hpaSpec:
            minReplicas: 1
            maxReplicas: 2
          strategy:
            rollingUpdate:
              maxSurge: 100%
              maxUnavailable: 50%
          service:
            ports:
              - name: status-port
                port: 15021
                targetPort: 15021
              - name: http2
                port: 80
                targetPort: 8080
                nodePort: 31002
              - name: https
                port: 443
                targetPort: 8443
                nodePort: 31005
              - name: tls
                port: 15443
                targetPort: 15443
      # A second Ingress gateway, [omitted]

Environment where bug was observed (cloud vendor, OS, etc)

kubeadm-installed self-managed cluster.

I'm happy to provide a cluster state archive if needed.

howardjohn commented 4 years ago

It seems likely this is an Envoy-level thing, unless we are configuring Envoy incorrectly. @PiotrSikora may know more.

4c74356b41 commented 4 years ago

I observe the same on 1.5.8

gertvdijk commented 4 years ago

I did some digging into Envoy and I think you're right, @howardjohn, but Istio is configuring Envoy incorrectly too. Let me explain in this somewhat lengthy comment, because I feel like I've spotted the actual issue.

First of all; hot :potato: here in the Envoy docs on the disable_stateless_session_resumption setting:

If this config is set to false and no keys are explicitly configured, the TLS server will issue TLS session tickets and encrypt/decrypt them using an internally-generated and managed key, with the implication that sessions cannot be resumed across hot restarts or on different hosts.

This means that any setup with replica count > 1 is impacted, and the symptom may or may not show up depending on the chance of being directed to the same pod by the load balancer.

Envoy changelog mentions the disable_stateless_session_resumption as introduced in 1.14.

tls: added configuration to disable stateless TLS session resumption disable_stateless_session_resumption.

But, more importantly, this auth.TlsSessionTicketKeys section in the API docs on common TLS configuration settings:

If session_ticket_keys is not specified, the TLS library will still support resuming sessions via tickets, but it will use an internally-generated and managed key, so sessions cannot be resumed across hot restarts or on different hosts.

It sounds like a bit of a surprise setting to me, also being a negatively formulated one (disable_*), which results in a misconfiguration with default settings in a fairly typical installation (multiple instances running). :confused: It's definitely a use case of Envoy that Istio should take into account here. :laughing:

This behavior has existed in Envoy for multiple years, since commit https://github.com/envoyproxy/envoy/commit/6f4e692b1e25b3075c10b995fb9026cc858f98ef. I guess I may have been wrong in my statement that it was due to the upgrade of Istio. Sorry about that. :disappointed:

At a quick glance, I guess this disable_stateless_session_resumption setting should simply be set to true—if Istio isn't actually managing these session resumption keys for Envoy—so that correct behavior is restored. I found an issue filed already for handling the TLS session ticket secrets properly (#20347).

I'd be happy to propose a change in a PR to disable it first as a small change towards later re-enabling via #20347, but I'm completely new to the Istio code base and policies and stuff, so that may take quite some time.
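For reference, the relevant knobs live on Envoy's `DownstreamTlsContext` (v3 API). The field names below are from the Envoy docs, the values are illustrative, and the two session-ticket options are a `oneof`, so only one can be set at a time:

```yaml
# envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext (sketch)
common_tls_context:
  tls_certificate_sds_secret_configs:
    - name: kubernetes://tls-cert   # as Istio configures it today
# Option A: stop issuing session tickets entirely (what this comment proposes).
disable_stateless_session_resumption: true
# Option B (the #20347 direction): distribute shared ticket keys via SDS,
# so any gateway replica can decrypt a ticket issued by another.
# session_ticket_keys_sds_secret_config:
#   name: tls_session_ticket_keys
#   sds_config:
#     ads: {}
```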

howardjohn commented 4 years ago

@gertvdijk thanks for the detailed research!

tmshort commented 4 years ago

However, that only answers why session tickets (large data structures) are being issued but are not usable after a hot restart. The warning highlighted by the reporter is about session IDs (typically 32 bytes): because of the service load balancer, the session IDs do not work across istio-ingressgateway instances.

gertvdijk commented 4 years ago

@tmshort No, it answers both cases, IIUC. Tickets are encrypted with a key that's currently only local to the Envoy instance the client is talking to. The next instance cannot decrypt the ticket. Istio should manage this pool of encryption keys for all instances of ingress gateways and that's open feature #20347 (as mentioned in previous comment). Or did I misunderstand something about your comment?

tmshort commented 4 years ago

I was trying to say that session IDs, which are unique to the host that generated them, should not be expected to always work in a round-robin load-balanced environment. Which is what the title is about. While it is possible to share resumption data (either ID or ticket), Istio is not set up for that. Most TLS implementations will give out a session ID even if it can't be reused; turning that off is often more painful than just leaving it on.

howardjohn commented 3 years ago

@gertvdijk thanks for the investigation. I tried setting disable_stateless_session_resumption=true, but it does not seem to work: the linked SSL scanner fails with the same error, and

$ openssl s_client -connect <APP>:443 -tls1_2 -sess_out /tmp/ssl_s
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-CHACHA20-POLY1305
    Session-ID: 8923BCA60ABA0FD8729F5082AFE6B7312A39EE31E152E16E806AEC72D3D4ABEF
    Session-ID-ctx:
    Master-Key: 5C36DFD4DD43565E40ACDFDBE6F6D6250ED70243A6E4F363C334F04D6A0A204165E6F22B2A7456AC740BD6A08B411742
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1604444281
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: yes
---

Envoy config:

         "transport_socket": {
          "name": "envoy.transport_sockets.tls",
          "typed_config": {
           "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext",
           "common_tls_context": {
            "alpn_protocols": [
             "h2",
             "http/1.1"
            ],
            "tls_certificate_sds_secret_configs": [
             {
              "name": "kubernetes://tls-cert",
              "sds_config": {
               "ads": {},
               "resource_api_version": "V3"
              }
             }
            ]
           },
           "require_client_certificate": false,
           "disable_stateless_session_resumption": true
          }
         }

May need some help from @PiotrSikora

PiotrSikora commented 3 years ago

Envoy doesn't support session resumption via Session ID (stored on the server-side), but Session ID is most likely assigned anyway, since historically some clients would break if there was no Session ID at all.

Envoy only supports session resumption via Session Tickets (stored encrypted on the client-side), and if the encryption key is shared across multiple instances, then sessions can be resumed across all of them, but I don't believe that we support this in Istio.

disable_stateless_session_resumption disables support for Session Tickets, but it doesn't affect Session ID.

PiotrSikora commented 3 years ago

Envoy doesn't support session resumption via Session ID (stored on the server-side), but Session ID is most likely assigned anyway, since historically some clients would break if there was no Session ID at all.

Sorry, this part is wrong. Envoy uses a small built-in Session ID cache (20,480 entries).

In any case, if we want to share sessions across multiple instances, then we need to generate, distribute and rotate Session Ticket Keys via SDS.
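For the record, the SDS payload that would carry such keys is an Envoy `Secret` of type `TlsSessionTicketKeys`. Field names are from the v3 API; the mount paths are hypothetical, each key must resolve to exactly 80 bytes, and the first key is used for encryption while the remaining ones are still accepted for decryption during rotation:

```yaml
# envoy.extensions.transport_sockets.tls.v3.Secret (sketch)
name: tls_session_ticket_keys
session_ticket_keys:
  keys:
    # Each DataSource must resolve to exactly 80 bytes of key material.
    - filename: /etc/istio/ticket-keys/current   # encrypts new tickets
    - filename: /etc/istio/ticket-keys/previous  # kept to resume older sessions
```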

istio-policy-bot commented 3 years ago

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2020-11-12. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.