Closed gertvdijk closed 3 years ago
It seems likely this is an Envoy level thing, unless we are configuring envoy incorrectly. @PiotrSikora may know more
I observe the same on 1.5.8
I did some digging into Envoy and I think you're right, @howardjohn, but Istio is configuring Envoy incorrectly too. Let me explain in this somewhat lengthy comment; I feel like I've spotted the actual issue.
First of all, a hot :potato: here in the Envoy docs on the disable_stateless_session_resumption setting:

> If this config is set to false and no keys are explicitly configured, the TLS server will issue TLS session tickets and encrypt/decrypt them using an internally-generated and managed key, with the implication that sessions cannot be resumed across hot restarts or on different hosts.
This means that any setup with a replica count > 1 is affected. Whether it actually shows up as an issue depends on the chance of load balancing directing the client to the same pod.
The Envoy changelog mentions disable_stateless_session_resumption as introduced in 1.14:

> tls: added configuration to disable stateless TLS session resumption disable_stateless_session_resumption.
But, more importantly, there is this auth.TlsSessionTicketKeys section in the API docs on common TLS configuration settings:

> If session_ticket_keys is not specified, the TLS library will still support resuming sessions via tickets, but it will use an internally-generated and managed key, so sessions cannot be resumed across hot restarts or on different hosts.
To me, this is a bit of a surprising setting, also being a negatively formulated one (disable_*), and its default amounts to a misconfiguration in a fairly typical installation (multiple instances running). :confused:
It's definitely a use case of Envoy that Istio should take into account here. :laughing:
This behavior of Envoy has existed for several years, since commit https://github.com/envoyproxy/envoy/commit/6f4e692b1e25b3075c10b995fb9026cc858f98ef. I guess I may have been wrong in stating that it was due to the upgrade of Istio. Sorry about that. :disappointed:
At a quick glance, I guess this disable_stateless_session_resumption setting should just be set to true (if Istio isn't actually managing these session resumption keys for Envoy), and correct behavior can be restored that way. I found an issue already filed for handling the TLS session ticket secrets properly (#20347).
I'd be happy to propose a change in a PR to disable it first as a small change towards later re-enabling via #20347, but I'm completely new to the Istio code base and policies and stuff, so that may take quite some time.
@gertvdijk thanks for the detailed research!
However, that only answers why session tickets (large data structures) are being issued and are not usable after a hot restart. The error highlighted by the originator concerns session IDs (typically 32 bytes): because of the service load balancer, session IDs do not work across istio-ingressgateway instances.
@tmshort No, it answers both cases, IIUC. Tickets are encrypted with a key that's currently only local to the Envoy instance the client is talking to. The next instance cannot decrypt the ticket. Istio should manage this pool of encryption keys for all instances of ingress gateways and that's open feature #20347 (as mentioned in previous comment). Or did I misunderstand something about your comment?
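To illustrate the point about instance-local keys, here is a toy model in Python. It is only a sketch: `ToyGateway` and its methods are hypothetical names, and the "ticket" is just a MAC plus payload, not Envoy's or BoringSSL's real ticket format. It only shows why a ticket issued under one instance's key is useless to an instance holding a different key.

```python
import hashlib
import hmac
import secrets


class ToyGateway:
    """Toy stand-in for one TLS-terminating instance (NOT a real ticket format)."""

    def __init__(self, ticket_key=None):
        # With no key supplied, each instance invents its own, mirroring the
        # "internally-generated and managed key" default quoted earlier.
        self.ticket_key = ticket_key or secrets.token_bytes(32)

    def issue_ticket(self, session_state: bytes) -> bytes:
        # Real tickets are encrypted as well as authenticated; a MAC alone is
        # enough to show why a foreign key makes the ticket unusable.
        tag = hmac.new(self.ticket_key, session_state, hashlib.sha256).digest()
        return tag + session_state

    def accept_ticket(self, ticket: bytes) -> bool:
        tag, session_state = ticket[:32], ticket[32:]
        expected = hmac.new(self.ticket_key, session_state, hashlib.sha256).digest()
        return hmac.compare_digest(tag, expected)


# Two replicas with independent keys: only the issuer accepts its ticket.
pod_a, pod_b = ToyGateway(), ToyGateway()
ticket = pod_a.issue_ticket(b"placeholder session state")
print(pod_a.accept_ticket(ticket))  # True
print(pod_b.accept_ticket(ticket))  # False

# Two replicas sharing one key: either can resume the other's session.
shared_key = secrets.token_bytes(32)
pod_c, pod_d = ToyGateway(shared_key), ToyGateway(shared_key)
print(pod_d.accept_ticket(pod_c.issue_ticket(b"placeholder session state")))  # True
```

With round-robin load balancing across two replicas, the first case fails roughly every other reconnect, which would match the "may or may not show" behavior described earlier in the thread.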
I was trying to say that session IDs, which are unique to the host that generated them, should not be expected to always work in a round-robin load-balanced environment, which is what the title is about. While it is possible to share resumption data (either ID or ticket), Istio is not set up for that. Most TLS implementations will give out a session ID even if it can't be reused. Turning it off is oftentimes more painful than just leaving it on.
@gertvdijk thanks for the investigation. I tried setting disable_stateless_session_resumption=true, but it does not seem to work. The SSL scanner linked fails with the same error, and
$ openssl s_client -connect <APP>:443 -tls1_2 -sess_out /tmp/ssl_s
SSL-Session:
Protocol : TLSv1.2
Cipher : ECDHE-RSA-CHACHA20-POLY1305
Session-ID: 8923BCA60ABA0FD8729F5082AFE6B7312A39EE31E152E16E806AEC72D3D4ABEF
Session-ID-ctx:
Master-Key: 5C36DFD4DD43565E40ACDFDBE6F6D6250ED70243A6E4F363C334F04D6A0A204165E6F22B2A7456AC740BD6A08B411742
PSK identity: None
PSK identity hint: None
SRP username: None
Start Time: 1604444281
Timeout : 7200 (sec)
Verify return code: 0 (ok)
Extended master secret: yes
---
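The output above shows a Session-ID being assigned, but not whether a second connection can actually resume it. A rough way to check that from a client, sketched with Python's standard `ssl` module, is below. The function names `make_context` and `resumes_session` are made up for this example, and the host is whatever you would pass to `openssl s_client`. It pins TLS 1.2 (as in the command above) and can optionally turn off ticket support on the client so that only Session-ID resumption is exercised.

```python
import socket
import ssl


def make_context(use_tickets: bool) -> ssl.SSLContext:
    """Client context pinned to TLS 1.2, optionally refusing session tickets."""
    ctx = ssl.create_default_context()
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    if not use_tickets:
        # Without ticket support, only server-side Session-ID caching can resume.
        ctx.options |= ssl.OP_NO_TICKET
    return ctx


def resumes_session(host: str, port: int = 443, use_tickets: bool = True) -> bool:
    """Connect twice and report whether the second handshake reused the first session."""
    ctx = make_context(use_tickets)

    def connect(session=None):
        with socket.create_connection((host, port), timeout=10) as raw:
            with ctx.wrap_socket(raw, server_hostname=host, session=session) as tls:
                return tls.session, tls.session_reused

    first_session, _ = connect()
    _, reused = connect(first_session)
    return reused
```

Calling `resumes_session("<APP>", use_tickets=False)` a number of times should give a feel for how often the load balancer happens to send both connections to the same instance; a single run proves little either way.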
Envoy config:
"transport_socket": {
"name": "envoy.transport_sockets.tls",
"typed_config": {
"@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext",
"common_tls_context": {
"alpn_protocols": [
"h2",
"http/1.1"
],
"tls_certificate_sds_secret_configs": [
{
"name": "kubernetes://tls-cert",
"sds_config": {
"ads": {},
"resource_api_version": "V3"
}
}
]
},
"require_client_certificate": false,
"disable_stateless_session_resumption": true
}
}
}
May need some help from @PiotrSikora
Envoy doesn't support session resumption via Session ID (stored on the server-side), but Session ID is most likely assigned anyway, since historically some clients would break if there was no Session ID at all.
Envoy only supports session resumption via Session Tickets (stored encrypted on the client-side), and if the encryption key is shared across multiple instances, then sessions can be resumed across all of them, but I don't believe that we support this in Istio.
disable_stateless_session_resumption disables support for Session Tickets, but it doesn't affect Session ID.
> Envoy doesn't support session resumption via Session ID (stored on the server-side), but Session ID is most likely assigned anyway, since historically some clients would break if there was no Session ID at all.

Sorry, this part is wrong. Envoy uses a small built-in Session ID cache (20,480 entries).
In any case, if we want to share sessions across multiple instances, then we need to generate, distribute and rotate Session Ticket Keys via SDS.
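As a rough sketch of the "generate and rotate" half of that (distribution via SDS is not shown and nothing here is existing Istio code): Envoy's documentation for session ticket keys asks for exactly 80 bytes of cryptographically secure random data per key, uses the first key in the list to encrypt new tickets, and tries all listed keys when decrypting. The helper names `new_ticket_key` and `rotate`, and the choice to keep three keys, are assumptions for illustration.

```python
import secrets

TICKET_KEY_BYTES = 80  # size Envoy's docs require for each session ticket key


def new_ticket_key() -> bytes:
    """Generate one ticket key from the OS CSPRNG."""
    return secrets.token_bytes(TICKET_KEY_BYTES)


def rotate(keys: list, keep: int = 3) -> list:
    """Put a fresh key first (used for new tickets) and drop the oldest ones.

    Older keys stay in the list for a while so tickets issued before the
    rotation can still be decrypted by every instance.
    """
    return [new_ticket_key(), *keys][:keep]


keys = rotate([])      # initial key set
keys = rotate(keys)    # later rotation: new key first, previous key kept
print(len(keys), [len(k) for k in keys])  # 2 [80, 80]
```

Every gateway instance would need to receive the same list in the same order for resumption to work across instances, which is the part the open feature request would have to solve.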
🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2020-11-12. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.
Created by the issue and PR lifecycle manager.
Bug description
Recently I upgraded Istio from 1.4.7 to 1.6.6. An SSL Labs server test now shows a warning at the bottom of the page in the "Protocol Details" section: "Session resumption (caching)": "No (IDs assigned but not accepted)".
[ ] Configuration Infrastructure [ ] Docs [ ] Installation [ ] Networking [x] Performance and Scalability [ ] Policies and Telemetry [x] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastructure
Expected behavior
I'd expect it to be all green on SSL Labs with a modern TLS configuration. Either assign session IDs or not, and if you do, accept them later as session cache.
Steps to reproduce the bug
Gateway that has tls_mode enabled (e.g. SIMPLE, see below for my config with a slightly customized TLS config, but nothing that should break anything, right?).
Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)
How was Istio installed?
With myprofile.yaml:
Environment where bug was observed (cloud vendor, OS, etc)
kubeadm-installed self-managed cluster.
I'm happy to provide a cluster state archive if needed.