knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Unable to use GCP HTTPS load balancer with istio/knative with full e2e encryption #12376

Closed adriangudas closed 2 years ago

adriangudas commented 2 years ago


Knative Serving version: 1.0.0
Istio version: 1.10.2
K8s version: v1.20.10-gke.1600

Hello,

We're currently using an L4 TCP load balancer to bring traffic into our knative-serving istio-ingressgateway. We want to move to a GCP L7 HTTPS load balancer to take advantage of features that L7 load balancing provides, such as GCP's network endpoint groups for more fault-tolerant load balancing, the ability to retain the original IP address of incoming requests, and potentially simpler routing by removing kube-proxy as a hop in the routing path (to summarize a few of the goals briefly).

The desired setup, as a quick diagram:

GCP L7 load balancer (external IP) ----> TLS -----> istio-ingressgateway:443

As soon as we switched the GCP LB backend to port 443 on the istio-ingressgateway, we started getting 502s. Port 80 works fine. With port 443, the client sees a 502 Server Error, the GCP logs contain an entry with backend_connection_closed_before_data_sent_to_client, and the istio-ingressgateway pods log filter_chain_not_found.
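For anyone who wants to check the same symptom without going through the load balancer, here is a minimal Python sketch that attempts a TLS handshake against the ingress gateway with and without SNI. The function name probe_tls and the address in the usage note are hypothetical, and certificate verification is deliberately disabled so that the outcome depends only on whether SNI is sent. It is a diagnostic sketch, not something to use for real traffic.

```python
import socket
import ssl
from typing import Optional, Tuple


def probe_tls(host: str, port: int, sni: Optional[str],
              timeout: float = 5.0) -> Tuple[bool, str]:
    """Try a TLS handshake, optionally without SNI.

    Returns (True, negotiated TLS version) on success, or
    (False, error description) if the connection or handshake fails.
    """
    ctx = ssl.create_default_context()
    # Diagnostic only: skip certificate checks so the result reflects
    # SNI handling rather than trust configuration.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname=None means no SNI extension is sent.
            with ctx.wrap_socket(sock, server_hostname=sni) as tls:
                return True, tls.version() or "unknown"
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, f"{type(exc).__name__}: {exc}"
```

Calling probe_tls("203.0.113.10", 443, sni=None) and then again with sni set to one of the served hostnames (both values are placeholders) should show whether the gateway only completes the handshake when SNI is present.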

We finally tracked it down to what appears to be an SNI issue. A thread in the Istio repo summarizes the problem (unless I'm missing something). We use Let's Encrypt (via the cert-manager Knative integration) to generate per-namespace wildcard certificates, in addition to a few statically defined non-wildcard certificates. As described in that thread, Istio serves multiple certificates on one IP, and all Gateway objects created by Knative match by hosts: in the spec. If my understanding is correct, the way Knative uses multiple gateways with hosts: (and, more generally, the fact that we host multiple certificates behind the same IP and serve multiple domains rather than using a single SAN certificate) means the ingress gateway needs SNI to pick a certificate. The GCP L7 load balancer doesn't appear to send SNI to its backends, so it can't be used to encrypt traffic to Istio with Knative. So the problem isn't specifically with Knative, but with how it expects you to use multiple certificates behind a single IP address.
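To illustrate the reasoning above, here is a simplified Python model of how a gateway with several host-matched TLS servers might select a certificate from the SNI value. This is only a sketch of the idea, not Envoy's or Istio's actual matching code, and the names select_filter_chain, host_matches, and the example hosts are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def host_matches(pattern: str, sni: str) -> bool:
    """Return True if an SNI value matches a Gateway-style host pattern."""
    pattern, sni = pattern.lower(), sni.lower()
    if pattern == "*":
        return True
    if pattern.startswith("*."):
        # "*.example.com" matches names that end in ".example.com"
        # and have at least one character before that suffix.
        suffix = pattern[1:]
        return sni.endswith(suffix) and len(sni) > len(suffix)
    return pattern == sni


def select_filter_chain(chains: Mapping[str, Sequence[str]],
                        sni: Optional[str]) -> Optional[str]:
    """Pick the certificate name whose hosts match the SNI.

    `chains` maps a certificate name to the hosts it serves.
    Returns None when nothing matches, which corresponds to the
    filter_chain_not_found case in this simplified model.
    """
    if sni is None:
        # No SNI: every chain here is host-matched, so none applies.
        return None
    # Prefer exact host matches over wildcard matches.
    for name, hosts in chains.items():
        if any(h.lower() == sni.lower() for h in hosts):
            return name
    for name, hosts in chains.items():
        if any(host_matches(h, sni) for h in hosts):
            return name
    return None


# Hypothetical setup: one per-namespace wildcard cert and one static cert.
chains = {
    "ns-a-wildcard": ["*.ns-a.example.com"],
    "static-cert": ["app.example.com"],
}
```

In this model, a client that sends svc.ns-a.example.com as SNI gets the wildcard certificate, while a client that sends no SNI at all matches nothing, which is the situation described for the load balancer's backend connection.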

I'm wondering whether this makes sense, or whether we're the only ones who have run into this. If there's a workaround or another fix (e.g. somehow using a SAN certificate that doesn't require SNI), or if I'm missing something and am completely wrong here, I'd be very curious to know. In the meantime we've reverted to a TCP L4 load balancer, which works but doesn't provide the advantages listed above. I'm also not sure a SAN certificate would work for us, given that wildcard domains in a SAN aren't supported (as I understand it), and we currently rely on the wildcard certs to serve multiple applications in a namespace with the same certificate.

Thanks!

Edit (Fri Dec 3, 2021): it seems like SNI to the backend is something Google should support on their LBs. I'm going to follow up with their support team to clarify whether this is something they plan to support.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.