knative-extensions / net-contour

A Knative ingress controller for Project Contour
Apache License 2.0

Enabling Internal Encryption breaks DomainMappings when using Contour #862

Open KauzClay opened 1 year ago

KauzClay commented 1 year ago

What version of Knative?

Relocating from https://github.com/knative/serving/issues/13659 since this is just a Contour issue.

Working off the main branch of knative-serving and net-contour.

Discovered while trying to add internal encryption e2e tests for net-contour here: https://github.com/knative/serving/pull/13536

Expected Behavior

When Internal Encryption is enabled and I create a DomainMapping for my Knative Service, I can reach the KService successfully through the mapped domain.

Actual Behavior

DomainMappings fail to become ready and get stuck in "EndpointsNotReady".

The net-contour controller logs:

{"severity":"ERROR","timestamp":"2023-01-26T21:30:40.304820765Z","logger":"net-contour-controller","caller":"status/status.go:404","message":"Probing of http://hello.gen-14.hello.clay.tanzu.biz.default.net-contour.invalid failed, IP: 10.24.2.35:8080, ready: false, error: unexpected status code: want 200, got 503 (depth: 0)","commit":"e458d29-dirty","knative.dev/controller":"knative.dev.net-contour.pkg.reconciler.contour.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/traceid":"724fb06d-90db-49cf-917e-a664dd798cb8","knative.dev/key":"default/hello.clay.tanzu.biz--ep","stacktrace":"knative.dev/networking/pkg/status.(*Prober).processWorkItem\n\tknative.dev/networking@v0.0.0-20221202133217-891aac251fc2/pkg/status/status.go:404\nknative.dev/networking/pkg/status.(*Prober).Start.func1\n\tknative.dev/networking@v0.0.0-20221202133217-891aac251fc2/pkg/status/status.go:289"}

I also see this in the Envoy logs:

[2023-01-26 21:34:40.776][19][debug][router] [source/common/router/router.cc:1212] [C49017][S13478037937466377759] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2023-01-26 21:34:40.776][19][debug][http] [source/common/http/filter_manager.cc:905] [C49017][S13478037937466377759] Sending local reply with details upstream_reset_before_response_started{connection_failure,TLS_error:_268435703:SSL_routines:OPENSSL_internal:WRONG_VERSION_NUMBER}
[2023-01-26 21:34:40.776][19][debug][http] [source/common/http/conn_manager_impl.cc:1551] [C49017][S13478037937466377759] encoding headers via codec (end_stream=false):
':status', '503'
'content-type', 'text/plain'
'content-encoding', 'gzip'
'vary', 'Accept-Encoding'
'date', 'Thu, 26 Jan 2023 21:34:40 GMT'
'server', 'envoy'
[2023-01-26 21:34:40.735][21][debug][conn_handler] [source/server/active_tcp_listener.cc:147] [C49205] new connection from 10.24.2.43:43224
[2023-01-26 21:34:40.735][21][debug][http] [source/common/http/conn_manager_impl.cc:306] [C49205] new stream
[2023-01-26 21:34:40.735][21][debug][http] [source/common/http/conn_manager_impl.cc:930] [C49205][S163296292697216300] request headers complete (end_stream=true):
':authority', 'hello.gen-3.hello.claysreallyverylongtestineee50218ef4390e47e8e913ebbbebaf8.default.net-contour.invalid'
':path', '/healthz'
':method', 'GET'
'user-agent', 'Knative-Ingress-Probe'
'k-network-hash', 'override'
'k-network-probe', 'probe'
'accept-encoding', 'gzip'
...

[2023-01-26 21:34:40.735][21][debug][http] [source/common/http/conn_manager_impl.cc:913] [C49205][S163296292697216300] request end stream
[2023-01-26 21:34:40.735][21][debug][connection] [./source/common/network/connection_impl.h:92] [C49205] current connecting state: false
[2023-01-26 21:34:40.735][21][debug][router] [source/common/router/router.cc:470] [C49205][S163296292697216300] cluster 'default/hello/80/a67dfba3e6' match for URL '/healthz'
[2023-01-26 21:34:40.735][21][debug][router] [source/common/router/router.cc:678] [C49205][S163296292697216300] router decoding headers:
':authority', 'hello.default.svc.cluster.local'
':path', '/healthz'
':method', 'GET'
':scheme', 'http'
'user-agent', 'Knative-Ingress-Probe'
'k-network-probe', 'probe'
'accept-encoding', 'gzip'
'x-forwarded-for', '10.24.2.43'
'x-forwarded-proto', 'http'
'x-envoy-internal', 'true'
'x-request-id', '9d510376-f3c0-4a55-962d-a1f5a9f0ebe4'
'k-network-hash', 'dc12e833d98a355da2775ad80b3ae02658ed076ec3da0d5670b05f377f36f39e'
'x-request-start', 't=1674768880.735'
...
[2023-01-26 21:34:40.736][21][debug][router] [source/common/router/router.cc:1212] [C49205][S163296292697216300] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2023-01-26 21:34:40.736][21][debug][pool] [source/common/conn_pool/conn_pool_base.cc:453] invoking idle callbacks - is_draining_for_deletion_=false
[2023-01-26 21:34:40.758][21][debug][router] [source/common/router/router.cc:1796] [C49205][S163296292697216300] performing retry
...
[2023-01-26 21:34:40.759][21][debug][router] [source/common/router/router.cc:1212] [C49205][S163296292697216300] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
...
[2023-01-26 21:34:40.796][21][debug][router] [source/common/router/router.cc:1796] [C49205][S163296292697216300] performing retry
...
[2023-01-26 21:34:40.798][21][debug][router] [source/common/router/router.cc:1212] [C49205][S163296292697216300] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
...
[2023-01-26 21:34:40.798][21][debug][http] [source/common/http/filter_manager.cc:905] [C49205][S163296292697216300] Sending local reply with details upstream_reset_before_response_started{connection_failure,TLS_error:_268435703:SSL_routines:OPENSSL_internal:WRONG_VERSION_NUMBER}
[2023-01-26 21:34:40.798][21][debug][http] [source/common/http/conn_manager_impl.cc:1551] [C49205][S163296292697216300] encoding headers via codec (end_stream=false):
':status', '503'
'content-type', 'text/plain'
'content-encoding', 'gzip'
'vary', 'Accept-Encoding'
'date', 'Thu, 26 Jan 2023 21:34:40 GMT'
'server', 'envoy'

With AutoTLS enabled, the DomainMappings become ready, but I still get this error when I hit the endpoint:

upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER

Steps to Reproduce the Problem

  1. Enable internal encryption in config-network.
  2. Deploy a simple hello-world Knative Service.
  3. Set up a ClusterDomainClaim for your new domain.
  4. Create a DomainMapping.

Analysis

I think the problem is partly due to the fact that DomainMappings point you back at Envoy. In the DAG, all the routes point to a service on port 443; however, the one for hello goes to port 80:

[Image: contour-dag-encryption, the DAG output from the Contour controller]

That service spec looks like this:

apiVersion: v1
kind: Service
metadata:
  ...
  name: hello
  namespace: default
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  sessionAffinity: None
  type: ClusterIP

Internal encryption was implemented so that ports named http2 use h2c when internal encryption is disabled and h2 when it is enabled.
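The rule in the previous sentence can be written as a tiny Go function. The `http2` branch follows the issue text; `h2`, `h2c` and `tls` are the upstream protocol values Contour's HTTPProxy accepts. How other port names are treated is a hypothetical assumption added only to make the function total, not a description of net-contour's actual code.

```go
package main

import "fmt"

// upstreamProtocol returns the Contour HTTPProxy upstream protocol for a
// Service port. The "http2" rule follows the issue text; the fallback for
// other port names is a hypothetical assumption.
func upstreamProtocol(portName string, internalEncryption bool) string {
	if portName == "http2" {
		if internalEncryption {
			return "h2" // HTTP/2 over TLS
		}
		return "h2c" // cleartext HTTP/2
	}
	if internalEncryption {
		return "tls" // assumed: HTTP/1.1 over TLS
	}
	return "" // assumed: plain HTTP/1.1
}

func main() {
	// The hello Service in this issue exposes port 80 under the name "http2".
	for _, enabled := range []bool{false, true} {
		fmt.Printf("internal encryption %v: port 80 reached with %q\n", enabled, upstreamProtocol("http2", enabled))
	}
}
```

Note that the port number never enters the decision: only the port name and the flag do, which is how h2 ends up paired with port 80.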

This means the HTTPProxy routes to the hello service on port 80 using the h2 protocol.

So when Envoy makes the call, it speaks TLS (for h2) but lands on its own plaintext HTTP listener, so the handshake fails with the WRONG_VERSION_NUMBER error seen in the logs.

With AutoTLS on, there is at least a listener on 443, but it has no route data for the request, since svc.cluster.local domains don't get TLS.

Suggestion

I think one way around this is to use the internal encryption secrets for ClusterLocal-visibility domains when internal encryption is enabled. That would give those domains a listener on 443, and the Service would then need to use port 443 as well. The trouble is that this amounts to TLS for ClusterLocal routes, which is probably a big undertaking.

I suppose another, simpler option is to have Envoy's calls back to itself skip encryption. To me, though, that leaves a hole in the internal encryption path.
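A toy Go model helps compare the situation described above with the two suggestions. It is hypothetical throughout: the `hop` type and helper functions are invented for illustration. It assumes port 80 reaches Envoy's plaintext listener and port 443 its TLS listener, and it treats "the listener has routes for this host" as a plain boolean.

```go
package main

import "fmt"

// hop is a hypothetical summary of the Envoy-to-Envoy hop for one option.
type hop struct {
	name       string
	port       int    // Service port the HTTPProxy targets
	protocol   string // HTTPProxy upstream protocol: "h2", "h2c", "tls" or "" (HTTP/1.1)
	hostRouted bool   // whether the listener behind that port has routes for the host
}

// listenerIsTLS encodes the assumption that port 80 reaches Envoy's plaintext
// HTTP listener and port 443 reaches its TLS listener.
func listenerIsTLS(port int) bool { return port == 443 }

// usesTLS reports whether the upstream protocol starts with a TLS handshake.
func usesTLS(protocol string) bool { return protocol == "h2" || protocol == "tls" }

// evaluate reports whether the hop can succeed and whether it stays encrypted.
func evaluate(h hop) (works, encrypted bool) {
	works = listenerIsTLS(h.port) == usesTLS(h.protocol) && h.hostRouted
	return works, works && usesTLS(h.protocol)
}

func main() {
	hops := []hop{
		{"today: port 80 with h2", 80, "h2", true},
		{"port 443 with h2, no ClusterLocal TLS routes", 443, "h2", false},
		{"option 1: port 443 with h2, ClusterLocal TLS routes", 443, "h2", true},
		{"option 2: port 80 with h2c", 80, "h2c", true},
	}
	for _, h := range hops {
		works, encrypted := evaluate(h)
		fmt.Printf("%-52s works=%-5v encrypted=%v\n", h.name, works, encrypted)
	}
}
```

In this model only the first option is both working and encrypted, and it needs both halves of the change: TLS routes for the ClusterLocal domains on 443 and the Service moved to 443. The second option works but leaves the hop unencrypted.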

KauzClay commented 1 year ago

Okay, #860 didn't fully fix this; reopening while I address that.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

KauzClay commented 1 year ago

I think the work to add https to clusterLocal routes here (https://github.com/knative-sandbox/net-certmanager/pull/538) might help in this scenario.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

KauzClay commented 1 year ago

There is an effort to rework some of the internal encryption changes for Knative (see https://github.com/orgs/knative/projects/63/views/1).

As we progress with that, I plan to revisit net-contour and will try to address this issue then.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

dprotaso commented 10 months ago

/lifecycle frozen