linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Opaqueness not applied to off-cluster destination with enable-external-profiles annotation #10354

Open dkulchinsky opened 1 year ago

dkulchinsky commented 1 year ago

What is the issue?

We're running Linkerd stable-2.12.2.

Linkerd is configured with:

proxy.opaquePorts: 25,587,3306,4444,5432,6379,26379,9300,11211
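proxy.opaquePorts is a comma-separated port list, and 3306 is in it. The following minimal sketch parses such a value into a set and confirms 3306 is covered. parseOpaquePorts is a hypothetical helper, not Linkerd's own parser, and it handles single ports only, not ranges.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOpaquePorts is a hypothetical helper (not Linkerd's parser) that turns
// a comma-separated port list, as in proxy.opaquePorts, into a set.
// It accepts single ports only; port ranges are not handled.
func parseOpaquePorts(value string) (map[uint16]bool, error) {
	ports := make(map[uint16]bool)
	for _, field := range strings.Split(value, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue // tolerate stray commas
		}
		p, err := strconv.ParseUint(field, 10, 16)
		if err != nil {
			return nil, fmt.Errorf("invalid port %q: %w", field, err)
		}
		ports[uint16(p)] = true
	}
	return ports, nil
}

func main() {
	ports, err := parseOpaquePorts("25,587,3306,4444,5432,6379,26379,9300,11211")
	if err != nil {
		panic(err)
	}
	// Prints the number of ports and whether 3306 is marked opaque.
	fmt.Println(len(ports), ports[3306]) // 9 true
}
```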

We set config.linkerd.io/enable-external-profiles: "true" annotation on application Pods that connect to a MySQL server off-cluster on port 3306 (following the instructions from https://linkerd.io/2.12/features/protocol-detection/#setting-the-enable-external-profiles-annotation)

However, the application is failing to connect to the MySQL server and we see the following errors in linkerd proxy logs:

[    12.990661s]  INFO ThreadId(01) outbound:proxy{addr=10.14.0.218:3306}: linkerd_detect: Continuing after timeout: linkerd_proxy_http::version::Version protocol detection timed out after 10s

The address 10.14.0.218 is outside the cluster network ranges (defined as clusterNetworks: 172.20.0.0/17,172.20.128.0/17).
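To double-check the claim above, this short sketch tests an address against the configured clusterNetworks CIDRs using Go's standard net/netip package. inClusterNetworks is a hypothetical helper for illustration only. The Pod IP 172.20.11.247 comes from the Calico annotation in the manifest below.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// inClusterNetworks (hypothetical helper) reports whether addr falls inside
// any CIDR in a comma-separated clusterNetworks value.
func inClusterNetworks(addr, clusterNetworks string) (bool, error) {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	for _, cidr := range strings.Split(clusterNetworks, ",") {
		prefix, err := netip.ParsePrefix(strings.TrimSpace(cidr))
		if err != nil {
			return false, err
		}
		if prefix.Contains(ip) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	networks := "172.20.0.0/17,172.20.128.0/17"
	// The MySQL server address from the proxy log, then the Pod's own IP.
	for _, addr := range []string{"10.14.0.218", "172.20.11.247"} {
		in, err := inClusterNetworks(addr, networks)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s in clusterNetworks: %v\n", addr, in)
	}
	// Output:
	// 10.14.0.218 in clusterNetworks: false
	// 172.20.11.247 in clusterNetworks: true
}
```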

Here's the manifest metadata of the running Pod:

kind: Pod
metadata:
  annotations:
    checksum/configmap-key-config.properties: ec936facad2bfc7bf8863ae2b8d3f90356bdfc94e2940ed31654f43abb2b0efb
    cni.projectcalico.org/containerID: 590f016aabbac75a6825ad52e018ea71e4e3d09d341d8b232d6a17cf200e7eca
    cni.projectcalico.org/podIP: 172.20.11.247/32
    cni.projectcalico.org/podIPs: 172.20.11.247/32
    config.linkerd.io/enable-external-profiles: "true"
    linkerd.io/created-by: linkerd/proxy-injector stable-2.12.2
    linkerd.io/inject: enabled
    linkerd.io/proxy-version: stable-2.12.2
    linkerd.io/trust-root-sha256: 1d57b9c015280710eafad0935ee3ec0bc4d7eb430908e89ae20c5ab7e5ec9f80
    vault.security.banzaicloud.io/vault-addr: https://vault.vault.svc:8200
    vault.security.banzaicloud.io/vault-env-daemon: "false"
    vault.security.banzaicloud.io/vault-role: k8s-eventbus-maxwell
    viz.linkerd.io/tap-enabled: "true"

I was reviewing a related issue, https://github.com/linkerd/linkerd2/issues/8273, which seems to suggest that this was fixed by https://github.com/linkerd/linkerd2-proxy/pull/1617. From what I can tell, that fix should be included in stable-2.12.2, but unfortunately we are not able to get this to work as expected.

For now we're using config.linkerd.io/skip-outbound-ports: "3306" as a workaround, but we are hoping to not need this and use the external profiles method instead.

How can it be reproduced?

  1. Deploy Linkerd stable-2.12.2
  2. Run an application Pod with config.linkerd.io/enable-external-profiles: "true" annotation connecting to a MySQL server on port 3306 running off-cluster (not in the clusterNetworks range(s))
  3. Observe that the application fails to connect and linkerd-proxy reports that protocol detection timed out after 10s

Logs, error output, etc

[    12.990661s]  INFO ThreadId(01) outbound:proxy{addr=10.14.0.218:3306}: linkerd_detect: Continuing after timeout: linkerd_proxy_http::version::Version protocol detection timed out after 10s

Output of linkerd check -o short:

Linkerd core checks
===================

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.12.2 but the latest stable version is 2.12.4
    see https://linkerd.io/2.12/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.12.2 but the latest stable version is 2.12.4
    see https://linkerd.io/2.12/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
    * linkerd-destination-5cc958f64c-jjbhq (stable-2.12.2)
    * linkerd-destination-5cc958f64c-lj8ss (stable-2.12.2)
    * linkerd-destination-5cc958f64c-rjmlq (stable-2.12.2)
    * linkerd-identity-84f9d7cf87-6jtxc (stable-2.12.2)
    * linkerd-identity-84f9d7cf87-g5ndc (stable-2.12.2)
    * linkerd-identity-84f9d7cf87-phbjm (stable-2.12.2)
    * linkerd-proxy-injector-5cd47b84fd-dxpkg (stable-2.12.2)
    * linkerd-proxy-injector-5cd47b84fd-phwcv (stable-2.12.2)
    * linkerd-proxy-injector-5cd47b84fd-zkbq2 (stable-2.12.2)
    see https://linkerd.io/2.12/checks/#l5d-cp-proxy-version for hints

Linkerd extensions checks
=========================

linkerd-viz
-----------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
    * metrics-api-855d59f76c-68nz9 (stable-2.12.2)
    * prometheus-f7c9f5f74-88djq (stable-2.12.2)
    * tap-74db455fc9-p4gvh (stable-2.12.2)
    * tap-74db455fc9-qvqxt (stable-2.12.2)
    * tap-74db455fc9-v92b4 (stable-2.12.2)
    * tap-injector-5875b778dc-hfmcx (stable-2.12.2)
    * web-576647df96-mnvh6 (stable-2.12.2)
    see https://linkerd.io/2.12/checks/#l5d-viz-proxy-cp-version for hints

Status check results are √

Environment

Possible solution

As a workaround, we are currently using the config.linkerd.io/skip-outbound-ports annotation to skip port 3306 on Pods that need to connect to an off-cluster MySQL database.

Additional context

Opaqueness for port 3306 works just fine for MySQL database running in-cluster, so this is only affecting connections to MySQL servers running off-cluster.

Would you like to work on fixing this bug?

None

dkulchinsky commented 1 year ago

Hey folks 👋🏼

I saw this was labelled for 2.13, but I just wanted to know whether you think this is an issue in stable-2.12, or possibly something we have misconfigured?

jeremychase commented 1 year ago

@dkulchinsky We suspect this is a problem with stable-2.12 but need to spend more time debugging before we know for certain.

dkulchinsky commented 1 year ago

Thanks @jeremychase 👍🏼 let me know if you need additional information from me.

dkulchinsky commented 1 year ago

Hey @jeremychase, @risingspiral 👋🏼

Just saw Linkerd 2.13.0 was released, congrats! 🥳

Wanted to check in to see whether this issue is already covered/fixed in 2.13, or whether that would be in a future patch release?

olix0r commented 1 year ago

@dkulchinsky It will be in the future path. In 2.13 we've begun to change the discovery system away from ServiceProfiles. I think we're unlikely to invest more in "external service profiles", but we're still keenly interested in solving the underlying problem of being able to disable protocol detection for out-of-cluster traffic.

dkulchinsky commented 1 year ago


Thanks @olix0r, I think decoupling these concerns makes total sense.

We'll be watching this space for updates, as this is one of those issues our users constantly trip over 😓 I'm guessing there's no ETA you can share at this point?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

dkulchinsky commented 1 year ago

Still an issue AFAIK; hoping there's some news about this? @olix0r

chris-ng-scmp commented 1 year ago

I have the same issue in the latest 2.14.0.

I can still see protocol detection running for one of the opaquePorts.

I have also tried setting skipSubnets (--subnets-to-ignore), but protocol detection still runs for the request...

 linkerd-proxy {"timestamp":"[   632.121291s]","level":"INFO","fields":{"message":"Continuing after timeout: linkerd_proxy_http::version::Version protocol detection timed out after 10s"},"target":"linkerd_detect","spans":[{"name":"outbound"},{"addr":"xxxxx:3306","name":"proxy"}],"threadId":"ThreadId(1)"}

Only config.linkerd.io/skip-outbound-ports works.

kflynn commented 1 year ago

For the record, we hear y'all on this one: being able to do egress traffic without protocol detection delays would be a good thing.

We want to separate the solution of that problem from the mechanism of ServiceProfiles, though, especially as we've been moving more toward Gateway API. Any thoughts on what kind of mechanisms would fit your use cases particularly well?