linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Traffic split not working with ExternalName services / services in different namespaces #12830

Closed jackpf closed 2 weeks ago

jackpf commented 1 month ago

What is the issue?

Hi, I'm trying to get linkerd to perform traffic splitting between services running in different namespaces.

I'm attempting to use ExternalName services as bridges between the namespaces, but this seems to be where things stop working. If I run everything in the same namespace, the traffic split works correctly.

Is this a bug, or maybe this isn't the correct way to configure splits between namespaces? I couldn't find any relevant info in the docs.

How can it be reproduced?

Here is a sample config of what I'm trying to achieve: app-1 and app-2 run in separate namespaces, while the traffic split and the ExternalName services that bridge to app-1 and app-2 live in the linkerd-diff-ns namespace.

apiVersion: v1
kind: Namespace
metadata:
  name: linkerd-diff-ns
  labels:
    name: linkerd-diff-ns
  annotations:
    linkerd.io/inject: enabled

---

# App 1
apiVersion: v1
kind: Namespace
metadata:
  name: app-1-ns
  labels:
    name: app-1-ns
  annotations:
    linkerd.io/inject: enabled
---
apiVersion: v1
kind: Pod
metadata:
  namespace: app-1-ns
  name: app-1
  labels:
    app.kubernetes.io/name: app-1
spec:
  containers:
    - name: app-1
      image: httpd
      ports:
        - containerPort: 80
          name: http-web-svc
      lifecycle:
        postStart:
          exec:
            command: [ "/bin/sh", "-c", "echo 'I am app-1' > /usr/local/apache2/htdocs/index.html" ]
---
apiVersion: v1
kind: Service
metadata:
  namespace: app-1-ns
  name: app-1-service
spec:
  selector:
    app.kubernetes.io/name: app-1
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80

---

# App 2
apiVersion: v1
kind: Namespace
metadata:
  name: app-2-ns
  labels:
    name: app-2-ns
  annotations:
    linkerd.io/inject: enabled
---
apiVersion: v1
kind: Pod
metadata:
  namespace: app-2-ns
  name: app-2
  labels:
    app.kubernetes.io/name: app-2
spec:
  containers:
    - name: app-2
      image: httpd
      ports:
        - containerPort: 80
          name: http-web-svc
      lifecycle:
        postStart:
          exec:
            command: [ "/bin/sh", "-c", "echo 'I am app-2' > /usr/local/apache2/htdocs/index.html" ]
---
apiVersion: v1
kind: Service
metadata:
  namespace: app-2-ns
  name: app-2-service
spec:
  selector:
    app.kubernetes.io/name: app-2
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80

---

# Service bridges

apiVersion: v1
kind: Service
metadata:
  name: app-1-bridge
  namespace: linkerd-diff-ns
spec:
  type: ExternalName
  externalName: app-1-service.app-1-ns.svc.cluster.local
---
apiVersion: v1
kind: Service
metadata:
  name: app-2-bridge
  namespace: linkerd-diff-ns
spec:
  type: ExternalName
  externalName: app-2-service.app-2-ns.svc.cluster.local

---

# Traffic split

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: app-traffic-split
  namespace: linkerd-diff-ns
spec:
  service: traffic-split-service
  backends:
    - service: app-1-bridge
      weight: 50
    - service: app-2-bridge
      weight: 50
---
apiVersion: v1
kind: Service
metadata:
  name: traffic-split-service
  namespace: linkerd-diff-ns
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
      name: external

Logs, error output, etc

Everything seems to work apart from the traffic split.

curl app-1-service.app-1-ns.svc.cluster.local # works
curl app-1-bridge.linkerd-diff-ns.svc.cluster.local # works
curl app-2-service.app-2-ns.svc.cluster.local # works
curl app-2-bridge.linkerd-diff-ns.svc.cluster.local # works

curl traffic-split-service.linkerd-diff-ns.svc.cluster.local # doesn't work - gives 500 response

The only error I've been able to find from the linkerd container is:

[   213.889793s]  INFO ThreadId(01) outbound:proxy{addr=172.20.103.9:80}:rescue{client.addr=10.108.74.96:39540}: linkerd_app_core::errors::respond: HTTP/1.1 request failed error=logical service traffic-split-service.linkerd-diff-ns.svc.cluster.local:80: Service.linkerd-diff-ns.app-2-bridge:80: pool failed: status: InvalidArgument, message: "Invalid authority: app-2-bridge.linkerd-diff-ns.svc.cluster.local:80", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 11 Jul 2024 10:56:09 GMT"} } error.sources=[Service.linkerd-diff-ns.app-2-bridge:80: pool failed: status: InvalidArgument, message: "Invalid authority: app-2-bridge.linkerd-diff-ns.svc.cluster.local:80", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 11 Jul 2024 10:56:09 GMT"} }, pool failed: status: InvalidArgument, message: "Invalid authority: app-2-bridge.linkerd-diff-ns.svc.cluster.local:80", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 11 Jul 2024 10:56:09 GMT"} }, status: InvalidArgument, message: "Invalid authority: app-2-bridge.linkerd-diff-ns.svc.cluster.local:80", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 11 Jul 2024 10:56:09 GMT"} }]
[   213.889826s]  WARN ThreadId(01) outbound:proxy{addr=172.20.103.9:80}:rescue{client.addr=10.108.74.96:39540}: linkerd_app_outbound::http::server: Unexpected error error=logical service traffic-split-service.linkerd-diff-ns.svc.cluster.local:80: Service.linkerd-diff-ns.app-2-bridge:80: pool failed: status: InvalidArgument, message: "Invalid authority: app-2-bridge.linkerd-diff-ns.svc.cluster.local:80", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 11 Jul 2024 10:56:09 GMT"} } error.sources=[Service.linkerd-diff-ns.app-2-bridge:80: pool failed: status: InvalidArgument, message: "Invalid authority: app-2-bridge.linkerd-diff-ns.svc.cluster.local:80", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 11 Jul 2024 10:56:09 GMT"} }, pool failed: status: InvalidArgument, message: "Invalid authority: app-2-bridge.linkerd-diff-ns.svc.cluster.local:80", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 11 Jul 2024 10:56:09 GMT"} }, status: InvalidArgument, message: "Invalid authority: app-2-bridge.linkerd-diff-ns.svc.cluster.local:80", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 11 Jul 2024 10:56:09 GMT"} }]

output of linkerd check -o short

Status check results are √

Environment

Kubernetes:

Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-eks-036c24b

Cluster environment: EKS

Host OS: Amazon Linux 2 - Linux 5.10.219-208.866.amzn2.aarch64 #1 SMP Tue Jun 18 14:00:02 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Linkerd version:

Client version: edge-24.7.1
Server version: edge-24.7.1

kflynn commented 1 month ago

Hey @jackpf, the simplest answer here is that in edge-24.7.1 you should probably just use HTTPRoutes for this, rather than TrafficSplits.

You'll need to make sure that you have the Gateway API CRDs installed (at the moment this may require that Linkerd was installed with --set enableHttpRoutes=false; if you didn't do that, let me know):

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/experimental-install.yaml
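As a quick sanity check after applying that manifest, something like the following sketch can confirm that the two CRDs used in the rest of this comment are present. The function name check_gateway_crds is a hypothetical helper, not part of Linkerd or kubectl; it assumes kubectl is already pointed at the right cluster.

```shell
# Hypothetical helper (not part of Linkerd or kubectl): reports whether the
# Gateway API CRDs used below (HTTPRoute, ReferenceGrant) exist in the cluster.
check_gateway_crds() {
  missing=0
  for crd in httproutes.gateway.networking.k8s.io \
             referencegrants.gateway.networking.k8s.io; do
    # `kubectl get crd NAME` exits non-zero when the CRD is not installed.
    if kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "found: $crd"
    else
      echo "missing: $crd"
      missing=1
    fi
  done
  # Non-zero exit status if any CRD was missing.
  return "$missing"
}

# Usage:
#   check_gateway_crds
```

If either CRD is reported as missing, the manifests further down will most likely be rejected by the API server.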

We'll still use your traffic-split-service:

---
apiVersion: v1
kind: Service
metadata:
  name: traffic-split-service
  namespace: linkerd-diff-ns
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
      name: external

After that, we'll create two Gateway API ReferenceGrants to tell Gateway API that it's OK to route traffic from linkerd-diff-ns to app-1-ns and app-2-ns:

---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: app-1-ns-grant
  namespace: app-1-ns
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: linkerd-diff-ns
  to:
  - group: ""
    kind: Service
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: app-2-ns-grant
  namespace: app-2-ns
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: linkerd-diff-ns
  to:
  - group: ""
    kind: Service

and then finally, we can create an HTTPRoute that intercepts traffic to traffic-split-service and splits it 50/50 between app-1-service and app-2-service:

---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: app-traffic-split
  namespace: linkerd-diff-ns
spec:
  parentRefs:
    - name: traffic-split-service
      kind: Service
      group: core
      port: 80
  rules:
    - backendRefs:
        - name: app-1-service
          namespace: app-1-ns
          port: 80
          weight: 50
        - name: app-2-service
          namespace: app-2-ns
          port: 80
          weight: 50

I think this should do what you want.
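To check the split, one option is to send a batch of requests and tally the response bodies. The sketch below assumes the httpd pods from the original report, which answer with "I am app-1" and "I am app-2"; sample_split is a hypothetical helper name. It should be run from a meshed pod, since the routing is applied by the client's outbound proxy.

```shell
# Hypothetical helper: send COUNT requests to URL and tally the distinct
# response bodies, so a roughly 50/50 split shows up as two similar counts.
sample_split() {
  url=$1
  count=${2:-20}   # default to 20 requests
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s "$url"
    i=$((i + 1))
  done | sort | uniq -c
}

# Usage, from a meshed pod in the cluster:
#   sample_split http://traffic-split-service.linkerd-diff-ns.svc.cluster.local 20
```

With a working route, both bodies should appear with counts in the same ballpark; the split is weighted rather than strictly alternating, so small samples will not be exactly even.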

kflynn commented 2 weeks ago

I'm going to go ahead and close this one – feel free to reopen if you need to!