linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Multi-cluster demos using TrafficSplit object are not working #12769

Open ValeriiVozniuk opened 3 months ago

ValeriiVozniuk commented 3 months ago

What is the issue?

Deploying multicluster demos from https://linkerd.io/2.15/tasks/multicluster/ and https://linkerd.io/2.15/tasks/automatic-failover/ results in non-working demos.

How can it be reproduced?

  1. Deploy all needed objects per "Multi-cluster communication" guide.
  2. At the "Traffic Splitting" step, the TrafficSplit object is created successfully, but traffic goes only to the west service/pod.
  3. Same for the failover test: the object is created, but the commands show that traffic stays in west.
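
For reference, the TrafficSplit from the "Traffic Splitting" step of the multicluster guide is essentially the following (names taken from the guide's podinfo demo; the served apiVersion may be v1alpha1 or v1alpha2 depending on the SMI extension, so treat this as a sketch rather than the exact manifest):

```yaml
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: podinfo
  namespace: test
spec:
  service: podinfo
  backends:
    # local (west) backend
    - service: podinfo
      weight: 50
    # mirrored service pointing at the east cluster's gateway
    - service: podinfo-east
      weight: 50
```

With a 50/50 split like this, roughly half of the requests should be answered by east pods, which is not what I observe.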

Logs, error output, etc

  1. linkerd --context=west -n test viz stat trafficsplit
    Error: error creating metrics request while making stats request: cannot find Kubernetes canonical name from friendly name [trafficsplit]
  2. No traffic in east (screenshot)
  3. Same for Failover: no row for east in the output
    linkerd --context=west viz stat -n emojivoto svc --from deploy/vote-bot
    NAME      MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN
    web-svc        -    80.65%   0.5rps           0ms           0ms           0ms          0
  4. The service mirror is working fine
    k get svc -n emojivoto --context west
    NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
    emoji-svc      ClusterIP   10.43.88.215    <none>        8080/TCP,8801/TCP   66m
    voting-svc     ClusterIP   10.43.151.134   <none>        8080/TCP,8801/TCP   66m
    web-svc        ClusterIP   10.43.143.102   <none>        80/TCP              66m
    web-svc-east   ClusterIP   10.43.36.204    <none>        80/TCP              65m

    and curl requests are successfully routed to east pods via the service
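
For completeness, the failover TrafficSplit I applied follows the automatic-failover guide; as far as I can tell it is roughly the following (the annotation and label names are my reading of the linkerd-failover extension docs, so they may not match verbatim):

```yaml
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: web-svc-failover
  namespace: emojivoto
  annotations:
    # tells the failover operator which backend is the primary
    failover.linkerd.io/primary-service: web-svc
  labels:
    # marks this TrafficSplit as managed by the failover operator
    app.kubernetes.io/managed-by: linkerd-failover
spec:
  service: web-svc
  backends:
    # primary: local (west) service
    - service: web-svc
      weight: 1
    # secondary: mirrored service in east, promoted when web-svc fails
    - service: web-svc-east
      weight: 0
```

When the west web deployment is scaled to zero, the operator should shift the weight to web-svc-east, but the viz stat output above never shows an east entry.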

output of linkerd check -o short

linkerd check still produces garbled results; the spinner lines overwrite each other:

 Status check results are √
$ smi extension check
nning smi extension check                               / Running viz extension check
ension check                               - Running viz extension check
         \ Running viz extension check
nning viz extension check                               \ Running viz extension check
ension check                               | Running viz extension check
      / Running viz extension check
ng viz extension check                               / Running viz extension check
Running multicluster extension check                                        - Running multicluster extension check
                                   | Running multicluster extension check
Running multicluster extension check                                        / Running multicluster extension check
  Running multicluster extension check

For smi and multicluster the output is a bit better:

$ linkerd smi check --context=west
linkerd-smi
-----------
√ linkerd-smi extension Namespace exists
√ SMI extension service account exists
√ SMI extension pods are injected
√ SMI extension pods are running
√ SMI extension proxies are healthy

Status check results are √
$ linkerd multicluster check --context=west
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
        * east
√ remote cluster access credentials are valid
        * east
√ clusters share trust anchors
        * east
√ service mirror controller has required permissions
        * east
√ service mirror controllers are running
        * east
√ probe services able to communicate with all gateway mirrors
        * east
√ all mirror services have endpoints
√ all mirror services are part of a Link
√ multicluster extension proxies are healthy
√ multicluster extension proxies are up-to-date
√ multicluster extension proxies and cli versions match

Status check results are √
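
The mirror-service checks above can also be confirmed by hand with plain kubectl (standard commands only; output omitted, and these obviously need access to the west cluster):

```shell
# Manually confirm what "all mirror services have endpoints" verifies:
# the mirrored service should have an endpoint pointing at the east gateway.
kubectl --context=west -n emojivoto get endpoints web-svc-east -o wide

# Inspect the TrafficSplit object the west cluster actually stores
# (SMI CRD group; the served version may be v1alpha1 or v1alpha2).
kubectl --context=west -n emojivoto get trafficsplits.split.smi-spec.io -o yaml
```

Both commands return what I would expect, which again points at the split itself not being honored rather than the mirroring.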

Environment

Kubernetes version: v1.28.10+k3s1
Cluster Environment: oVirt
Host OS: Ubuntu 22.04 LTS
Linkerd version: edge-24.6.3

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

kflynn commented 2 months ago

@ValeriiVozniuk Hey, sorry for the delay here! I'm going to try to sort this out in the next couple of days... 🤞