kmesh-net / kmesh

High Performance ServiceMesh Data Plane Based on Programmable Kernel
https://kmesh.net
Apache License 2.0
424 stars, 59 forks

Failed to access when deploying waypoints of various granularities in a mixed manner #691

Closed YaoZengzeng closed 1 month ago

YaoZengzeng commented 1 month ago

What happened:

We have just fixed the inaccessibility issue that occurred when deploying a waypoint at ns granularity (#628).

However, the problem reappears when the same ns contains both an ns-granularity waypoint and svc-granularity waypoints, for example:

root@kurator-linux-0002:~# kgp
NAME                                    READY   STATUS    RESTARTS   AGE
details-v1-cdd874bc9-fb6j6              1/1     Running   0          4h55m
productpage-v1-5bb9985d4d-9rxpd         1/1     Running   0          4h55m
ratings-v1-6484d64bbc-vfcpg             1/1     Running   0          4h55m
reviews-svc-waypoint-6884756fc5-mplbb   1/1     Running   0          11m
reviews-v1-598f9b58fc-rb5d5             1/1     Running   0          4h55m
reviews-v2-5979c6fc9c-7c4wd             1/1     Running   0          4h55m
reviews-v3-7bbb5b9cf7-jnvlk             1/1     Running   0          4h55m
sleep-5577c64d7c-tz6zr                  1/1     Running   0          4h55m
tcp-echo-svc-waypoint-7578fb9d5-zs9t5   1/1     Running   0          4h55m
tcp-echo-v1-55dbb9bcb-5ndph             1/1     Running   0          4h55m
tcp-echo-v2-77c855674c-wpzxp            1/1     Running   0          4h55m
waypoint-7cd5f44d8f-zjsd8               1/1     Running   0          4h55m
root@kurator-linux-0002:~# kubectl  get gateway
NAME                    CLASS            ADDRESS        PROGRAMMED   AGE
reviews-svc-waypoint    istio-waypoint   10.96.48.220   True         12m
tcp-echo-svc-waypoint   istio-waypoint   10.96.1.132    True         24h
waypoint                istio-waypoint   10.96.62.79    True         5h3m

We deployed the ns-granularity waypoint, named waypoint, and also a waypoint for the reviews svc, named reviews-svc-waypoint.

Refer to https://kmesh.net/en/docs/userguide/try_waypoint/ for how to deploy a waypoint.

Accessing bookinfo gives the following result:

root@kurator-linux-0002:~# kubectl exec deploy/sleep -- sh -c "curl -s http://productpage:9080/productpage | grep reviews-v.-" 
command terminated with exit code 1

The access log of ns waypoint:

[2024-08-06T07:07:05.632Z] "GET /details/0 HTTP/1.1" 200 - via_upstream - "-" 0 178 1 1 "-" "curl/8.9.1" "36e99792-52f6-43be-96f7-6673879a85fe" "details:9080" "10.244.0.93:9080" inbound-vip|9080|http|details.default.svc.cluster.local 10.244.0.103:45656 10.96.254.42:9080 envoy://internal_client_address/ - default
[2024-08-06T07:07:05.628Z] "GET /productpage HTTP/1.1" 200 - via_upstream - "-" 0 3889 15 14 "-" "curl/8.9.1" "36e99792-52f6-43be-96f7-6673879a85fe" "productpage:9080" "10.244.0.92:9080" inbound-vip|9080|http|productpage.default.svc.cluster.local 10.244.0.103:54996 10.96.10.221:9080 envoy://internal_client_address/ - default

The waypoint of the reviews svc does not print any access log.

What you expected to happen:

Regardless of how waypoints are deployed, the services must remain accessible.

How to reproduce it (as minimally and precisely as possible):

As above.

Anything else we need to know?:

Environment:

hzxuzhonghu commented 1 month ago

Please paste the BPF trace log.

You can run kmesh-daemon log --set bpf:debug inside the Kmesh pod to turn on debug logging.

YaoZengzeng commented 1 month ago

The status of the deployed applications is as follows:

root@kurator-linux-0002:~# kgp -owide 
NAME                                    READY   STATUS    RESTARTS   AGE   IP            NODE                   NOMINATED NODE   READINESS GATES
details-v1-cdd874bc9-25gjt              1/1     Running   0          28m   10.244.1.22   kmesh-testing-worker   <none>           <none>
httpbin-648f469544-5jkgs                1/1     Running   0          54m   10.244.1.14   kmesh-testing-worker   <none>           <none>
httpbin-svc-waypoint-5b777b8859-b9hpp   1/1     Running   0          37m   10.244.1.19   kmesh-testing-worker   <none>           <none>
productpage-v1-5bb9985d4d-dqpbc         1/1     Running   0          28m   10.244.1.27   kmesh-testing-worker   <none>           <none>
ratings-v1-6484d64bbc-7fw8t             1/1     Running   0          28m   10.244.1.23   kmesh-testing-worker   <none>           <none>
reviews-svc-waypoint-6884756fc5-mc8bp   1/1     Running   0          28m   10.244.1.29   kmesh-testing-worker   <none>           <none>
reviews-v1-598f9b58fc-k56kx             1/1     Running   0          28m   10.244.1.24   kmesh-testing-worker   <none>           <none>
reviews-v2-5979c6fc9c-59pmv             1/1     Running   0          28m   10.244.1.25   kmesh-testing-worker   <none>           <none>
reviews-v3-7bbb5b9cf7-dbd6k             1/1     Running   0          28m   10.244.1.26   kmesh-testing-worker   <none>           <none>
sleep-5577c64d7c-mzctk                  1/1     Running   0          54m   10.244.1.15   kmesh-testing-worker   <none>           <none>
waypoint-b7bc55b9f-c2hsg                1/1     Running   0          15m   10.244.1.31   kmesh-testing-worker   <none>           <none>
root@kurator-linux-0002:~# kgs 
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)               AGE
details                ClusterIP   10.96.32.204    <none>        9080/TCP              28m
httpbin                ClusterIP   10.96.195.202   <none>        8000/TCP              55m
httpbin-svc-waypoint   ClusterIP   10.96.202.255   <none>        15021/TCP,15008/TCP   37m
kubernetes             ClusterIP   10.96.0.1       <none>        443/TCP               67m
productpage            ClusterIP   10.96.227.210   <none>        9080/TCP              28m
ratings                ClusterIP   10.96.76.254    <none>        9080/TCP              28m
reviews                ClusterIP   10.96.61.192    <none>        9080/TCP              28m
reviews-svc-waypoint   ClusterIP   10.96.108.93    <none>        15021/TCP,15008/TCP   28m
sleep                  ClusterIP   10.96.11.177    <none>        80/TCP                55m
waypoint               ClusterIP   10.96.95.140    <none>        15021/TCP,15008/TCP   15m

Actually, the curl command executed successfully, but because reviews was not accessed, the subsequent grep found no match and failed:

kubectl exec deploy/sleep -n "$NAMESPACE" -- sh -c "curl -s http://productpage:9080/productpage | grep reviews-v.-"
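
To make the exit code 1 concrete: it is the status of grep, the last command in the pipeline, which exits with 1 when no line matches. The following is a minimal Python sketch of that behaviour, not something taken from Kmesh. The helper name grep_exit_status and both page fragments are hypothetical and were not captured from the cluster.

```python
import re


def grep_exit_status(text, pattern):
    """Mimic grep's exit status: 0 if any line matches, 1 if none does."""
    matched = any(re.search(pattern, line) for line in text.splitlines())
    return 0 if matched else 1


# Hypothetical page fragments, used only for illustration.
page_with_reviews = "<u>reviews-v2-5979c6fc9c-59pmv</u>"
page_without_reviews = "Error fetching product reviews"

print(grep_exit_status(page_with_reviews, "reviews-v.-"))     # prints 0
print(grep_exit_status(page_without_reviews, "reviews-v.-"))  # prints 1
```

So a productpage response that never mentions a reviews pod is enough to produce "command terminated with exit code 1", even though curl itself returned a page.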

The Kmesh daemon log after executing the above command is as follows:

time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.227.210:9080]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.227.210:9080]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SERVICE] DEBUG: find waypoint addr=[10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] DEBUG: access the backend by service:2936233694\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] DEBUG: get the backend addr=[10.244.1.31:15019]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.244.1.31:15019]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.244.1.31:15019]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SOCKOPS] ERR: enable encoding metadata failed!, err is -16" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SENDMSG] DEBUG: get valid dst, do encoding...\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.32.204:9080]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.32.204:9080]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SERVICE] DEBUG: find waypoint addr=[10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] DEBUG: access the backend by service:2936233694\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="can't find service correspond workload: waypoint-b7bc55b9f-c2hsg" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="get destination service host failed" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="can't find service correspond workload: waypoint-b7bc55b9f-c2hsg" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="get destination service host failed" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="can't find service correspond workload: waypoint-b7bc55b9f-c2hsg" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="get destination service host failed" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="can't find service correspond workload: waypoint-b7bc55b9f-c2hsg" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="get destination service host failed" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] DEBUG: get the backend addr=[10.244.1.31:15019]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.244.1.31:15019]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.244.1.31:15019]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SOCKOPS] ERR: enable encoding metadata failed!, err is -16" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SENDMSG] DEBUG: get valid dst, do encoding...\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="can't find service correspond workload: waypoint-b7bc55b9f-c2hsg" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="get destination service host failed" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="can't find service correspond workload: waypoint-b7bc55b9f-c2hsg" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="get destination service host failed" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.61.192:9080]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.61.192:9080]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SERVICE] DEBUG: find waypoint addr=[10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SERVICE] DEBUG: find waypoint addr=[10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] ERR: record metadata origin address and port failed, ret is 0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] ERR: waypoint_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[FRONTEND] ERR: service_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] ERR: frontend_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] ERR: sock_traffic_control failed: 0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SERVICE] DEBUG: find waypoint addr=[10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] ERR: record metadata origin address and port failed, ret is 0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] ERR: waypoint_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[FRONTEND] ERR: service_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] ERR: frontend_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] ERR: sock_traffic_control failed: 0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[PROBE] ERR: connect bpf_sk_storage_get failed\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[PROBE] ERR: connect bpf_sk_storage_get failed\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SOCKOPS] ERR: enable encoding metadata failed!, err is -16" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SENDMSG] DEBUG: get valid dst, do encoding...\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[PROBE] ERR: close bpf_sk_storage_get failed\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[PROBE] ERR: close bpf_sk_storage_get failed\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.61.192:9080]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.61.192:9080]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SERVICE] DEBUG: find waypoint addr=[10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SERVICE] DEBUG: find waypoint addr=[10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] ERR: record metadata origin address and port failed, ret is 0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] ERR: waypoint_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[FRONTEND] ERR: service_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] ERR: frontend_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] ERR: sock_traffic_control failed: 0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: origin addr=[2:10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] DEBUG: bpf find frontend addr=[2:10.96.108.93:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SERVICE] DEBUG: find waypoint addr=[10.96.95.140:15008]\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] ERR: record metadata origin address and port failed, ret is 0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[BACKEND] ERR: waypoint_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[FRONTEND] ERR: service_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] ERR: frontend_manager failed, ret:0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[KMESH] ERR: sock_traffic_control failed: 0\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[PROBE] ERR: connect bpf_sk_storage_get failed\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[PROBE] ERR: connect bpf_sk_storage_get failed\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SOCKOPS] ERR: enable encoding metadata failed!, err is -16" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[SENDMSG] DEBUG: get valid dst, do encoding...\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[PROBE] ERR: close bpf_sk_storage_get failed\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="[PROBE] ERR: close bpf_sk_storage_get failed\n" subsys=ebpf
time="2024-08-14T07:41:51Z" level=info msg="can't find service correspond workload: waypoint-b7bc55b9f-c2hsg" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="get destination service host failed" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="can't find service correspond workload: waypoint-b7bc55b9f-c2hsg" subsys=pkg/telemetry
time="2024-08-14T07:41:51Z" level=info msg="get destination service host failed" subsys=pkg/telemetry
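
The redirect chain that precedes the waypoint_manager failure can be read off the log mechanically. Below is a minimal Python sketch (not part of Kmesh) that pairs each "origin addr" line with the "find waypoint addr" line that follows it. The function name redirect_hops is an illustrative assumption, and the sample is an abridged copy of the log above with timestamps dropped.

```python
import re

# Abridged lines from the daemon log above (timestamps and some lines dropped).
LOG = r'''
msg="[KMESH] DEBUG: origin addr=[2:10.96.61.192:9080]\n" subsys=ebpf
msg="[SERVICE] DEBUG: find waypoint addr=[10.96.108.93:15008]\n" subsys=ebpf
msg="[KMESH] DEBUG: origin addr=[2:10.96.108.93:15008]\n" subsys=ebpf
msg="[SERVICE] DEBUG: find waypoint addr=[10.96.95.140:15008]\n" subsys=ebpf
msg="[BACKEND] ERR: waypoint_manager failed, ret:0\n" subsys=ebpf
'''

# The leading number before the first colon in "origin addr" is skipped.
ORIGIN = re.compile(r"origin addr=\[\d+:([\d.]+:\d+)\]")
WAYPOINT = re.compile(r"find waypoint addr=\[([\d.]+:\d+)\]")


def redirect_hops(log):
    """Return (origin, waypoint) pairs in the order they appear in the log."""
    hops, current = [], None
    for line in log.splitlines():
        origin = ORIGIN.search(line)
        waypoint = WAYPOINT.search(line)
        if origin:
            current = origin.group(1)
        elif waypoint and current:
            hops.append((current, waypoint.group(1)))
            current = None
    return hops


for origin, waypoint in redirect_hops(LOG):
    print(f"{origin} -> waypoint {waypoint}")
# prints:
# 10.96.61.192:9080 -> waypoint 10.96.108.93:15008
# 10.96.108.93:15008 -> waypoint 10.96.95.140:15008
```

Read this way, the request to reviews (10.96.61.192) is first sent to reviews-svc-waypoint (10.96.108.93), and that connection is then looked up again and pointed at the ns waypoint (10.96.95.140) just before the errors are logged.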

YaoZengzeng commented 1 month ago

The config dump is as follows; only the waypoint-related services are shown:

...
        {
            "name": "httpbin-svc-waypoint",
            "namespace": "default",
            "hostname": "httpbin-svc-waypoint.default.svc.cluster.local",
            "vips": [
                "/10.96.202.255"
            ],
            "ports": [
                {
                    "service_port": 15021,
                    "target_port": 15021
                },
                {
                    "service_port": 15008,
                    "target_port": 15008
                }
            ],
            "loadBalancer": {
                "mode": "FAILOVER",
                "routingPreferences": [
                    "NETWORK",
                    "REGION",
                    "ZONE",
                    "SUBZONE"
                ]
            },
            "waypoint": {
                "destination": "/10.96.95.140"
            }
        },
...
        {
            "name": "reviews-svc-waypoint",
            "namespace": "default",
            "hostname": "reviews-svc-waypoint.default.svc.cluster.local",
            "vips": [
                "/10.96.108.93"
            ],
            "ports": [
                {
                    "service_port": 15021,
                    "target_port": 15021
                },
                {
                    "service_port": 15008,
                    "target_port": 15008
                }
            ],
            "loadBalancer": {
                "mode": "FAILOVER",
                "routingPreferences": [
                    "NETWORK",
                    "REGION",
                    "ZONE",
                    "SUBZONE"
                ]
            },
            "waypoint": {
                "destination": "/10.96.95.140"
            }
        },
...
        {
            "name": "waypoint",
            "namespace": "default",
            "hostname": "waypoint.default.svc.cluster.local",
            "vips": [
                "/10.96.95.140"
            ],
            "ports": [
                {
                    "service_port": 15021,
                    "target_port": 15021
                },
                {
                    "service_port": 15008,
                    "target_port": 15008
                }
            ],
            "loadBalancer": {
                "mode": "FAILOVER",
                "routingPreferences": [
                    "NETWORK",
                    "REGION",
                    "ZONE",
                    "SUBZONE"
                ]
            },
            "waypoint": {
                "destination": ""
            }
        },
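
One way to read the dump is that the Service of a svc-granularity waypoint has itself been assigned the ns waypoint (10.96.95.140), which would match the second "find waypoint addr" lookup in the daemon log. The following minimal Python sketch checks a dump for that pattern. It is a hedged illustration, not Kmesh code: the function names are hypothetical, the entries are trimmed to the fields used, and the "reviews" entry is not in the dump above but is inferred from the log (10.96.61.192 redirected to 10.96.108.93).

```python
import json

# Trimmed copy of the dump. The "reviews" entry is an assumption inferred
# from the daemon log, not something shown in the dump itself.
DUMP = json.loads("""
[
  {"name": "httpbin-svc-waypoint", "vips": ["/10.96.202.255"],
   "waypoint": {"destination": "/10.96.95.140"}},
  {"name": "reviews-svc-waypoint", "vips": ["/10.96.108.93"],
   "waypoint": {"destination": "/10.96.95.140"}},
  {"name": "waypoint", "vips": ["/10.96.95.140"],
   "waypoint": {"destination": ""}},
  {"name": "reviews", "vips": ["/10.96.61.192"],
   "waypoint": {"destination": "/10.96.108.93"}}
]
""")


def ip(addr):
    """Drop everything up to the last "/" ("/10.96.95.140" -> "10.96.95.140")."""
    return addr.rsplit("/", 1)[-1]


def chained_waypoints(services):
    """List (name, vip, waypoint) for services that are used as a waypoint
    by another service and also have a waypoint of their own."""
    used_as_waypoint = {
        ip(s["waypoint"]["destination"])
        for s in services
        if s["waypoint"]["destination"]
    }
    found = []
    for s in services:
        dest = ip(s["waypoint"]["destination"])
        for vip in map(ip, s["vips"]):
            if dest and vip in used_as_waypoint:
                found.append((s["name"], vip, dest))
    return found


for name, vip, dest in chained_waypoints(DUMP):
    print(f"{name} ({vip}) is itself redirected to waypoint {dest}")
# prints: reviews-svc-waypoint (10.96.108.93) is itself redirected to waypoint 10.96.95.140
```

httpbin-svc-waypoint carries the same waypoint destination in the dump, but it is not flagged here because the sample contains no service that points at 10.96.202.255.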