kmesh-net / kmesh

High Performance ServiceMesh Data Plane Based on Programmable Kernel
https://kmesh.net
Apache License 2.0
424 stars 59 forks source link

Unexpected Bpf tracelog #355

Closed hzxuzhonghu closed 2 months ago

hzxuzhonghu commented 4 months ago

What happened:

Running L7 waypoint example https://kmesh.net/en/docs/userguide/try_waypoint/.

I have seen some errors in bpf log. not sure why it failed. But I can see the waypoint access log via enabling accesslog.

[2024-05-23T02:20:26.624Z] "GET /reviews/0 HTTP/1.1" 200 - via_upstream - "-" 0 358 3 3 "-" "curl/8.7.1" "6a4eac59-a99e-4a3b-af12-f28589732903" "reviews:9080" "10.244.0.100:9080" inbound-vip|9080|http/v1|reviews.default.svc.cluster.local 10.244.0.109:37840 10.96.165.244:9080 envoy://internal_client_address/ - -
[2024-05-23T02:20:27.740Z] "GET /reviews/0 HTTP/1.1" 200 - via_upstream - "-" 0 358 2 2 "-" "curl/8.7.1" "dd57416d-a285-490b-b451-8a70609ba482" "reviews:9080" "10.244.0.100:9080" inbound-vip|9080|http/v1|reviews.default.svc.cluster.local 10.244.0.109:37840 10.96.165.244:9080 envoy://internal_client_address/ - -

           <...>-3051027 [000] d...1 2317308.111889: bpf_trace_printk: [KMESH] DEBUG: origin addr=[153706506:30755]

           <...>-3051027 [000] d...1 2317308.111897: bpf_trace_printk: [KMESH] DEBUG: bpf find frontend addr=[153706506:30755]

           <...>-3051034 [000] d...1 2317308.115448: bpf_trace_printk: [KMESH] DEBUG: origin addr=[1310744586:30755]

           <...>-3051034 [000] d...1 2317308.115455: bpf_trace_printk: [KMESH] DEBUG: bpf find frontend addr=[1310744586:30755]

          python-3051034 [000] d...1 2317308.119152: bpf_trace_printk: [KMESH] DEBUG: origin addr=[4104478730:30755]

          python-3051034 [000] d...1 2317308.119163: bpf_trace_printk: [KMESH] DEBUG: bpf find frontend addr=[4104478730:30755]

          python-3051034 [000] d...1 2317308.119165: bpf_trace_printk: [KMESH] DEBUG: origin addr=[1856135178:41018]

          python-3051034 [000] d...1 2317308.119166: bpf_trace_printk: [KMESH] DEBUG: bpf find frontend addr=[1856135178:41018]

          python-3051034 [000] d...1 2317308.119167: bpf_trace_printk: [ENDPOINT] ERR: backend_manager failed, ret:1

          python-3051034 [000] d...1 2317308.119168: bpf_trace_printk: [SERVICE] ERR: endpoint_manager failed, ret:1

          python-3051034 [000] d...1 2317308.119168: bpf_trace_printk: [FRONTEND] ERR: service_manager failed, ret:1

          python-3051034 [000] d...1 2317308.119169: bpf_trace_printk: [KMESH] ERR: frontend_manager failed, ret:1

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

hzxuzhonghu commented 4 months ago

Firgurred out the cause is backend_manager has a tail call which returns 1, we should check return value < 0

nlgwcy commented 4 months ago

Firgurred out the cause is backend_manager has a tail call which returns 1, we should check return value < 0

A tail call is a form mechanism that allows eBPF authors to break up their logic into multiple parts and go from one to the other. Unlike traditional function calls, control flow never returns to the code making a tail call. Maybe need to be confirmed again.

hzxuzhonghu commented 4 months ago

I am not sure how can backend_manager return 1, above is my guess though, i do not find any official docs

static inline int backend_manager(ctx_buff_t *ctx, backend_value *backend_v)
{
    int ret;
    address_t target_addr;
    __u32 *sk = (__u32 *)ctx->sk;
    struct bpf_sock_tuple value_tuple = {0};

    if (backend_v->waypoint_addr != 0 && backend_v->waypoint_port != 0) {
        BPF_LOG(DEBUG, BACKEND, "find waypoint addr=[%u:%u]\n", backend_v->waypoint_addr, backend_v->waypoint_port);
        value_tuple.ipv4.daddr = ctx->user_ip4;
        value_tuple.ipv4.dport = ctx->user_port;

        ret = bpf_map_update_elem(&map_of_dst_info, &sk, &value_tuple, BPF_NOEXIST);
        if (ret) {
            BPF_LOG(ERR, BACKEND, "record metadata origin address and port failed, ret is %d\n", ret);
            return ret;
        }
        target_addr.ipv4 = backend_v->waypoint_addr;
        target_addr.port = backend_v->waypoint_port;
        SET_CTX_ADDRESS(ctx, target_addr);
        kmesh_workload_tail_call(ctx, TAIL_CALL_CONNECT4_INDEX);

        // if tail call failed will run this code
        BPF_LOG(ERR, BACKEND, "workload tail call failed, err is %d\n", ret);
        return -ENOEXEC;
    }

#pragma unroll
    for (unsigned int i = 0; i < backend_v->port_count; i++) {
        if (i >= MAX_PORT_COUNT) {
            BPF_LOG(WARN, BACKEND, "exceed the max port count\n");
            return -EINVAL;
        }

        if (ctx->user_port == backend_v->service_port[i]) {
            target_addr.ipv4 = backend_v->ipv4;
            target_addr.port = backend_v->target_port[i];
            SET_CTX_ADDRESS(ctx, target_addr);
            BPF_LOG(DEBUG, BACKEND, "get the backend addr=[%u:%u]\n", target_addr.ipv4, target_addr.port);
            return 0;
        }
    }

    BPF_LOG(ERR, BACKEND, "failed to get the backend\n");
    return -ENOENT;
}
hzxuzhonghu commented 4 months ago

I cannot reproduce this with the latest main branch