kmesh-net / kmesh

High Performance ServiceMesh Data Plane Based on Programmable Kernel
https://kmesh.net
Apache License 2.0
464 stars 70 forks source link

Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked #679

Closed tacslon closed 3 months ago

tacslon commented 3 months ago

What type of PR is this?

/kind bug What this PR does / why we need it: Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked Which issue(s) this PR fixes: Fixes #591 #665

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE
tacslon commented 3 months ago

The following code can compare if XDP program has any difference with the previous one, but if we always use the previous XDP program, duplicate maps do exist. So for now, always update XDP programs linked at pod's NIC driver.

    // If kmesh restarts, prevProgInfo is not nil
    prevProg, prevProgInfo, _ := utils.GetProgAndInfoByName(constants.XDP_PROG_NAME)
    log.Infof("prevProgInfo: %+v", prevProgInfo)
    progTag, _ := spec.Programs[constants.XDP_PROG_NAME].Tag()
    log.Infof("progTag: %v", progTag)
    if prevProgInfo != nil && prevProgInfo.Tag != "" && prevProgInfo.Tag == progTag {
        // Same XDP prog, just assign, do not load new prog, but use new maps to prevent duplicate maps
        xa.XdpShutdown = prevProg
        if xa.MapOfAuth, err = utils.GetMapByName(constants.XDP_MAP_NAME); err != nil {
            return nil, err
        }
        log.Infof("MapOfAuth: %+v", xa.MapOfAuth)
    } else {
        // New prog, do load and assignment
        if err = spec.LoadAndAssign(&xa.KmeshXDPAuthObjects, &opts); err != nil {
            return nil, err
        }
    }

For example, map id 3595,3583,3584,3588,3602 are all previous XDP program's maps, and they are duplicate.

1778: xdp  name xdp_shutdown  tag a9eca76e9384b59d  gpl
        loaded_at 2024-08-06T21:36:43+0800  uid 0
        xlated 1728B  jited 984B  memlock 4096B  map_ids 3595,3583,3584,3588,3602
        btf_id 2248
hzxuzhonghu commented 3 months ago

but if we always use the previous XDP program, duplicate maps do exist

Where do we create the duplicate maps again?

tacslon commented 3 months ago

but if we always use the previous XDP program, duplicate maps do exist

Where do we create the duplicate maps again?

Other eBPF programs that have reference to same maps, e.g. sockops has a reference to bpf_log_level, and old XDP program has an old bpf_log_level map

hzxuzhonghu commented 3 months ago

but if we always use the previous XDP program, duplicate maps do exist

Where do we create the duplicate maps again?

Other eBPF programs that have reference to same maps, e.g. sockops has a reference to bpf_log_level, and old XDP program has an old bpf_log_level map

I am not sure i understand what you mean. Do you mean sockops refer a bpf_log_level map, but xdp refer another? How could that happen if we detach and unlink the old xdp prog correctly

hzxuzhonghu commented 3 months ago

You should respect the restart feature, do re attach/link as other bpf prog

hzxuzhonghu commented 3 months ago

lgtm otherwise

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 67.18750% with 21 lines in your changes missing coverage. Please review.

Project coverage is 51.39%. Comparing base (433592b) to head (aee3fee). Report is 1 commits behind head on main.

Files Patch % Lines
pkg/controller/manage/manage_controller.go 68.25% 13 Missing and 7 partials :warning:
pkg/controller/controller.go 0.00% 1 Missing :warning:
Files Coverage Δ
pkg/controller/controller.go 0.00% <0.00%> (ø)
pkg/controller/manage/manage_controller.go 55.26% <68.25%> (-0.18%) :arrow_down:

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update dda7049...aee3fee. Read the comment docs.

hzxuzhonghu commented 3 months ago

@YaoZengzeng TestRemoveAddNsOrServiceWaypoint failed

LiZhenCheng9527 commented 3 months ago

@YaoZengzeng TestRemoveAddNsOrServiceWaypoint failed

The PR I merged in yesterday also failed, but rerunning the job passed it.

YaoZengzeng commented 3 months ago

@YaoZengzeng TestRemoveAddNsOrServiceWaypoint failed

I'll take a closer look

YaoZengzeng commented 3 months ago

It seems like that the Kmesh daemon crashed during the test, casuing the newly deployed waypoint fail to take effect.

hzxuzhonghu commented 3 months ago

Ut coverage does not satisfy the checker

kmesh-bot commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[pkg/controller/OWNERS](https://github.com/kmesh-net/kmesh/blob/main/pkg/controller/OWNERS)~~ [hzxuzhonghu] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
kmesh-bot commented 3 months ago

In response to a cherrypick label: #679 failed to apply on top of branch "release-0.4":

Applying: Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked
Using index info to reconstruct a base tree...
M   pkg/controller/controller.go
A   pkg/controller/manage/manage_controller.go
A   pkg/controller/manage/manage_controller_test.go
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): pkg/controller/manage/manage_controller_test.go deleted in HEAD and modified in Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked. Version Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked of pkg/controller/manage/manage_controller_test.go left in tree.
CONFLICT (modify/delete): pkg/controller/manage/manage_controller.go deleted in HEAD and modified in Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked. Version Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked of pkg/controller/manage/manage_controller.go left in tree.
Auto-merging pkg/controller/controller.go
CONFLICT (content): Merge conflict in pkg/controller/controller.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
kmesh-bot commented 3 months ago

In response to a cherrypick label: new issue created for failed cherrypick: #716