Status: Closed (maryamtahhan closed this 11 months ago)
Tested with a kind cluster (`make run-on-kind`). Configure the `unprivileged_bpf_disabled` kernel flag on the kind worker nodes:

```shell
$ docker exec af-xdp-deployment-worker sysctl kernel.unprivileged_bpf_disabled=0
kernel.unprivileged_bpf_disabled = 0
$ docker exec af-xdp-deployment-worker2 sysctl kernel.unprivileged_bpf_disabled=0
```
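The same flag can be set on every worker in one pass; a minimal sketch, assuming the kind cluster is named `af-xdp-deployment` and the nodes follow kind's default `<cluster>-worker*` naming:

```shell
# Set unprivileged_bpf_disabled=0 on every worker node of the kind cluster
for node in $(kind get nodes --name af-xdp-deployment | grep worker); do
    docker exec "$node" sysctl kernel.unprivileged_bpf_disabled=0
done
```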
Used the following NAD:

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: afxdp-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: afxdp/myPool
spec:
  config: '{
    "cniVersion": "0.3.0",
    "type": "afxdp",
    "mode": "primary",
    "logFile": "afxdp-cni.log",
    "logLevel": "debug",
    "dpSyncer": true,
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.1.0/24",
      "rangeStart": "192.168.1.200",
      "rangeEnd": "192.168.1.220",
      "routes": [
        { "dst": "0.0.0.0/0" }
      ],
      "gateway": "192.168.1.1"
    }
  }'
```
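The NAD above can be applied and checked with kubectl; a minimal sketch, assuming the manifest is saved as `nad.yaml` (hypothetical filename):

```shell
# Create the NetworkAttachmentDefinition and confirm the API server accepted it
kubectl apply -f nad.yaml
kubectl get network-attachment-definitions afxdp-network -o yaml
```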
and the following pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cndp-0-0
  annotations:
    k8s.v1.cni.cncf.io/networks: afxdp-network
spec:
  containers:
    - name: cndp-0
      command: ["/bin/bash"]
      args: ["-c", "./jsonc_gen.sh -kp ; cndpfwd -c config.jsonc lb;"]
      image: quay.io/mtahhan/cndp-map-pinning:latest
      imagePullPolicy: IfNotPresent
      securityContext:
        capabilities:
          add:
            - NET_RAW
            - IPC_LOCK
      resources:
        requests:
          afxdp/myPool: '1'
        limits:
          afxdp/myPool: '1'
```
Also need to load the container image onto the kind workers:

```shell
$ kind load --name af-xdp-deployment docker-image quay.io/mtahhan/cndp-map-pinning:latest
```
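With the image loaded, the pod can be created and its log checked; a sketch assuming the pod spec above is saved as `pod.yaml` (hypothetical filename):

```shell
# Create the cndp pod, wait for it to come up, then look for the PINNED_BPF_MAP message
kubectl apply -f pod.yaml
kubectl wait --for=condition=Ready pod/cndp-0-0 --timeout=120s
kubectl logs cndp-0-0
```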
Then, on creating and deleting the cndp pod, the Device Plugin logs are updated accordingly with BPF map pinning messages. The cndp pod log itself should show:

```
**** PINNED_BPF_MAP is enabled
libbpf: can't get next link: Operation not permitted
*** CNDPFWD Forward Application, API: XSKDEV, Mode: Loopback, Burst Size: 256
  Initial Thread ID 1 on lcore 1
  Forwarding Thread ID 28 on lcore 0
```
DP logs on pod creation:

```
DEBU[2023-07-18 12:23:24] [poolManager.go:220] [Allocate] Primary mode
DEBU[2023-07-18 12:23:24] [poolManager.go:232] [Allocate] Cycling state of device veth12
INFO[2023-07-18 12:23:24] [poolManager.go:250] [Allocate] Loading BPF program on device: veth12 and pinning the map
INFO[2023-07-18 12:23:24] [mapManager.go:163] [CreateBPFFS] created a directory /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc
INFO[2023-07-18 12:23:24] [mapManager.go:168] [CreateBPFFS] Created BPFFS mount point at /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: if_index for interface veth12 is 10
libbpf: Error in bpf_create_map_xattr(xsks_map):No error information(-524). Retrying without BTF.
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: bpf: Attach prog to ifindex 10
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: xsk map pinned to /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc/xsks_map
DEBU[2023-07-18 12:23:24] [poolManager.go:267] [Allocate] mapping /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc/xsks_map to /tmp/xsks_map
DEBU[2023-07-18 12:23:24] [poolManager.go:289] [Allocate] Container environment variables: {
    "AFXDP_DEVICES": "veth12"
}
```
DP plugin logs on pod deletion:

```
DEBU[2023-07-18 12:23:24] [poolManager.go:220] [Allocate] Primary mode
DEBU[2023-07-18 12:23:24] [poolManager.go:232] [Allocate] Cycling state of device veth12
INFO[2023-07-18 12:23:24] [poolManager.go:250] [Allocate] Loading BPF program on device: veth12 and pinning the map
INFO[2023-07-18 12:23:24] [mapManager.go:163] [CreateBPFFS] created a directory /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc
INFO[2023-07-18 12:23:24] [mapManager.go:168] [CreateBPFFS] Created BPFFS mount point at /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: if_index for interface veth12 is 10
libbpf: Error in bpf_create_map_xattr(xsks_map):No error information(-524). Retrying without BTF.
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: bpf: Attach prog to ifindex 10
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: xsk map pinned to /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc/xsks_map
DEBU[2023-07-18 12:23:24] [poolManager.go:267] [Allocate] mapping /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc/xsks_map to /tmp/xsks_map
DEBU[2023-07-18 12:23:24] [poolManager.go:289] [Allocate] Container environment variables: {
    "AFXDP_DEVICES": "veth12"
}
INFO[2023-07-18 12:27:16] [server.go:66] [DelNetDev] Looking up Map Manager for veth12
INFO[2023-07-18 12:27:16] [server.go:83] [DelNetDev] Map Manager found, deleting BPFFS for veth12
INFO[2023-07-18 12:27:16] [mapManager.go:293] [DeleteBPFFS] Deleted BPFFS mount point at /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc
INFO[2023-07-18 12:27:16] [server.go:90] [DelNetDev] Network interface veth12 deleted
```
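The BPFFS pin path can be pulled straight out of a DP log line; a small sketch that extracts the last field of the `xsk map pinned to ...` message (sample line copied from the logs above):

```shell
# Extract the BPFFS pin path from a Load_bpf_pin_xsk_map log line
log='INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: xsk map pinned to /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc/xsks_map'
map_path="${log##* }"   # strip everything up to and including the last space
echo "$map_path"
```

Once the path is known, `bpftool map show pinned "$map_path"` (run with sufficient privileges on the node) can confirm the pinned map is a valid `xsks_map`.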
I just spotted the

```
libbpf: can't get next link: Operation not permitted
```

This is not expected... but it doesn't block this PR at least... Let me see if CAP_BPF is the issue here... it should be either CAP_BPF or `unprivileged_bpf_disabled`.
...
TLDR ===> not a blocker for this PR. Ok - I think we can ignore that warning... https://github.com/libbpf/libbpf/commit/8628610c322a it looks like it's a probe under the hood of libbpf, and when bpf link support isn't detected it falls back to netlink-based XDP prog attachment... we indeed don't have permission to make this bpf call if the pod is unprivileged... I did notice, however, that even CAP_BPF didn't provide enough privilege for this call; only when the pod is privileged is it able to make that call.
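The permission check libbpf trips over can be reproduced from a shell; a sketch assuming `bpftool` is available inside the pod image (it is not part of this PR's image):

```shell
# Enumerating bpf links from an unprivileged pod fails with EPERM, the same class
# of call libbpf probes before choosing link-based XDP attach. When the probe
# fails, libbpf logs the warning and uses netlink-based XDP attach instead.
bpftool link show || echo "probe failed: libbpf would fall back to netlink-based XDP attach"
```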
Hey @maryamtahhan, we've seen `libbpf: can't get next link: Operation not permitted` before. It's pod privileges we think; running as root fixed it, but obviously that's not the solution.

Ok, so it's a probe for libbpf rather than breaking functionality? Or is it breaking functionality?

Is it something we could ask the DP to configure for us?

Kind of related: a client is being created in #65 and we'll put a C wrapper on this once finalised.
> Hey @maryamtahhan, we've seen `libbpf: can't get next link: Operation not permitted` before. It's pod privileges we think; running as root fixed it, but obviously that's not the solution.

> Ok, so it's a probe for libbpf rather than breaking functionality? Or is it breaking functionality?

Yeah - it's an internal probe under the hood of libbpf :( it doesn't break functionality from what I can see. CNDP can still successfully create the AF_XDP socket and doesn't fail.

> Is it something we could ask the DP to configure for us?

I don't think so.

> Kind of related: a client is being created in #65 and we'll put a C wrapper on this once finalised.

Cool, I will check it out.
Capabilities don't get added to a non-root user's shell. You need to use `setcap` on the specific binary that needs the capability in the Dockerfile (you can't do it in a running container). Then you also need to have the matching capability in the pod spec. You can easily check the capabilities of the current shell with `capsh --print`; `getcap` can be used to check the capabilities of a specific binary.
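For reference, that flow looks roughly like the following; a sketch assuming a hypothetical binary path `/usr/local/bin/cndpfwd`:

```shell
# At image build time (RUN step in the Dockerfile): grant file capabilities to the binary
setcap cap_net_raw,cap_ipc_lock+ep /usr/local/bin/cndpfwd

# Inside the running container: verify what the shell and the binary actually have
capsh --print | grep Current
getcap /usr/local/bin/cndpfwd
```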
Sorry, what's the context here?
My mistake, I thought that you were running as a non-root user from gray's comment:

> "libbpf: can't get next link: Operation not permitted before. It's pod privileges we think, running as root fixed it"

such as https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/test/e2e/pod-1c1d.yaml#L9 where you `runAsUser: 1500`, or that you had baked the non-root user into the Dockerfile. I realise now that he meant an unprivileged root user.
I've rebased on main and tested in Kind... everything is working as expected.
Rebasing the previous PR to support bpf map pinning after Kind support was merged to main.
Going to close PR 59
Will transition from draft after some local testing.