intel / afxdp-plugins-for-kubernetes


Feat kind support map pinning v3 #67

Closed maryamtahhan closed 11 months ago

maryamtahhan commented 1 year ago

Rebasing the previous PR to support BPF map pinning, now that Kind support has been merged to main.

Going to close PR #59.

Will transition from draft after some local testing.

maryamtahhan commented 1 year ago

Tested with a Kind cluster (make run-on-kind).

Configured the unprivileged_bpf_disabled kernel flag on the Kind worker nodes:

$ docker exec af-xdp-deployment-worker sysctl kernel.unprivileged_bpf_disabled=0
kernel.unprivileged_bpf_disabled = 0
$ docker exec af-xdp-deployment-worker2 sysctl kernel.unprivileged_bpf_disabled=0

Used the following NetworkAttachmentDefinition (NAD):

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: afxdp-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: afxdp/myPool
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "afxdp",
      "mode": "primary",
      "logFile": "afxdp-cni.log",
      "logLevel": "debug",
      "dpSyncer": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.1.0/24",
        "rangeStart": "192.168.1.200",
        "rangeEnd": "192.168.1.220",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ],
        "gateway": "192.168.1.1"
      }
    }'

and the following pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: cndp-0-0
  annotations:
    k8s.v1.cni.cncf.io/networks: afxdp-network 
spec:
  containers:
    - name: cndp-0
      command: ["/bin/bash"]
      args: ["-c", "./jsonc_gen.sh -kp ; cndpfwd -c config.jsonc lb;"]
      image: quay.io/mtahhan/cndp-map-pinning:latest
      imagePullPolicy: IfNotPresent
      securityContext:
        capabilities:
          add:
            - NET_RAW
            - IPC_LOCK
      resources:
        requests:
          afxdp/myPool: '1'
        limits:
          afxdp/myPool: '1'

The container image also needs to be loaded onto the Kind worker nodes:

$ kind load --name af-xdp-deployment docker-image quay.io/mtahhan/cndp-map-pinning:latest 

Then, on creating and deleting the CNDP pod, the device plugin logs are updated accordingly with BPF map pinning messages.

The CNDP pod log itself should show:

**** PINNED_BPF_MAP is enabled
libbpf: can't get next link: Operation not permitted

*** CNDPFWD Forward Application, API: XSKDEV, Mode: Loopback, Burst Size: 256 
   Initial Thread ID    1 on lcore 1
   Forwarding Thread ID 28 on lcore 0

DP logs on pod creation:

DEBU[2023-07-18 12:23:24] [poolManager.go:220] [Allocate] Primary mode                                 
DEBU[2023-07-18 12:23:24] [poolManager.go:232] [Allocate] Cycling state of device veth12               
INFO[2023-07-18 12:23:24] [poolManager.go:250] [Allocate] Loading BPF program on device: veth12 and pinning the map 
INFO[2023-07-18 12:23:24] [mapManager.go:163] [CreateBPFFS] created a directory /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc 
INFO[2023-07-18 12:23:24] [mapManager.go:168] [CreateBPFFS] Created BPFFS mount point at /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc 
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: if_index for interface veth12 is 10 
libbpf: Error in bpf_create_map_xattr(xsks_map):No error information(-524). Retrying without BTF.
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: bpf: Attach prog to ifindex 10 
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: xsk map pinned to /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc/xsks_map 
DEBU[2023-07-18 12:23:24] [poolManager.go:267] [Allocate] mapping /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc/xsks_map to /tmp/xsks_map 
DEBU[2023-07-18 12:23:24] [poolManager.go:289] [Allocate] Container environment variables: {
  "AFXDP_DEVICES": "veth12"
} 
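
The last two Allocate lines show the handover: the map is pinned on a per-pod BPFFS on the host and mapped into the container at /tmp/xsks_map. For reference, a minimal sketch (my illustration, not CNDP's actual code) of how an application inside the pod can pick up that pinned map with plain libbpf:

#include <stdio.h>
#include <bpf/bpf.h>

int main(void)
{
    /* Path comes from the device plugin's mount, per the Allocate logs above. */
    int map_fd = bpf_obj_get("/tmp/xsks_map");

    if (map_fd < 0) {
        perror("bpf_obj_get");
        return 1;
    }
    printf("pinned xsks_map fd: %d\n", map_fd);

    /* An AF_XDP app would then register its socket in this map, e.g. via
     * xsk_socket__update_xskmap(xsk, map_fd) from the (pre-1.0) libbpf xsk
     * API, now provided by libxdp. */
    return 0;
}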

DP logs on pod deletion:

DEBU[2023-07-18 12:23:24] [poolManager.go:220] [Allocate] Primary mode                                 
DEBU[2023-07-18 12:23:24] [poolManager.go:232] [Allocate] Cycling state of device veth12               
INFO[2023-07-18 12:23:24] [poolManager.go:250] [Allocate] Loading BPF program on device: veth12 and pinning the map 
INFO[2023-07-18 12:23:24] [mapManager.go:163] [CreateBPFFS] created a directory /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc 
INFO[2023-07-18 12:23:24] [mapManager.go:168] [CreateBPFFS] Created BPFFS mount point at /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc 
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: if_index for interface veth12 is 10 
libbpf: Error in bpf_create_map_xattr(xsks_map):No error information(-524). Retrying without BTF.
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: bpf: Attach prog to ifindex 10 
INFO[2023-07-18 12:23:24] [bpfWrapper.go:135] [Infof] Load_bpf_pin_xsk_map: xsk map pinned to /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc/xsks_map 
DEBU[2023-07-18 12:23:24] [poolManager.go:267] [Allocate] mapping /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc/xsks_map to /tmp/xsks_map 
DEBU[2023-07-18 12:23:24] [poolManager.go:289] [Allocate] Container environment variables: {
  "AFXDP_DEVICES": "veth12"
} 
INFO[2023-07-18 12:27:16] [server.go:66] [DelNetDev] Looking up Map Manager for veth12            
INFO[2023-07-18 12:27:16] [server.go:83] [DelNetDev] Map Manager found, deleting BPFFS for veth12 
INFO[2023-07-18 12:27:16] [mapManager.go:293] [DeleteBPFFS] Deleted BPFFS mount point at /var/run/afxdp_dp/afxdp-maps/6c1eab5d-e6ba-472a-8eff-ce90a45481bc 
INFO[2023-07-18 12:27:16] [server.go:90] [DelNetDev] Network interface veth12 deleted 
maryamtahhan commented 1 year ago

I just spotted the following in the pod log:

libbpf: can't get next link: Operation not permitted

This is not expected, but it doesn't block this PR at least. Let me see if CAP_BPF is the issue here; it should be covered by either CAP_BPF or unprivileged_bpf_disabled.

maryamtahhan commented 1 year ago

TL;DR: not a blocker for this PR.

OK, I think we can ignore that warning. Per https://github.com/libbpf/libbpf/commit/8628610c322a, it looks like it's a probe under the hood of libbpf: when bpf_link support isn't detected, it falls back to the netlink-based XDP attach. We indeed don't have permission to make this bpf() call if the pod is unprivileged. I did notice, however, that even CAP_BPF didn't provide enough privilege for this call; only a privileged pod is able to make it.
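
For illustration, roughly the kind of probe involved (a sketch, not libbpf's exact code). Iterating bpf_link IDs is gated on CAP_SYS_ADMIN in the kernel's get-next-id path, which would also explain why CAP_BPF alone wasn't enough:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <bpf/bpf.h>

/* Probe bpf_link support by trying to walk the link ID space.
 * In an unprivileged pod this fails with EPERM; ENOENT just means
 * no links exist yet, which counts as supported. */
static bool bpf_link_iteration_permitted(void)
{
    __u32 id = 0;

    if (bpf_link_get_next_id(0, &id) < 0 && errno != ENOENT) {
        fprintf(stderr, "can't get next link: %s\n", strerror(errno));
        return false;
    }
    return true;
}

int main(void)
{
    printf("bpf_link iteration: %s\n",
           bpf_link_iteration_permitted() ? "permitted" : "not permitted");
    return 0;
}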

garyloug commented 1 year ago

Hey @maryamtahhan, we've seen libbpf: can't get next link: Operation not permitted before. We think it's pod privileges; running as root fixed it, but obviously that's not the solution.

OK, so is it just a probe inside libbpf, or is it actually breaking functionality?

Is it something we could ask the DP to configure for us?

Kind of related: a client is being created in #65, and we'll put a C wrapper on it once finalised.

maryamtahhan commented 1 year ago

> Hey @maryamtahhan, we've seen libbpf: can't get next link: Operation not permitted before. We think it's pod privileges; running as root fixed it, but obviously that's not the solution.
>
> OK, so is it just a probe inside libbpf, or is it actually breaking functionality?

Yeah, it's an internal probe under the hood of libbpf :( It doesn't break functionality from what I can see: CNDP can still successfully create the AF_XDP socket and doesn't fail.

> Is it something we could ask the DP to configure for us?

I don't think so.

> Kind of related: a client is being created in #65, and we'll put a C wrapper on it once finalised.

Cool, I will check it out.

johnoloughlin commented 1 year ago

Capabilities don't get added to a non-root user's shell. You need to use setcap on the specific binary that needs the capability in the Dockerfile (you can't do it in a running container), and then you also need the matching capability in the pod spec. You can easily check the capabilities of the current shell with capsh --print; getcap can be used to check the capabilities of a specific binary.
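
If it helps, the same check can be done from code. A quick sketch using libcap (link with -lcap), equivalent to running capsh --print in the container:

#include <stdio.h>
#include <sys/capability.h>

int main(void)
{
    /* Print the capability sets of the current process, like capsh --print. */
    cap_t caps = cap_get_proc();
    char *text;

    if (!caps) {
        perror("cap_get_proc");
        return 1;
    }
    text = cap_to_text(caps, NULL);
    printf("current capabilities: %s\n", text);
    cap_free(text);
    cap_free(caps);
    return 0;
}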

maryamtahhan commented 1 year ago

> Capabilities don't get added to a non-root user's shell. You need to use setcap on the specific binary that needs the capability in the Dockerfile (you can't do it in a running container), and then you also need the matching capability in the pod spec. You can easily check the capabilities of the current shell with capsh --print; getcap can be used to check the capabilities of a specific binary.

Sorry, what's the context here?

johnoloughlin commented 1 year ago

My mistake, I thought you were running as a non-root user, based on Gary's comment ("It's pod privileges we think, running as root fixed it"), e.g. https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/test/e2e/pod-1c1d.yaml#L9 where runAsUser: 1500 is set, or that you had baked a non-root user into the Dockerfile. I realise now that he meant an unprivileged root user.

maryamtahhan commented 11 months ago

I've rebased on main and tested in Kind... everything is working as expected.