k8snetworkplumbingwg / sriov-cni

DPDK & SR-IOV CNI plugin

Skip setting VLAN ID when VLAN is 0 - breaks on VF trunk policy #291

Closed: RefluxMeds closed this issue 5 months ago

RefluxMeds commented 7 months ago

What issue would you like to bring attention to?

When using a trunk policy at the VF level, sriov-cni fails to allocate the VF to the Pod.

root@worker:~# for VF in $(ls -l /sys/class/net/ens3f{0..1}/device/sriov/{0..7}/trunk | awk '{print $9}'); do echo add 1082 1082 > $VF; done
--- --- ---
$ kubectl create -f /home/deployment.yaml
$ kubectl -n test1 describe pod test-router-69945bd84f-5826r
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               10s   default-scheduler  Successfully assigned test1/test-router-69945bd84f-5826r to worker
  Warning  NoNetworkFound          9s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          9s    multus             Add eth0 [192.168.199.199/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  9s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8be0ea041e5f00ec8cd4fe2756fc85b23ae5d691ceb4c1fd51afc9229ca6566a": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          8s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          8s    multus             Add eth0 [192.168.199.245/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  8s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "99d05c54d8c428663e99c72716b8fc525c134286a90f97bd23d584bf471a4e40": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          7s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          7s    multus             Add eth0 [192.168.199.250/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  7s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a89824b786e6cb41f6dee80931d5394ad55dc32939e4647a9ad56d418aa934a9": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
...
...

What is the impact of this issue?

  1. Cannot provision a VF interface inside Pod network namespace.
  2. Reduced capabilities of NIC, not able to use all features.

Do you have a proposed response or remediation for the issue?

Either skip applying the VLAN config when VLAN == 0 in https://github.com/k8snetworkplumbingwg/sriov-cni/blob/master/pkg/sriov/sriov.go#L226, or add an additional configurable field to sriov-cni that would skip the VLAN config when e.g. VFtrunk=true.
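
A minimal sketch of the first option, assuming the VLAN is ultimately programmed through github.com/vishvananda/netlink; the function and variable names here are illustrative, not the actual sriov-cni code:

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// applyVfVlan programs the VLAN of a VF through its PF, but skips the
// netlink call entirely when vlan == 0, leaving any externally configured
// trunk policy (e.g. Mellanox VGT+) untouched.
func applyVfVlan(pfName string, vfID, vlan int) error {
	if vlan == 0 {
		return nil // no VLAN requested: do not touch the VF
	}
	pfLink, err := netlink.LinkByName(pfName)
	if err != nil {
		return fmt.Errorf("failed to look up PF %q: %v", pfName, err)
	}
	if err := netlink.LinkSetVfVlan(pfLink, vfID, vlan); err != nil {
		return fmt.Errorf("failed to set vf %d vlan: %v", vfID, err)
	}
	return nil
}

func main() {
	// With vlan == 0 this is a no-op instead of "operation not permitted".
	if err := applyVfVlan("ens3f1", 5, 0); err != nil {
		fmt.Println(err)
	}
}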

If necessary, I can write a longer post explaining all the details.

RefluxMeds commented 7 months ago

I see some work was done in https://github.com/k8snetworkplumbingwg/sriov-cni/issues/133, which contained a proposal for configuring sriov-cni. However, I do not need a configuration option; it would be enough for sriov-cni to stop trying to apply a VLAN to the VF when the VLAN ID is 0.

SchSeba commented 7 months ago

Hi @RefluxMeds, thanks for the input. Yes, please explain more about the use case, that would be great!

Is the trunk with allowed-VLANs configuration driver-specific?

RefluxMeds commented 7 months ago

Introduction

Hello @SchSeba !

We're trying to solve a specific security issue in our infrastructure: prohibiting Pod workloads and other host processes from provisioning any VLAN ID they want on assigned virtual functions. The issue stems from our network fabric setup, but I cannot get into details here as it would be too revealing. This feature would also bring our platform into compliance with the company-wide security policy, so I am very interested in pursuing it.

We are using Mellanox ConnectX-6 (Lx/Dx) and ConnectX-5 NICs on all our servers. These NICs provide a really cool feature we would like to use; however, it is not supported by the current implementation of sriov-cni.

Issue description

To describe the issue, I will post a somewhat lengthy explainer of how we got here. I apologize in advance if it is too long.

Let us say we have two physical functions (PFs), with eight virtual functions (VFs) per PF:

4: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc prio state UP mode DEFAULT group default qlen 1000
    link/ether b8:3f:d2:3c:6c:78 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
5: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc prio state UP mode DEFAULT group default qlen 1000
    link/ether b8:3f:d2:3c:6c:79 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off

These VFs can be consumed by Pod workloads using sriov-cni. To consume a VF, I create a NetworkAttachmentDefinition with a VLAN ID defined:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: mellanox.com/mellanox_mlx5_dpdk_right
  name: mlnx-right
  namespace: test1
spec:
  config: '{ "cniVersion": "0.3.1", "type": "sriov", "name": "sriov-mlnx-connectx5",
    "spoofchk": "off", "trust": "on", "vlan": 1082 }'

In the Deployment spec, I refer to this NetworkAttachmentDefinition to configure the VF and move it into the Pod network namespace:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test-router
  name: test-router
  namespace: test1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-router
  template:
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: '[ { "name": "mlnx-right", "namespace": "test1" } ]'
      labels:
        app: test-router
    spec:
      nodeSelector:
        kubernetes.io/hostname: my-worker
      containers:
      - image: docker.io/mylinuximage:latest
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        name: ubuntu
        resources:
          limits:
            cpu: 1000m
            mellanox.com/mellanox_mlx5_dpdk_right: "1"
            memory: 4Gi
          requests:
            cpu: 300m
            mellanox.com/mellanox_mlx5_dpdk_right: "1"
            memory: 1Gi

The Pod gets spawned with the allocated VF; I can assign an IP and ping the gateway for that VLAN:

$ kubectl -n test1 exec -it test-router-69945bd84f-cx48d -- bash
root@test-router-69945bd84f-cx48d:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if3966: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 56:40:ad:08:a8:e4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.199.214/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5440:adff:fe08:a8e4/64 scope link
       valid_lft forever preferred_lft forever
26: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b6:d7:f9:ac:c9:76 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::b4d7:f9ff:feac:c976/64 scope link
       valid_lft forever preferred_lft forever

root@test-router-69945bd84f-cx48d:~# ip addr add 172.21.0.52/28 brd 172.21.0.63 dev net1
root@test-router-69945bd84f-cx48d:~# ping -I net1 172.21.0.49
PING 172.21.0.49 (172.21.0.49) from 172.21.0.52 net1: 56(84) bytes of data.
64 bytes from 172.21.0.49: icmp_seq=1 ttl=64 time=12.6 ms
64 bytes from 172.21.0.49: icmp_seq=2 ttl=64 time=2.38 ms
64 bytes from 172.21.0.49: icmp_seq=3 ttl=64 time=0.930 ms
64 bytes from 172.21.0.49: icmp_seq=4 ttl=64 time=0.916 ms
^C
--- 172.21.0.49 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3016ms
rtt min/avg/max/mdev = 0.916/4.221/12.660/4.908 ms

From the host side, we can see that the VF is indeed assigned and all is well:

5: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc prio state UP mode DEFAULT group default qlen 1000
    link/ether b8:3f:d2:3c:6c:79 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 1082, spoof checking off, link-state auto, trust on, query_rss off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off

This is the standard way to consume a virtual function, and the configuration matches what Nvidia calls VST, or VLAN Switch Tagging. It does not fit our use case, because VF VLANs stay configurable both through the NetworkAttachmentDefinition (i.e. sriov-cni) and manually via netlink:

root@worker:~# ip link set dev ens3f1 vf 3 vlan 3000
root@worker:~# ip link set dev ens3f1 vf 3 trust on
root@worker:~# ip link | grep "ens3f1:" -A9
5: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc prio state UP mode DEFAULT group default qlen 1000
    link/ether b8:3f:d2:3c:6c:79 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 3000, spoof checking off, link-state auto, trust on, query_rss off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 1082, spoof checking off, link-state auto, trust on, query_rss off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off

The other way to consume a VF is to treat it like a real trunk port and delegate VLAN tagging responsibilities to the container itself, while imposing an administrative trunk policy on the assigned VF. This configuration matches what Nvidia calls VGT or VLAN Guest Tagging.
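
For reference, the VGT setup described next reuses the NetworkAttachmentDefinition from above with the "vlan" key removed from the config, i.e. something like:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: mellanox.com/mellanox_mlx5_dpdk_right
  name: mlnx-right
  namespace: test1
spec:
  config: '{ "cniVersion": "0.3.1", "type": "sriov", "name": "sriov-mlnx-connectx5",
    "spoofchk": "off", "trust": "on" }'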

To configure this, we apply the NetworkAttachmentDefinition without the VLAN ID (as above) and restart the Pod. Now, however, we have to create a VLAN subinterface on the VF from inside the Pod in order to ping the gateway:

$ kubectl -n test1 exec -it test-router-69945bd84f-tb424 -- bash
root@test-router-69945bd84f-tb424:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if3967: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:f6:89:13:cf:2d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.199.234/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a0f6:89ff:fe13:cf2d/64 scope link
       valid_lft forever preferred_lft forever
27: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 5e:46:16:56:e3:b9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5c46:16ff:fe56:e3b9/64 scope link
       valid_lft forever preferred_lft forever

root@test-router-69945bd84f-tb424:~# ip link add link net1 name net1.1082 type vlan id 1082
root@test-router-69945bd84f-tb424:~# ip addr add 172.21.0.52/28 brd 172.21.0.63 dev net1.1082
root@test-router-69945bd84f-tb424:~# ip link set dev net1.1082 up
root@test-router-69945bd84f-tb424:~# ping -I net1.1082 172.21.0.49
root@test-router-69945bd84f-tb424:~# ping -I net1.1082 172.21.0.49
PING 172.21.0.49 (172.21.0.49) from 172.21.0.52 net1.1082: 56(84) bytes of data.
64 bytes from 172.21.0.49: icmp_seq=1 ttl=64 time=0.820 ms
64 bytes from 172.21.0.49: icmp_seq=2 ttl=64 time=0.907 ms
64 bytes from 172.21.0.49: icmp_seq=3 ttl=64 time=0.973 ms
64 bytes from 172.21.0.49: icmp_seq=4 ttl=64 time=1.21 ms
^C
--- 172.21.0.49 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3044ms
rtt min/avg/max/mdev = 0.820/0.979/1.219/0.153 ms

This is where the VGT+ feature comes in. VGT+ is an advanced mode of VLAN guest tagging in which a VF is allowed to tag its own packets but is still subject to an administrative VLAN trunk policy. To configure the VF trunk policy on both PFs, we run:

root@worker:~# for VF in $(ls -l /sys/class/net/ens3f{0..1}/device/sriov/{0..7}/trunk | awk '{print $9}'); do echo add 1070 1081 > $VF; done
root@worker:~# cat /sys/class/net/ens3f{0..1}/device/sriov/{0..7}/trunk
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081

Guest-tagged traffic with VLAN ID 1082, which is outside the allowed range, will now be neither sent nor received by the VF:

root@test-router-69945bd84f-tb424:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if3967: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:f6:89:13:cf:2d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.199.234/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a0f6:89ff:fe13:cf2d/64 scope link
       valid_lft forever preferred_lft forever
27: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 5e:46:16:56:e3:b9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5c46:16ff:fe56:e3b9/64 scope link
       valid_lft forever preferred_lft forever

root@test-router-69945bd84f-tb424:~# ip link add link net1 name net1.1082 type vlan id 1082
root@test-router-69945bd84f-tb424:~# ip addr add 172.21.0.52/28 brd 172.21.0.63 dev net1.1082
root@test-router-69945bd84f-tb424:~# ip link set dev net1.1082 up

root@test-router-69945bd84f-tb424:~# ping -I net1.1082 172.21.0.49
PING 172.21.0.49 (172.21.0.49) from 172.21.0.52 net1.1082: 56(84) bytes of data.
^C
--- 172.21.0.49 ping statistics ---
16 packets transmitted, 0 received, 100% packet loss, time 15357ms
pipe 3

Next, we add an additional allowed VLAN to the virtual function trunks and configure the same VF with another VLAN tag that is already allowed, to demonstrate that the policy indeed works:

root@worker:~# for VF in $(ls -l /sys/class/net/ens3f{0..1}/device/sriov/{0..7}/trunk | awk '{print $9}'); do echo add 1082 1082 > $VF; done
root@worker:~# cat /sys/class/net/ens3f{0..1}/device/sriov/{0..7}/trunk
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082
--- --- --- --- --- --- --- --- ---
--- --- --- --- --- --- --- --- ---
root@test-router-69945bd84f-tb424:~# ip link add link net1 name net1.1082 type vlan id 1082
root@test-router-69945bd84f-tb424:~# ip addr add 172.21.0.52/28 dev net1.1082
root@test-router-69945bd84f-tb424:~# ip link set dev net1.1082 up
root@test-router-69945bd84f-tb424:~# ip link add link net1 name net1.1081 type vlan id 1081
root@test-router-69945bd84f-tb424:~# ip addr add 172.21.0.36/28 dev net1.1081
root@test-router-69945bd84f-tb424:~# ip link set dev net1.1081 up
root@test-router-69945bd84f-tb424:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if3967: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:f6:89:13:cf:2d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.199.234/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a0f6:89ff:fe13:cf2d/64 scope link
       valid_lft forever preferred_lft forever
8: net1.1082@net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:46:16:56:e3:b9 brd ff:ff:ff:ff:ff:ff
    inet 172.21.0.52/28 scope global net1.1082
       valid_lft forever preferred_lft forever
    inet6 fe80::5c46:16ff:fe56:e3b9/64 scope link
       valid_lft forever preferred_lft forever
9: net1.1081@net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:46:16:56:e3:b9 brd ff:ff:ff:ff:ff:ff
    inet 172.21.0.36/28 scope global net1.1081
       valid_lft forever preferred_lft forever
    inet6 fe80::5c46:16ff:fe56:e3b9/64 scope link
       valid_lft forever preferred_lft forever
27: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 5e:46:16:56:e3:b9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5c46:16ff:fe56:e3b9/64 scope link
       valid_lft forever preferred_lft forever

root@test-router-69945bd84f-tb424:~# ping -c4 -I net1.1082 172.21.0.49
PING 172.21.0.49 (172.21.0.49) from 172.21.0.52 net1.1082: 56(84) bytes of data.
64 bytes from 172.21.0.49: icmp_seq=1 ttl=64 time=12.4 ms
64 bytes from 172.21.0.49: icmp_seq=2 ttl=64 time=0.931 ms
64 bytes from 172.21.0.49: icmp_seq=3 ttl=64 time=0.871 ms
64 bytes from 172.21.0.49: icmp_seq=4 ttl=64 time=0.761 ms

--- 172.21.0.49 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3054ms
rtt min/avg/max/mdev = 0.761/3.750/12.440/5.017 ms

root@test-router-69945bd84f-tb424:~# ping -c4 -I net1.1081 172.21.0.33
PING 172.21.0.33 (172.21.0.33) from 172.21.0.36 net1.1081: 56(84) bytes of data.
64 bytes from 172.21.0.33: icmp_seq=1 ttl=64 time=23.7 ms
64 bytes from 172.21.0.33: icmp_seq=2 ttl=64 time=0.777 ms
64 bytes from 172.21.0.33: icmp_seq=3 ttl=64 time=0.822 ms
64 bytes from 172.21.0.33: icmp_seq=4 ttl=64 time=31.3 ms

--- 172.21.0.33 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3054ms
rtt min/avg/max/mdev = 0.777/14.177/31.378/13.648 ms

However, this approach runs into problems if the VF is already assigned to a Pod. When the 802.1Q trunk policy is changed, all tagged interfaces inside the Pod have to be recreated for the new policy to apply. This means that we must either stop all traffic on that interface and recreate it, or set the trunk policies in advance (before application deployment).
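
For example, picking up a changed policy for net1.1082 from the earlier example would mean recreating the tagged interface inside the Pod (a hypothetical re-run of the commands shown earlier):

root@test-router-69945bd84f-tb424:~# ip link del net1.1082
root@test-router-69945bd84f-tb424:~# ip link add link net1 name net1.1082 type vlan id 1082
root@test-router-69945bd84f-tb424:~# ip addr add 172.21.0.52/28 dev net1.1082
root@test-router-69945bd84f-tb424:~# ip link set dev net1.1082 up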

Setting trunk policy before provisioning a VF has undesired consequences:

root@worker:~# for VF in $(ls -l /sys/class/net/ens3f{0..1}/device/sriov/{0..7}/trunk | awk '{print $9}'); do echo add 1082 1082 > $VF; done
--- --- ---
$ kubectl create -f /home/deployment.yaml
$ kubectl -n test1 describe pod test-router-69945bd84f-5826r
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               10s   default-scheduler  Successfully assigned test1/test-router-69945bd84f-5826r to worker
  Warning  NoNetworkFound          9s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          9s    multus             Add eth0 [192.168.199.199/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  9s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8be0ea041e5f00ec8cd4fe2756fc85b23ae5d691ceb4c1fd51afc9229ca6566a": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          8s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          8s    multus             Add eth0 [192.168.199.245/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  8s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "99d05c54d8c428663e99c72716b8fc525c134286a90f97bd23d584bf471a4e40": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          7s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          7s    multus             Add eth0 [192.168.199.250/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  7s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a89824b786e6cb41f6dee80931d5394ad55dc32939e4647a9ad56d418aa934a9": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          6s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          6s    multus             Add eth0 [192.168.199.197/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  6s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "51687216f7597bc2ad82e99c417c5337e847c6d11b9a4505a93d90aacc9344c7": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          5s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          5s    multus             Add eth0 [192.168.199.203/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  5s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4ea6cbf7c5338d2677c45b09af896be704887eaac82f5006c77ea42916f6247c": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          4s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          4s    multus             Add eth0 [192.168.199.208/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  4s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "0259497093460581de8061b1a5002a8ee8e6241e8365c3c87b75d7bf228327b4": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          3s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          3s    multus             Add eth0 [192.168.199.229/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  3s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "de1c1c1592364e813842b5717853dfc708a37277aac87581fae3e30779937814": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          2s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          2s    multus             Add eth0 [192.168.199.235/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  2s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f56958ff6acb07533ea2f36a4e809e9198d54fe2eacc71d0ac1a92adc824bb8b": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          1s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found
  Normal   AddedInterface          1s    multus             Add eth0 [192.168.199.253/32] from k8s-pod-network
  Warning  FailedCreatePodSandBox  0s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "2e41362ccdf0416b6f684299070cd2a8565b848fe63f6d70620d89b053f4c5a5": plugin type="multus" name="multus-cni-network" failed (add): [test1/test-router-69945bd84f-5826r:sriov-mlnx-connectx5]: error adding container to network "sriov-mlnx-connectx5": SRIOV-CNI failed to configure VF "failed to set vf 5 vlan: operation not permitted"
  Warning  NoNetworkFound          0s    multus             cannot find a network-attachment-definition (k8s-pod-network) in namespace (kube-system): network-attachment-definitions.k8s.cni.cncf.io "k8s-pod-network" not found

The same is true for netlink commands:

root@worker:~# ip link set dev ens3f1 vf 3 vlan 0
RTNETLINK answers: Operation not permitted

Be mindful that we did not specify a VLAN ID in the NetworkAttachmentDefinition. In the SR-IOV configuration reference, the VLAN tag defaults to 0 (disabled). However, we believe this default of 0 does not stop sriov-cni from invoking the VLAN-setting method: it still tries to program VLAN 0 and fails, instead of skipping the VLAN assignment and carrying on with the remaining operations (such as moving the VF into the Pod network namespace).

Why do we think that skipping VLAN assignment when it is 0 would solve the issue?

We tested this scenario manually: we imposed an administrative trunk policy on all VFs, moved a VF interface into the Pod network namespace by hand, and then provisioned a forbidden VLAN and an allowed VLAN. First, let us configure the administrative trunk policy on the VFs:

root@worker:~# for VF in $(ls -l /sys/class/net/ens3f{0..1}/device/sriov/{0..7}/trunk | awk '{print $9}'); do echo add 1070 1081 > $VF; done
root@worker:~# cat /sys/class/net/ens3f{0..1}/device/sriov/{0..7}/trunk
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081
Allowed 802.1Q VLANs: 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081

Second, we create a Pod, this time without the SR-IOV NetworkAttachmentDefinition, because we will manually move the virtual function interface into the Pod network namespace:

$ kubectl -n test1 exec -it test-router-875898ffb-k6g25 -- bash
root@test-router-875898ffb-k6g25:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if5511: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:33:bf:6b:41:a9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.199.208/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::7c33:bfff:fe6b:41a9/64 scope link
       valid_lft forever preferred_lft forever

Let us go back to the host and find this network namespace.

root@worker:~# ip netns list
cni-cdd833b3-1de0-8abd-e4d3-d4daa5cb1e62 (id: 13)
cni-f69bf47a-0950-361f-db8b-c1e485d34b6a (id: 3)
cni-4382825b-a464-fd6c-400c-1c5de6b0683b (id: 0)
cni-27f1a091-e6c7-dbe4-341d-dd9cae23770f (id: 12)
cni-4d7f0916-3e9e-e2f9-1f34-41f0a2630373 (id: 9)
cni-c68fc5b8-2f15-6b8f-8ba3-7c8421301974 (id: 11)
cni-2874a9fa-ac9a-2271-2da4-336164e88248 (id: 10)
cni-8cd024a2-6a0e-7761-8199-e64c217d8d17 (id: 8)
cni-4fc915e4-2931-a6a0-12d3-3de92c0f6377 (id: 7)
cni-8da838ce-80a8-9e82-b747-f72df1f472a2 (id: 6)
cni-cbaad5ab-0609-74fe-2288-a1d52e780f3c (id: 5)
cni-d8c780b4-35d2-c331-c2b4-ebe1392a3774 (id: 4)
cni-7e77f723-caa5-ed63-7240-26c6b141b407 (id: 2)
cni-978bb2af-d6d7-82e8-6752-e3aef9a0b021 (id: 1)

root@worker:~# ip netns exec cni-cdd833b3-1de0-8abd-e4d3-d4daa5cb1e62 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if5511: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:33:bf:6b:41:a9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.199.208/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::7c33:bfff:fe6b:41a9/64 scope link
       valid_lft forever preferred_lft forever

We found the correct network namespace. Now we can move any VF interface into it; the VF interfaces are named in the format ens3f[0-1]v[0-7]. Let's take ens3f1v5 for this example:

root@worker:~# ip link set ens3f1v5 netns cni-cdd833b3-1de0-8abd-e4d3-d4daa5cb1e62
root@worker:~# ip netns exec cni-cdd833b3-1de0-8abd-e4d3-d4daa5cb1e62 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if5511: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:33:bf:6b:41:a9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.199.208/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::7c33:bfff:fe6b:41a9/64 scope link
       valid_lft forever preferred_lft forever
25: ens3f1v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:97:8b:03:e1:da brd ff:ff:ff:ff:ff:ff

Perfect! We snatched a VF interface. Let's configure it from inside the Pod as in the previous examples:

root@test-router-875898ffb-k6g25:~# ip link add link ens3f1v5 name ens3f1v5.1082 type vlan id 1082
root@test-router-875898ffb-k6g25:~# ip addr add 172.21.0.52/28 dev ens3f1v5.1082
root@test-router-875898ffb-k6g25:~# ip link add link ens3f1v5 name ens3f1v5.1081 type vlan id 1081
root@test-router-875898ffb-k6g25:~# ip addr add 172.21.0.36/28 dev ens3f1v5.1081
root@test-router-875898ffb-k6g25:~# ip link set dev ens3f1v5 up
root@test-router-875898ffb-k6g25:~# ip link set dev ens3f1v5.1082 up
root@test-router-875898ffb-k6g25:~# ip link set dev ens3f1v5.1081 up
root@test-router-875898ffb-k6g25:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if5511: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:33:bf:6b:41:a9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.199.208/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::7c33:bfff:fe6b:41a9/64 scope link
       valid_lft forever preferred_lft forever
4: ens3f1v5.1082@ens3f1v5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3a:97:8b:03:e1:da brd ff:ff:ff:ff:ff:ff
    inet 172.21.0.52/28 scope global ens3f1v5.1082
       valid_lft forever preferred_lft forever
    inet6 fe80::3897:8bff:fe03:e1da/64 scope link
       valid_lft forever preferred_lft forever
5: ens3f1v5.1081@ens3f1v5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3a:97:8b:03:e1:da brd ff:ff:ff:ff:ff:ff
    inet 172.21.0.36/28 scope global ens3f1v5.1081
       valid_lft forever preferred_lft forever
    inet6 fe80::3897:8bff:fe03:e1da/64 scope link
       valid_lft forever preferred_lft forever
25: ens3f1v5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3a:97:8b:03:e1:da brd ff:ff:ff:ff:ff:ff
    inet6 fe80::3897:8bff:fe03:e1da/64 scope link
       valid_lft forever preferred_lft forever

The final test is to perform a ping from these interfaces to confirm that our theory is indeed correct:

root@test-router-875898ffb-k6g25:~# ping -c4 -I ens3f1v5.1082 172.21.0.49
PING 172.21.0.49 (172.21.0.49) from 172.21.0.52 ens3f1v5.1082: 56(84) bytes of data.

--- 172.21.0.49 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3049ms
pipe 4

root@test-router-875898ffb-k6g25:~# ping -c4 -I ens3f1v5.1081 172.21.0.33
PING 172.21.0.33 (172.21.0.33) from 172.21.0.36 ens3f1v5.1081: 56(84) bytes of data.
64 bytes from 172.21.0.33: icmp_seq=1 ttl=64 time=14.7 ms
64 bytes from 172.21.0.33: icmp_seq=2 ttl=64 time=9.52 ms
64 bytes from 172.21.0.33: icmp_seq=3 ttl=64 time=0.868 ms
64 bytes from 172.21.0.33: icmp_seq=4 ttl=64 time=0.868 ms

--- 172.21.0.33 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3033ms
rtt min/avg/max/mdev = 0.868/6.494/14.712/5.917 ms

To clean up, we delete the VLAN interfaces and return the VF interface to the root network namespace.
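
The deletion step was not captured in the transcript; from inside the Pod it would look like this (using the interface names from this example):

root@test-router-875898ffb-k6g25:~# ip link del ens3f1v5.1082
root@test-router-875898ffb-k6g25:~# ip link del ens3f1v5.1081

Then, from the host, we move the VF back to the root namespace: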

root@worker:~# ip netns exec cni-cdd833b3-1de0-8abd-e4d3-d4daa5cb1e62 ip link set ens3f1v5 netns 1
root@worker:~# ip netns exec cni-cdd833b3-1de0-8abd-e4d3-d4daa5cb1e62 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if5511: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:33:bf:6b:41:a9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.199.208/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::7c33:bfff:fe6b:41a9/64 scope link
       valid_lft forever preferred_lft forever

Proposed remediation for the issue / solution time!

Please provide guidance on this matter!

SchSeba commented 6 months ago

@mlguerrero12, can you take a look at this one and let me know :)

mlguerrero12 commented 6 months ago

@SchSeba, I think it's a valid request.

We assign VLAN 0 when the vlan parameter is not set. We could (and should) change this. The idea is to not set VLAN 0 when the vlan parameter is absent, and to set VLAN 0 only when the config explicitly sets the vlan parameter to 0.

For both cases, we will keep the restriction of not allowing qos or vlan proto when the vlan parameter is unset or 0.

I'll work on this.
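
A minimal sketch of those semantics, assuming the vlan field is decoded into a *int so that an absent parameter (nil) can be distinguished from an explicit "vlan": 0; the names are hypothetical, not the actual patch:

package main

import (
	"encoding/json"
	"fmt"
)

type netConf struct {
	Vlan *int `json:"vlan"`
}

// shouldSetVlan reports whether the VF VLAN should be programmed and with
// which ID: skip when the parameter is absent, program it (including an
// explicit 0) only when the config sets it.
func shouldSetVlan(c *netConf) (bool, int) {
	if c.Vlan == nil {
		return false, 0 // "vlan" not present: leave the VF untouched
	}
	return true, *c.Vlan
}

func main() {
	for _, raw := range []string{`{}`, `{"vlan": 0}`, `{"vlan": 1082}`} {
		var c netConf
		if err := json.Unmarshal([]byte(raw), &c); err != nil {
			panic(err)
		}
		set, id := shouldSetVlan(&c)
		fmt.Printf("%-16s -> set=%t vlan=%d\n", raw, set, id)
	}
}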

mlguerrero12 commented 5 months ago

@RefluxMeds, the fix is in #296. Please test it from your side.

RefluxMeds commented 5 months ago

Hi @mlguerrero12, I'm blocked with other activities until next week. I'll dedicate some time next week to test your changes. Thank you so much!