k8snetworkplumbingwg / sriov-network-device-plugin

SRIOV network device plugin for Kubernetes
Apache License 2.0

fix: generate a file per resourcePool #583

Closed souleb closed 3 months ago

souleb commented 3 months ago

fixes #576

This PR creates one cdiSpec file per resourcePool. On each call to CreateCDISpecForPool, the pool's previous file is removed first. We compute a digest of the cdiSpec to get a unique filename based on the resourcePool's devices.

The manager itself cleans the directory in initServers() by calling cleanupCDISpecs(), which deletes every file that matches sriov-dp-*.

souleb commented 3 months ago

I have tested this with the following:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: nvidia.com/rdma_resource1
spec:
  config: '{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "ipam": {
    "type": "host-local",
    "subnet": "10.56.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "10.56.217.1"
  }
}'
---
souleb@c-237-115-20-025:~/network-operator$ cat pod-tc1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net1
spec:
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        nvidia.com/rdma_resource1: '1'
      limits:
        nvidia.com/rdma_resource1: '1'

I get the expected result:

Name:             testpod1
Annotations:
                  k8s.v1.cni.cncf.io/networks: sriov-net1
                  k8s.v1.cni.cncf.io/networks-status:
                    [{
                        "name": "cilium",
                        "interface": "eth0",
                        "ips": [
                            "10.0.2.251"
                        ],
                        "mac": "9e:35:b3:cf:66:5e",
                        "default": true,
                        "dns": {}
                    },{
                        "name": "default/sriov-net1",
                        "interface": "net1",
                        "ips": [
                            "10.56.217.2"
                        ],
                        "mac": "ee:55:4d:84:c6:fa",
                        "dns": {},
                        "device-info": {
                            "type": "pci",
                            "version": "1.1.0",
                            "pci": {
                                "pci-address": "0000:b1:00.4",
                                "rdma-device": "mlx5_15"
                            }
                        }
                    }]
Status:           Running

One thing to note is the names of the created files:

<>:/$ sudo ls -l /var/run/cdi
-rw------- 1 root root 3623 Aug  6 15:45 sriov-dp-nvidia.com-net-pci_104c6xxx.yaml
-rw------- 1 root root 1855 Aug  6 15:45 sriov-dp-nvidia.com-net-pci_71627xxx.yaml

The resource discovery order is not deterministic, so the filenames change over time. I don't think this is an issue, though.
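The following Go sketch shows why discovery order changes the name, assuming (not confirmed from the plugin source) that the digest covers the device list in the order the devices were discovered. Hashing the same IDs in a different order yields a different digest. `stableDigestOf` is one hypothetical way to make the digest order-independent by sorting first; it is not what this PR does. The second PCI address is a made-up example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// digestOf (hypothetical) hashes device IDs in the order given, modelling a
// spec whose device list follows discovery order.
func digestOf(deviceIDs []string) string {
	sum := sha256.Sum256([]byte(strings.Join(deviceIDs, "\n")))
	return hex.EncodeToString(sum[:])
}

// stableDigestOf (hypothetical) sorts a copy of the IDs before hashing, so the
// result no longer depends on discovery order. The caller's slice is not modified.
func stableDigestOf(deviceIDs []string) string {
	sorted := append([]string(nil), deviceIDs...)
	sort.Strings(sorted)
	return digestOf(sorted)
}

func main() {
	run1 := []string{"0000:b1:00.4", "0000:b1:00.5"}
	run2 := []string{"0000:b1:00.5", "0000:b1:00.4"}
	fmt.Println(digestOf(run1) == digestOf(run2))             // false
	fmt.Println(stableDigestOf(run1) == stableDigestOf(run2)) // true
}
```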

coveralls commented 3 months ago

Pull Request Test Coverage Report for Build 10510563475

Changes Missing Coverage:
File              Covered Lines   Changed/Added Lines   %
pkg/cdi/cdi.go    0               6                     0.0%
Total             0               6                     0.0%

Totals Coverage Status:
Change from base Build 10353183797: 0.0%
Covered Lines: 2103
Relevant Lines: 2794

💛 - Coveralls
adrianchiris commented 3 months ago

@Eoghan1232, mind taking a look at this one?

adrianchiris commented 3 months ago

Merging this one! @souleb, appreciate the patience.