intel / qatlib


"qat_service status" command doesn't show qat endpoint status after kubernetes qat plugin is installed #79

Open juggarnautss opened 4 months ago

juggarnautss commented 4 months ago

We are using Intel Sapphire Rapids processors with integrated QAT accelerators. After OS installation, we place the QAT config files in the /etc directory and then start the QAT service with `/etc/init.d/qat_service start`.

QAT config files:

```
sysadmin@controller-0:/var/log$ ls -lrt /etc | grep 4xxx
-rw-r----- 1 root root 5315 Apr 16 10:39 4xxx_dev0.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev0.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev1.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev2.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev3.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev4.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev5.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev6.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev7.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev8.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev9.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev10.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev11.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev12.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev13.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev14.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev15.conf
-rw-r----- 1 root root 5315 Apr 16 10:39 4xxx_dev1.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev16.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev17.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev18.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev19.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev20.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev21.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev22.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev23.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev24.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev25.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev26.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev27.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev28.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev29.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev30.conf
-rw-r----- 1 root root 4383 Apr 16 10:39 4xxxvf_dev31.conf
```

QAT service status:

```
sysadmin@controller-0:/var/log$ sudo /etc/init.d/qat_service status
Checking status of all devices.
There is 34 QAT acceleration device(s) in the system:
 qat_dev0 - type: 4xxx, inst_id: 0, node_id: 0, bsf: 0000:f3:00.0, #accel: 1 #engines: 9 state: up
 qat_dev1 - type: 4xxx, inst_id: 1, node_id: 0, bsf: 0000:f7:00.0, #accel: 1 #engines: 9 state: up
 qat_dev2 - type: 4xxxvf, inst_id: 0, node_id: 0, bsf: 0000:f3:00.1, #accel: 1 #engines: 1 state: up
 qat_dev3 - type: 4xxxvf, inst_id: 1, node_id: 0, bsf: 0000:f3:00.2, #accel: 1 #engines: 1 state: up
 qat_dev4 - type: 4xxxvf, inst_id: 2, node_id: 0, bsf: 0000:f3:00.3, #accel: 1 #engines: 1 state: up
 qat_dev5 - type: 4xxxvf, inst_id: 3, node_id: 0, bsf: 0000:f3:00.4, #accel: 1 #engines: 1 state: up
 qat_dev6 - type: 4xxxvf, inst_id: 4, node_id: 0, bsf: 0000:f3:00.5, #accel: 1 #engines: 1 state: up
 qat_dev7 - type: 4xxxvf, inst_id: 5, node_id: 0, bsf: 0000:f3:00.6, #accel: 1 #engines: 1 state: up
 qat_dev8 - type: 4xxxvf, inst_id: 6, node_id: 0, bsf: 0000:f3:00.7, #accel: 1 #engines: 1 state: up
 qat_dev9 - type: 4xxxvf, inst_id: 7, node_id: 0, bsf: 0000:f3:01.0, #accel: 1 #engines: 1 state: up
 qat_dev10 - type: 4xxxvf, inst_id: 8, node_id: 0, bsf: 0000:f3:01.1, #accel: 1 #engines: 1 state: up
 qat_dev11 - type: 4xxxvf, inst_id: 9, node_id: 0, bsf: 0000:f3:01.2, #accel: 1 #engines: 1 state: up
 qat_dev12 - type: 4xxxvf, inst_id: 10, node_id: 0, bsf: 0000:f3:01.3, #accel: 1 #engines: 1 state: up
 qat_dev13 - type: 4xxxvf, inst_id: 11, node_id: 0, bsf: 0000:f3:01.4, #accel: 1 #engines: 1 state: up
 qat_dev14 - type: 4xxxvf, inst_id: 12, node_id: 0, bsf: 0000:f3:01.5, #accel: 1 #engines: 1 state: up
 qat_dev15 - type: 4xxxvf, inst_id: 13, node_id: 0, bsf: 0000:f3:01.6, #accel: 1 #engines: 1 state: up
 qat_dev16 - type: 4xxxvf, inst_id: 14, node_id: 0, bsf: 0000:f3:01.7, #accel: 1 #engines: 1 state: up
 qat_dev17 - type: 4xxxvf, inst_id: 15, node_id: 0, bsf: 0000:f3:02.0, #accel: 1 #engines: 1 state: up
 qat_dev18 - type: 4xxxvf, inst_id: 16, node_id: 0, bsf: 0000:f7:00.1, #accel: 1 #engines: 1 state: up
 qat_dev19 - type: 4xxxvf, inst_id: 17, node_id: 0, bsf: 0000:f7:00.2, #accel: 1 #engines: 1 state: up
 qat_dev20 - type: 4xxxvf, inst_id: 18, node_id: 0, bsf: 0000:f7:00.3, #accel: 1 #engines: 1 state: up
 qat_dev21 - type: 4xxxvf, inst_id: 19, node_id: 0, bsf: 0000:f7:00.4, #accel: 1 #engines: 1 state: up
 qat_dev22 - type: 4xxxvf, inst_id: 20, node_id: 0, bsf: 0000:f7:00.5, #accel: 1 #engines: 1 state: up
 qat_dev23 - type: 4xxxvf, inst_id: 21, node_id: 0, bsf: 0000:f7:00.6, #accel: 1 #engines: 1 state: up
 qat_dev24 - type: 4xxxvf, inst_id: 22, node_id: 0, bsf: 0000:f7:00.7, #accel: 1 #engines: 1 state: up
 qat_dev25 - type: 4xxxvf, inst_id: 23, node_id: 0, bsf: 0000:f7:01.0, #accel: 1 #engines: 1 state: up
 qat_dev26 - type: 4xxxvf, inst_id: 24, node_id: 0, bsf: 0000:f7:01.1, #accel: 1 #engines: 1 state: up
 qat_dev27 - type: 4xxxvf, inst_id: 25, node_id: 0, bsf: 0000:f7:01.2, #accel: 1 #engines: 1 state: up
 qat_dev28 - type: 4xxxvf, inst_id: 26, node_id: 0, bsf: 0000:f7:01.3, #accel: 1 #engines: 1 state: up
 qat_dev29 - type: 4xxxvf, inst_id: 27, node_id: 0, bsf: 0000:f7:01.4, #accel: 1 #engines: 1 state: up
 qat_dev30 - type: 4xxxvf, inst_id: 28, node_id: 0, bsf: 0000:f7:01.5, #accel: 1 #engines: 1 state: up
 qat_dev31 - type: 4xxxvf, inst_id: 29, node_id: 0, bsf: 0000:f7:01.6, #accel: 1 #engines: 1 state: up
 qat_dev32 - type: 4xxxvf, inst_id: 30, node_id: 0, bsf: 0000:f7:01.7, #accel: 1 #engines: 1 state: up
 qat_dev33 - type: 4xxxvf, inst_id: 31, node_id: 0, bsf: 0000:f7:02.0, #accel: 1 #engines: 1 state: up
```

We then used the Helm charts to deploy the Kubernetes Intel QAT plugin. We did not use an initContainer to provision the QAT devices, since we already do that after OS installation as described above.

```
helm repo add intel https://intel.github.io/helm-charts/
helm repo update
helm install qat-device-plugin intel/intel-device-plugins-qat
```

After installation, we can see that the VF endpoints are exposed to the Kubernetes cluster.

Node description:

```
Capacity:
  cpu:                    64
  ephemeral-storage:      10218772Ki
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 129160204Ki
  pods:                   110
  qat.intel.com/generic:  32
Allocatable:
  cpu:                    62
  ephemeral-storage:      9417620260
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 118817804Ki
  pods:                   110
  qat.intel.com/generic:  32
```

Observation: when we now check the status of the QAT endpoints with `qat_service status`, the output shows only the PF status and no longer shows any QAT VF endpoint status, as it did before the QAT plugin installation.

```
sysadmin@controller-0:/var/log$ sudo /etc/init.d/qat_service status
Checking status of all devices.
There is 2 QAT acceleration device(s) in the system:
 qat_dev0 - type: 4xxx, inst_id: 0, node_id: 0, bsf: 0000:f3:00.0, #accel: 1 #engines: 9 state: up
 qat_dev1 - type: 4xxx, inst_id: 1, node_id: 0, bsf: 0000:f7:00.0, #accel: 1 #engines: 9 state: up
```

We need assistance with this behavior. Our understanding is that the QAT VF endpoints come up after reading the configuration files in /etc, which preserve the config settings for each VF endpoint. Hence, the missing endpoint status will break the configuration binding for each VF endpoint.

mythi commented 4 months ago

> We need assistance with this behavior. Our understanding is that the QAT VF endpoints come up after reading the configuration files in /etc, which preserve the config settings for each VF endpoint. Hence, the missing endpoint status will break the configuration binding for each VF endpoint.

Can you clarify what configuration you're expecting to see? The Kubernetes QAT plugin only knows about VFs that are bound to vfio-pci; if it finds a VF that is not bound to vfio-pci, it does the job of binding it. I don't know the details of `qat_service status`, but it probably only lists devices bound to either the 4xxx or 4xxxvf driver. Once you deploy the QAT plugin, you don't have 4xxxvf devices anymore.
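This rebinding can be checked directly on the host. Below is a minimal sketch, not a qatlib tool (`driver_of` is a hypothetical helper name), that reads the `driver` symlink sysfs exposes for each PCI device:

```shell
#!/bin/sh
# Print which kernel driver (if any) a PCI device is bound to,
# by resolving the "driver" symlink under /sys/bus/pci/devices/<bdf>/.
driver_of() {
  dev="$1"
  if [ -e "$dev/driver" ]; then
    # The symlink points into /sys/bus/pci/drivers/<name>; the basename
    # of the resolved path is the driver name (e.g. vfio-pci or 4xxxvf).
    basename "$(readlink -f "$dev/driver")"
  else
    echo "none"
  fi
}

# On the affected node, something like the following would show whether
# the VFs from the earlier listing are now bound to vfio-pci:
#   for d in /sys/bus/pci/devices/0000:f3:0?.?; do
#     echo "$d -> $(driver_of "$d")"
#   done
```

If the VFs report `vfio-pci` after the plugin is deployed, that matches the explanation above: they exist, but are no longer claimed by the 4xxxvf driver that `qat_service` reports on.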

Note that your driver setup is based on the out-of-tree (OOT) driver, which is not a supported setup for either qatlib or the Kubernetes QAT plugin.

juggarnautss commented 1 month ago


@mythi Thank you for your response. OK, so it is expected that until the K8s QAT plugin is installed, `qat_service` shows the status of both 4xxx and 4xxxvf devices, but after plugin installation the 4xxxvf devices no longer exist and hence we don't see their status. Could you please elaborate on why they don't exist after the plugin installation?

mythi commented 1 month ago

> Could you please elaborate on why they don't exist after the plugin installation?

It was mentioned in my earlier comment: "The Kubernetes QAT plugin only knows about VFs that are bound to vfio-pci. If it finds a VF that does not have vfio-pci, it does the job."

A few things to be aware of: it looks like you are using the out-of-tree driver stack, which is not applicable to the qatlib and K8s QAT plugin setup. In addition, when using QAT in a K8s cluster, the host OS does not need qatlib / qat_service installed, because the Helm setup provides the equivalent functionality.
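For reference, once the plugin advertises the `qat.intel.com/generic` resource shown in the node description above, workloads consume the VFs by requesting that resource in their pod spec. A minimal sketch (pod name and image are placeholders, not taken from this thread):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qat-test-pod          # placeholder name
spec:
  containers:
  - name: app
    image: my-qat-app:latest  # placeholder image
    resources:
      requests:
        qat.intel.com/generic: 1
      limits:
        qat.intel.com/generic: 1
```

The kubelet then passes one of the vfio-pci-bound VFs to the container, which is why the host-side qat_service no longer manages them.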