intel / qatlib


The NUMA allocation mechanism in version 24.09 is still questionable #95

Open Microcarft opened 6 days ago

Microcarft commented 6 days ago

I was very happy to see the new release, which I expected to finally solve the NUMA problem that had been giving me a headache. I immediately uninstalled version 24.02 of qatlib and qatmgr, pulled version 24.09, and rebuilt and installed it. Disappointingly, after I recompiled the application against the new qatlib static library and used cgroup to restrict it to the cores of CPU1 and to NUMA node 1 (the node belonging to CPU1), the application still used the QAT#0 device, which is the QAT hardware accelerator attached to CPU0. My sysconfig/qat is configured as follows:

POLICY=1
ServicesEnabled=dc

Could you tell me what I am missing, or is this a bug?

fionatrahe commented 6 days ago

Ah! Thanks for the feedback. That wasn’t the use-case we were solving for, but we will take it on board and investigate. In the meantime, would the following be a workaround for you?

Pick up one VF from each PF, so you have VFs on each socket, by setting POLICY=0.

Use these to find the devices on the numaNode you’re interested in:

CpaStatus cpaGetNumDevices(Cpa16U *numDevices);
CpaStatus cpaGetDeviceInfo(Cpa16U device, CpaDeviceInfo *deviceInfo);

Although this will result in some VF resources being unused, that doesn’t necessarily limit performance; it depends on your use-case. If you run processes on all numaNodes then you can still use the full compute power of all devices.
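
For readers following the workaround, here is a minimal sketch of the enumeration step, not an official qatlib sample. It assumes the application has already started the QAT user-space library, and that `CpaDeviceInfo` exposes a `numaNode` field, as the reply implies. The `USE_QATLIB` macro, the header paths, the helper name `devices_on_node`, and the stand-in definitions in the `#else` branch are all hypothetical. The stand-ins exist only so the sketch compiles on a machine without qatlib, and they fake two devices on nodes 0 and 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef USE_QATLIB
/* Real build against qatlib. The header paths are an assumption and may
 * differ depending on how qatlib was installed. */
#include <qat/cpa.h>
#include <qat/cpa_dev.h>
#else
/* Stand-in definitions (NOT the real qatlib ones) so the sketch compiles
 * without the library. They model only what is used below. */
typedef uint16_t Cpa16U;
typedef uint32_t Cpa32U;
typedef int32_t CpaStatus;
#define CPA_STATUS_SUCCESS 0
typedef struct { Cpa32U numaNode; } CpaDeviceInfo;

/* Pretend there are two devices: device 0 on node 0, device 1 on node 1. */
static CpaStatus cpaGetNumDevices(Cpa16U *numDevices)
{
    *numDevices = 2;
    return CPA_STATUS_SUCCESS;
}

static CpaStatus cpaGetDeviceInfo(Cpa16U device, CpaDeviceInfo *deviceInfo)
{
    deviceInfo->numaNode = device;
    return CPA_STATUS_SUCCESS;
}
#endif

/* Collect the indices of all devices that report `wanted_node` as their
 * NUMA node. Writes at most `cap` indices into `out` and returns how many
 * were written, or -1 if a QAT API call fails. */
static int devices_on_node(Cpa32U wanted_node, Cpa16U *out, int cap)
{
    assert(out != NULL && cap >= 0);

    Cpa16U num_devices = 0;
    if (cpaGetNumDevices(&num_devices) != CPA_STATUS_SUCCESS)
        return -1;

    int found = 0;
    for (Cpa16U dev = 0; dev < num_devices; dev++) {
        CpaDeviceInfo info = {0};
        if (cpaGetDeviceInfo(dev, &info) != CPA_STATUS_SUCCESS)
            return -1;
        if (info.numaNode == wanted_node && found < cap)
            out[found++] = dev;
    }
    return found;
}
```

An application pinned to node 1 would call `devices_on_node(1, ids, n)` and then restrict itself to instances that belong to the returned devices. How instances map back to device indices is not covered in this thread; in the QAT API this is presumably done through the instance info returned by `cpaDcInstanceGetInfo2`, but treat that as an assumption to verify against the headers.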

Microcarft commented 6 days ago

I'm very happy to receive your reply! In fact, I only plan to run the application on one specific CPU, so it should run only on the NUMA node that CPU belongs to and should use only that CPU's QAT hardware. In the scenario described above, I bind the application to CPU1 through cgroup, with the memory node set to match, but the application still uses the QAT hardware of CPU0.

I understand that if I want to run multiple applications in the future, I will sooner or later have to use all the PFs on the system, and at that point setting POLICY to 0 or a higher value will suit multi-process performance better. For my current needs, however, I want the application to use only one VF (that is, POLICY=1), so that I can easily determine whether qatmgr allocates a VF from QAT#0 or from QAT#1 to it.

In addition, I use pcm-accel -qat to view the usage of the two physical QAT engines. pcm is a very useful performance monitoring tool; I find it more intuitive and convenient. Looking forward to your reply, thank you~
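
When debugging a setup like this, it can help to confirm from inside the process that the cgroup restriction is actually in effect before comparing it with the device's NUMA node. The sketch below is one way to do that on Linux, by reading the `Cpus_allowed_list` and `Mems_allowed_list` fields of `/proc/self/status`. The helper name `proc_status_field` is hypothetical, and nothing here is part of qatlib.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the value of `key` (for example "Mems_allowed_list") from
 * /proc/self/status into `buf`, without the trailing newline.
 * Returns 0 on success, -1 if the file or the key is not available. */
static int proc_status_field(const char *key, char *buf, size_t buflen)
{
    assert(key != NULL && buf != NULL && buflen > 0);

    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[512];
    size_t klen = strlen(key);
    int rc = -1;

    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "Key:<tab>value"; match the key exactly. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *val = line + klen + 1;
            val += strspn(val, " \t");          /* skip separator whitespace */
            size_t n = strcspn(val, "\n");      /* length without newline */
            if (n >= buflen)
                n = buflen - 1;                 /* truncate to fit */
            memcpy(buf, val, n);
            buf[n] = '\0';
            rc = 0;
            break;
        }
    }

    fclose(f);
    return rc;
}
```

Calling `proc_status_field("Cpus_allowed_list", buf, sizeof buf)` and `proc_status_field("Mems_allowed_list", buf, sizeof buf)` at startup and logging the results shows which cores and memory nodes the process was really given, which can then be compared with the NUMA node reported for the QAT device in use.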

Microcarft commented 6 days ago

Oops, sorry, I clicked the wrong button; this question has not been answered yet.