Open Kaustubh2455 opened 2 years ago
@Kaustubh2455 Can you share your driver config details? Also, does the system have 3 QAT devices in it? You can check with `lspci | grep Co-p`. Judging by your performance numbers, it looks like you have only 1 QAT device.
@Yogaraj-Alamenda thanks for the reply. I am using the default driver configurations provided by QAT. I tried all 4 configurations present at QAT_Engine/qat/config/c6xx: multi_process_optimized, multi_thread_event-driven_optimized, multi_thread_optimized, and multi_process_event-driven_optimized. Do I need to make any changes to the configurations?
I do have three QAT devices, and I copied the configuration for all three of them. Output of `lspci | grep Co-p`:

    1a:00.0 Co-processor: Intel Corporation C62x Chipset QuickAssist Technology (rev 04)
    1b:00.0 Co-processor: Intel Corporation C62x Chipset QuickAssist Technology (rev 04)
    1c:00.0 Co-processor: Intel Corporation C62x Chipset QuickAssist Technology (rev 04)
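The device check suggested above can be scripted. This is only a minimal sketch: the function name `count_qat_devices` is made up for illustration, and the match pattern is taken from the `lspci` lines quoted in this thread, so other QAT generations may need a different pattern.

```shell
# Count PCI functions that lspci reports as QAT co-processors.
# Reads lspci output on stdin, so it can be tried without QAT hardware.
# The pattern is an assumption based on the C62x lines shown in this thread.
count_qat_devices() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -c 'Co-processor: Intel Corporation.*QuickAssist' || true
}
```

Usage would be `lspci | count_qat_devices`, which should print 3 on the system described here.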
Hi @Yogaraj-Alamenda, I looked into your comment about only having 1 QAT device. Looking at fw_counters, I found that only 1 of the devices receives the requests; for the other two, the fw_counters are zero. I am not sure how to make them send/receive requests.

    # service qat_service status
    ● qat_service.service - LSB: modprobe the QAT modules, which loads dependant modules, before calling the user space utility to pass configuration parameters
       Loaded: loaded (/etc/init.d/qat_service; generated)
       Active: active (exited) since Mon 2022-09-19 03:16:47 PDT; 14min ago
         Docs: man:systemd-sysv-generator(8)
      Process: 27514 ExecStop=/etc/init.d/qat_service stop (code=exited, status=0/SUCCESS)
      Process: 27582 ExecStart=/etc/init.d/qat_service start (code=exited, status=0/SUCCESS)

    Sep 19 03:16:46 compute1.lilac.local qat_service[27582]: Restarting all devices.
    Sep 19 03:16:46 compute1.lilac.local qat_service[27582]: Processing /etc/c6xx_dev0.conf
    Sep 19 03:16:46 compute1.lilac.local qat_service[27582]: Processing /etc/c6xx_dev1.conf
    Sep 19 03:16:47 compute1.lilac.local qat_service[27582]: Processing /etc/c6xx_dev2.conf
    Sep 19 03:16:47 compute1.lilac.local qat_service[27582]: Checking status of all devices.
    Sep 19 03:16:47 compute1.lilac.local qat_service[27582]: There is 3 QAT acceleration device(s) in the system:
    Sep 19 03:16:47 compute1.lilac.local qat_service[27582]: qat_dev0 - type: c6xx, inst_id: 0, node_id: 0, bsf: 0000:1a:00.0, #accel: 5 #engines: 10 state: up
    Sep 19 03:16:47 compute1.lilac.local qat_service[27582]: qat_dev1 - type: c6xx, inst_id: 1, node_id: 0, bsf: 0000:1b:00.0, #accel: 5 #engines: 10 state: up
    Sep 19 03:16:47 compute1.lilac.local qat_service[27582]: qat_dev2 - type: c6xx, inst_id: 2, node_id: 0, bsf: 0000:1c:00.0, #accel: 5 #engines: 10 state: up
    Sep 19 03:16:47 compute1.lilac.local systemd[1]: Started LSB: modprobe the QAT modules, which loads dependant modules, before calling the user space utility to pass configuration parameters.
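For anyone repeating the fw_counters comparison, a rough sketch follows. It rests on two assumptions that are not confirmed in this thread: that each device exposes its counters under debugfs at a path like `/sys/kernel/debug/qat_c6xx_<bdf>/fw_counters`, and that request lines contain the word `Requests` followed by a numeric count. The helper name `sum_fw_requests` is hypothetical.

```shell
# Sum the request counters from one fw_counters dump read on stdin.
# Assumed line shape (not verified here):
#   | Firmware Requests [AE  0]:        123 |
sum_fw_requests() {
    awk '/Requests/ {
             # Walk fields right to left and take the last purely numeric one.
             for (i = NF; i > 0; i--)
                 if ($i ~ /^[0-9]+$/) { sum += $i; break }
         }
         END { print sum + 0 }'
}

# Hypothetical usage (needs root and a mounted debugfs; the path is an assumption):
# for f in /sys/kernel/debug/qat_c6xx_*/fw_counters; do
#     printf '%s %s\n' "$f" "$(sum_fw_requests < "$f")"
# done
```

A device that prints 0 while another prints a large total would match the symptom described above, where only one device is receiving requests.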
You will have to configure your driver config files to use all 3 devices, in line with the number of processes passed to the -multi option of the speed command.
For instance, the speed test command below, with the -multi 3 option, can reach the target of 100K op/s by utilizing all 3 devices.

    ./openssl speed -engine qatengine -async_jobs 36 -multi 3 rsa2048

    [SHIM]
    NumberCyInstances = 2
    NumberDcInstances = 0
    NumProcesses = 1
    LimitDevAccess = 1
Hi @Yogaraj-Alamenda Thanks for the response.
I tried the SHIM config for all three devices, i.e.

    [SHIM]
    NumberCyInstances = 2
    NumberDcInstances = 0
    NumProcesses = 1
    LimitDevAccess = 1

which replaced the existing config:

    [SHIM]
    NumberCyInstances = 1
    NumberDcInstances = 0
    NumProcesses = 16
    LimitDevAccess = 1

(With that existing config I tried -multi 3, and still only one of the devices got all the requests.)
After changing the config and restarting the service, I see these errors in dmesg, which cause the hardware initialisation to fail.

    [ +0.107276] c6xx 0000:1a:00.0: Get core number failed with error -14
    [ +0.006441] c6xx 0000:1a:00.0: Failed to create rings for cy
    [ +0.005723] c6xx 0000:1a:00.0: Failed to process user section SHIM
    [ +0.006354] c6xx 0000:1a:00.0: Failed to config device
    [ +0.006505] c6xx 0000:1b:00.0: Get core number failed with error -14
    [ +0.006426] c6xx 0000:1b:00.0: Failed to create rings for cy
    [ +0.005705] c6xx 0000:1b:00.0: Failed to process user section SHIM
    [ +0.006257] c6xx 0000:1b:00.0: Failed to config device
    [ +0.005623] c6xx 0000:1c:00.0: Get core number failed with error -14
    [ +0.006401] c6xx 0000:1c:00.0: Failed to create rings for cy
    [ +0.005695] c6xx 0000:1c:00.0: Failed to process user section SHIM
    [ +0.006237] c6xx 0000:1c:00.0: Failed to config device
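To see quickly which devices rejected a new config, the kernel log can be filtered for the final failure line. A small sketch, assuming the message text matches the dmesg lines quoted above; the function name `failed_qat_devices` is made up for illustration.

```shell
# Print the PCI address of every c6xx device whose configuration failed.
# Reads kernel log text on stdin; the message text is copied from this thread.
failed_qat_devices() {
    sed -n 's/.*c6xx \([0-9a-f:.]*\): Failed to config device.*/\1/p' | sort -u
}
```

Running `dmesg | failed_qat_devices` after a service restart would list one address per failed device; empty output means no device logged that message.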
Do I need to add anything more to the config when I set NumberCyInstances = 2? I also tried changing NumberCyInstances to values other than 1 and got the same error.
Sorry, I forgot to mention that you also need to add instance sections like the ones below to configure 2 instances.

    [SHIM]
    NumberCyInstances = 2
    NumberDcInstances = 0
    NumProcesses = 1
    LimitDevAccess = 1

    Cy0Name = "UserCY0"
    Cy0IsPolled = 1
    Cy0CoreAffinity = 0

    Cy1Name = "UserCY1"
    Cy1IsPolled = 1
    Cy1CoreAffinity = 1
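Since the driver rejected a config where NumberCyInstances did not match the per-instance entries, it may help to generate the section rather than edit it by hand. The sketch below prints a [SHIM] section in the same shape as the one above. The function name `shim_section` is hypothetical, and giving instance N core affinity N simply extends the 0/1 pattern shown here; it is an assumption, not a tuning recommendation.

```shell
# Print a [SHIM] section with N polled crypto instances.
# Only the section itself is printed; the rest of the device config file
# is not touched by this sketch.
shim_section() {
    n=$1
    printf '[SHIM]\n'
    printf 'NumberCyInstances = %s\n' "$n"
    printf 'NumberDcInstances = 0\n'
    printf 'NumProcesses = 1\n'
    printf 'LimitDevAccess = 1\n'
    i=0
    while [ "$i" -lt "$n" ]; do
        # One block per instance; affinity i for instance i is an assumption.
        printf '\nCy%sName = "UserCY%s"\n' "$i" "$i"
        printf 'Cy%sIsPolled = 1\n' "$i"
        printf 'Cy%sCoreAffinity = %s\n' "$i" "$i"
        i=$((i + 1))
    done
}
```

`shim_section 2` reproduces the section above. The output still has to be merged into each of /etc/c6xx_dev0.conf, /etc/c6xx_dev1.conf and /etc/c6xx_dev2.conf by hand, followed by a qat_service restart.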
Hi @Yogaraj-Alamenda, thanks. With these changes I got all three QAT devices working and reached the 100k ops/sec number.
But the numbers that Intel published were with a single core and without -multi. I am specifically referring to Table 2 in this doc: https://01.org/sites/default/files/downloads/intelr-quickassist-technology/337003-001-intelquickassisttechnologyandopenssl-110.pdf
Is there anything else that can be done to get the numbers published with a single core?
AFAIK, we need at least 3 cores to reach the 100K cps. Let me check internally to see whether that was the case in the whitepaper as well.
Hi @Yogaraj-Alamenda, I have been trying to get the perf numbers for QAT with Nginx. What I observe is that with 1c2t my numbers are similar to what has been published, but as soon as I increase the number of cores and threads, the performance seems to drop. These are the numbers that I see with RSA
I have been trying to reproduce the numbers published by Intel on my setup: https://01.org/sites/default/files/downloads/intelr-quickassist-technology/337003-001-intelquickassisttechnologyandopenssl-110.pdf

In the above document, Table 2 (Performance of RSA 2K with OpenSSL speed) states that sign/s is around 100k and verify/s is around 200k. The numbers I get for verify/s are similar, but for sign/s I get around 27k, and for ecdsap256 both sign/s and verify/s are lower than what is mentioned in the document.

Here is a document in which I have listed the steps I followed and the results; I have also added the software runs for both cases: https://docs.google.com/document/d/15w_fKrlEkGoIcu1cHmPa2VAWXhnhoyKA5Y2MK--OJuE/edit?usp=sharing

I am using the C62x chipset with QuickAssist.
Need help to figure out why the performance is low.
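When comparing runs against the published table, it can help to pull just the sign/s and verify/s figures out of the `openssl speed` summary. A minimal sketch, assuming the usual summary row of the form `rsa 2048 bits <sign time> <verify time> <sign/s> <verify/s>`; the function name `rsa2048_rates` is made up for illustration.

```shell
# Extract "sign/s verify/s" from the rsa 2048 summary row of openssl speed.
# Reads the speed output on stdin; the row layout is an assumption.
rsa2048_rates() {
    awk '$1 == "rsa" && $2 == "2048" && $3 == "bits" { print $(NF - 1), $NF }'
}
```

For example, `./openssl speed -engine qatengine -async_jobs 36 -multi 3 rsa2048 2>/dev/null | rsa2048_rates` would print the two rates on one line, which makes it easier to compare configurations side by side.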