kkurzacz-intel closed this issue 1 month ago
I was told that software fallback with heartbeat is not supported in QAT2.0 driver v. 20.L.1.0.50-00003, so I should turn off qat_sw_fallback in nginx.conf.
So I changed that entry:
qat_sw_fallback off;
I did that, but there are still errors. nginx starts fine, but as soon as I send the first request I start getting the following error, and it keeps repeating even after the request is cancelled:
QAT Engine failed: POLL
It looks like we have identified the cause. It happened because the number of workers was much greater than the number of available HW QAT instances. As far as I understand, when the pool of HW QAT instances is exhausted, the rest of the nginx workers should receive SW QAT ones. But for some reason, that doesn't happen.
I enabled verbose QAT debug logs by adding --enable-qat_debug to ./configure of the QAT engine. As a result, QAT logged everything to nginx's error.log file, and we were able to spot the lines that explain the POLL error:
[WARN][2332072.319774] PID [179324] Thread [7f0ae03ad740][e_qat.c:742:qat_engine_ctrl()] POLL failed as no instances are available
2024/01/04 16:17:42 [alert] 179324#0: QAT Engine failed: POLL
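To check which workers are affected, the debug lines above can be counted per PID. The following is a minimal sketch, not part of QAT Engine or nginx: the function name count_poll_failures is hypothetical, and the pattern assumes the debug line format is exactly the one quoted above.

```python
import re
from collections import Counter

# Matches the QAT engine debug line quoted above and captures the PID.
NO_INSTANCE_RE = re.compile(
    r"PID \[(?P<pid>\d+)\].*POLL failed as no instances are available"
)


def count_poll_failures(lines):
    """Count 'no instances are available' POLL failures per worker PID."""
    counts = Counter()
    for line in lines:
        match = NO_INSTANCE_RE.search(line)
        if match:
            counts[int(match.group("pid"))] += 1
    return counts


# The two log lines quoted in this comment, used as sample input.
SAMPLE = [
    "[WARN][2332072.319774] PID [179324] Thread [7f0ae03ad740]"
    "[e_qat.c:742:qat_engine_ctrl()] POLL failed as no instances are available",
    "2024/01/04 16:17:42 [alert] 179324#0: QAT Engine failed: POLL",
]

if __name__ == "__main__":
    for pid, count in count_poll_failures(SAMPLE).items():
        print(pid, count)
```

To run it against a real log, pass an open file object for nginx's error.log instead of SAMPLE.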
A temporary solution for now is to lower the number of workers (worker_processes in nginx.conf) to match the number of QAT HW instances. In my example, the SHIM section of the QAT driver conf (/etc/4xxx_dev0.conf) has the following lines:
[SHIM]
NumberCyInstances = 1
NumberDcInstances = 1
NumProcesses = 32
LimitDevAccess = 1
NumberCyInstances x NumProcesses = 32. On a 2-socket machine with 2 CPUs and QAT modules, we have 32 x 2 = 64. So 64 is the maximum number of workers for which there are enough HW QAT instances.
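The arithmetic above can be sketched in Python. This is only an illustration, not part of the QAT tooling: the helper max_hw_backed_workers and its num_devices parameter are hypothetical, and it assumes that every device uses the same [SHIM] section and that each nginx worker needs one crypto instance.

```python
import configparser

# [SHIM] section copied from the driver config quoted above.
SHIM_CONF = """
[SHIM]
NumberCyInstances = 1
NumberDcInstances = 1
NumProcesses = 32
LimitDevAccess = 1
"""


def max_hw_backed_workers(conf_text, num_devices):
    """Return NumberCyInstances x NumProcesses x num_devices.

    Hypothetical helper: assumes every device is configured with the same
    [SHIM] section and that each worker consumes one crypto instance.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    shim = parser["SHIM"]
    per_device = shim.getint("NumberCyInstances") * shim.getint("NumProcesses")
    return per_device * num_devices


if __name__ == "__main__":
    # One device, then the two-device case from the example above.
    print(max_hw_backed_workers(SHIM_CONF, num_devices=1))
    print(max_hw_backed_workers(SHIM_CONF, num_devices=2))
```

Keeping worker_processes at or below the returned value is the workaround described here; it does not address the underlying polling problem.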
@kkurzacz-intel The issue is in qatengine when it runs with external polling: it tries to poll an instance for heartbeat on behalf of a worker process that has no qat_hw instance and should therefore do qat_sw polling only. We will fix it in qatengine.
That being said, in addition to the workaround you have mentioned, here are 2 other alternatives.
Please let us know if that works
The issue mentioned here is fixed by the commit below in QAT Engine and released in QAT Engine v1.6.0: https://github.com/intel/QAT_Engine/commit/3a1fca3138c96054721bebe19861b0cd6dc449af. Hence closing this.
What is the problem
Async nginx with QAT configuration starts but is constantly logging the error:
System description
QAT configuration
Nginx configuration
What is working
Openssl is sort of working with QAT:
Additional information
I need to run
export OPENSSL_ENGINES=/usr/local/ssl/lib64/engines-3
otherwise openssl can't find the engine
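When openssl is launched from a script rather than an interactive shell, the same variable can be set for the child process only. Below is a rough sketch under stated assumptions: the helper env_with_engines_dir is hypothetical, and "qatengine" is assumed to be the engine id, so adjust it if your build uses a different one.

```python
import os
import shutil
import subprocess

# Path taken from the export line above.
ENGINES_DIR = "/usr/local/ssl/lib64/engines-3"


def env_with_engines_dir(engines_dir, base_env=None):
    """Return a copy of the environment with OPENSSL_ENGINES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENSSL_ENGINES"] = engines_dir
    return env


if __name__ == "__main__":
    openssl = shutil.which("openssl")
    if openssl is None:
        print("openssl not found on PATH")
    else:
        # "qatengine" is an assumed engine id, not confirmed by this thread.
        result = subprocess.run(
            [openssl, "engine", "-t", "qatengine"],
            env=env_with_engines_dir(ENGINES_DIR),
            capture_output=True,
            text=True,
        )
        print(result.stdout or result.stderr)
```

This leaves the parent environment untouched, which can be convenient when several OpenSSL installations coexist on one host.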