Open lizj3624 opened 4 years ago
@lizj3624 Could you please try with the latest versions of the driver and Nginx? Please see if you can reproduce the crash with the latest of everything, as a similar type of crash was fixed in later releases of Nginx and the driver.
Driver : 4.11 Nginx : v0.4.3 QAT Engine : v0.6.3 OpenSSL 1.1.1h
I updated to the latest versions, but nginx still crashes in a similar way when a large configuration is loaded. My nginx loads more than 20,000 server blocks, and each nginx worker takes up about 1.8 GB of memory. With a small configuration, nginx works normally.
Version
CentOS: 7.6
Driver: qat1.7.l.4.11.0-00001.tar.gz
QAT_Engine: v0.6.3, https://github.com/intel/QAT_Engine
asynch_mode_nginx: 1.18, https://github.com/intel/asynch_mode_nginx
OpenSSL: 1.1.1g
Build steps
1) build driver
cp -rf $QAT_ROOT/conf/qat-blacklist.conf /etc/modprobe.d
./configure --prefix=$QAT_DRIVER
make install
2) build qat_engine
cd $ICP_ROOT/QAT_Engine
./autogen.sh
./configure --with-qat_dir=$ICP_ROOT --with-openssl_dir=$OPENSSL_ROOT --with-openssl_install_dir=$OPENSSL_INSTALL
make && make install
3) build nginx
nginx version: nginx/1.18.0
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)
built with OpenSSL 1.1.1g 21 Apr 2020
TLS SNI support enabled
configure arguments: --prefix=/my/servers/asynch-ngx-1.18 --with-http_ssl_module --with-http_stub_status_module --with-http_v2_module --with-stream --with-stream_ssl_module --with-pcre=/root/my-qat/pcre-8.40 --with-pcre-jit --with-debug --with-pcre-opt='-g -Ofast -fPIC -m64 -march=native -fstack-protector-strong -D_FORTIFY_SOURCE=2' --add-dynamic-module=modules/nginx_qat_module --with-cc-opt=' -fPIC -DNGX_SECURE_MEM -I/my/servers/my-openssl/include -Wno-error=deprecated-declarations' --with-ld-opt='-Wl,-rpath=/my/servers/my-openssl/lib -L/my/servers/my-openssl/lib'
OpenSSL Engine tested ok
./bin/openssl engine qatengine -t -vvv
(qatengine) Reference implementation of QAT crypto engine v0.6.3
[ available ]
ENABLE_EXTERNAL_POLLING: Enables the external polling interface to the engine.
(input flags): NO_INPUT
POLL: Polls the engine for any completed requests
(input flags): NO_INPUT
SET_INSTANCE_FOR_THREAD: Set instance to be used by this thread
(input flags): NUMERIC
GET_NUM_OP_RETRIES: Get number of retries
(input flags): NO_INPUT
SET_MAX_RETRY_COUNT: Set maximum retry count
(input flags): NUMERIC
SET_INTERNAL_POLL_INTERVAL: Set internal polling interval
(input flags): NUMERIC
GET_EXTERNAL_POLLING_FD: Returns non blocking fd for crypto engine
(input flags): NO_INPUT
ENABLE_EVENT_DRIVEN_POLLING_MODE: Set event driven polling mode
(input flags): NO_INPUT
GET_NUM_CRYPTO_INSTANCES: Get the number of crypto instances
(input flags): NO_INPUT
DISABLE_EVENT_DRIVEN_POLLING_MODE: Unset event driven polling mode
(input flags): NO_INPUT
SET_EPOLL_TIMEOUT: Set epoll_wait timeout
(input flags): NUMERIC
SET_CRYPTO_SMALL_PACKET_OFFLOAD_THRESHOLD: Set QAT small packet threshold
(input flags): STRING
ENABLE_INLINE_POLLING: Enables the inline polling mode.
(input flags): NO_INPUT
ENABLE_HEURISTIC_POLLING: Enable the heuristic polling mode
(input flags): NO_INPUT
GET_NUM_REQUESTS_IN_FLIGHT: Get the number of in-flight requests
(input flags): NUMERIC
INIT_ENGINE: Initializes the engine if not already initialized
(input flags): NO_INPUT
SET_CONFIGURATION_SECTION_NAME: Set the configuration section to use in QAT driver configuration file
(input flags): STRING
ENABLE_SW_FALLBACK: Enables the fallback to SW if the acceleration devices go offline
(input flags): NO_INPUT
HEARTBEAT_POLL: Check the acceleration devices are still functioning
(input flags): NO_INPUT
DISABLE_QAT_OFFLOAD: Perform crypto operations on core
(input flags): NO_INPUT
nginx.conf
worker_processes 16;
worker_cpu_affinity 01111111111111111100000000000000;
user root;
error_log logs/error.log error;
pid nginx.pid;
load_module modules/ngx_ssl_engine_qat_module.so;
events {
    use epoll;
    multi_accept on;
    worker_connections 102400;
}
ssl_engine {
    use_engine qatengine;
    default_algorithms ALL;
    qat_engine {
        qat_offload_mode async;
        qat_notify_mode poll;
        qat_poll_mode heuristic;
        qat_sw_fallback on;
        #qat_heuristic_poll_sym_threshold 24;
    }
}
Nginx still crashes when a large number of configurations are loaded:
```c
gdb sbin/nginx core.26511
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /my/servers/asynch-ngx-1.18/sbin/nginx...done.
[New LWP 26511]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal 6, Aborted.
#0 0x00007fc1dbfab207 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.172-2.el7.x86_64 elfutils-libs-0.172-2.el7.x86_64 glibc-2.17-260.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 sssd-client-1.16.2-13.el7.x86_64 systemd-libs-219-78.el7_9.2.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007fc1dbfab207 in raise () from /lib64/libc.so.6
#1 0x00007fc1dbfac8f8 in abort () from /lib64/libc.so.6
#2 0x00007fc1dbfedd27 in __libc_message () from /lib64/libc.so.6
#3 0x00007fc1dc08c9e7 in __fortify_fail () from /lib64/libc.so.6
#4 0x00007fc1dc08ab62 in __chk_fail () from /lib64/libc.so.6
#5 0x00007fc1dc08c947 in __fdelt_warn () from /lib64/libc.so.6
#6 0x00007fc1db497c91 in adf_proxy_poll_event () from /my/servers/qat-src/QAT-1.7/build/libqat_s.so
#7 0x00007fc1db499b31 in icp_adf_poll_device_events () from /my/servers/qat-src/QAT-1.7/build/libqat_s.so
#8 0x00007fc1db495e75 in icp_sal_poll_device_events ()
at /my/servers/qat-src/QAT-1.7/quickassist/lookaside/access_layer/src/user/sal_user.c:305
#9 0x00007fc1db725005 in poll_heartbeat () at qat_polling.c:386
#10 0x00007fc1db7229a6 in qat_engine_ctrl (e=<optimized out>, cmd=<optimized out>, i=<optimized out>, p=0x7ffd1d03b55c,
f=<optimized out>) at e_qat.c:758
#11 0x00007fc1dc6af769 in ENGINE_ctrl_cmd () from /my/servers/my-openssl/lib/libcrypto.so.1.1
#12 0x00007fc1db95ba60 in qat_engine_heartbeat_poll (log=0x55f832f9d378)
at modules/nginx_qat_module/ngx_ssl_engine_qat_module.c:555
#13 qat_engine_heartbeat_poll_handler (ev=0x7fc1dbb5e6a0 <qat_engine_heartbeat_poll_event>)
at modules/nginx_qat_module/ngx_ssl_engine_qat_module.c:563
#14 0x000055f8312abd9e in ngx_event_expire_timers () at src/event/ngx_event_timer.c:94
#15 0x000055f8312ab9f5 in ngx_process_events_and_timers (cycle=cycle@entry=0x55f832f9d360) at src/event/ngx_event.c:266
#16 0x000055f8312b51c8 in ngx_worker_process_cycle (cycle=cycle@entry=0x55f832f9d360, data=data@entry=0x2)
at src/os/unix/ngx_process_cycle.c:769
#17 0x000055f8312b36af in ngx_spawn_process (cycle=cycle@entry=0x55f832f9d360,
proc=proc@entry=0x55f8312b5170 <ngx_worker_process_cycle>, data=data@entry=0x2,
name=name@entry=0x55f831383b8b "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#18 0x000055f8312b4900 in ngx_start_worker_processes (cycle=cycle@entry=0x55f832f9d360, n=16, type=type@entry=-3)
at src/os/unix/ngx_process_cycle.c:363
#19 0x000055f8312b5eff in ngx_master_process_cycle (cycle=cycle@entry=0x55f832f9d360) at src/os/unix/ngx_process_cycle.c:133
#20 0x000055f8312890c5 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:389
(gdb)
```
@lizj3624 In nginx.conf, could you please set 'multi_accept' to 'off', or remove it since the default is 'off', and see if you can still reproduce the issue?
The same error still happens when 'multi_accept' is set to 'off'.
@lizj3624 Thanks. We will check and get back to you on this issue, and let you know if any more information is needed. The backtrace points to a QAT driver function that gets called only when qat_sw_fallback is "on" in the Nginx conf.
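For context on frames #3 to #5 of the backtrace: `__fdelt_warn` is glibc's `_FORTIFY_SOURCE` bounds check behind the `FD_SET`/`FD_ISSET` macros, and it aborts when the descriptor is negative or not below `FD_SETSIZE` (1024 on Linux with glibc). The sketch below assumes, without confirmation from this thread, that `adf_proxy_poll_event` trips that check because a worker loading 20,000 server blocks already holds more than 1024 descriptors. The helper names `fd_fits_in_fd_set` and `wait_readable` are hypothetical and not part of the QAT driver; they only illustrate the guard and a `poll()`-based wait, which has no such limit.

```c
#include <assert.h>
#include <poll.h>
#include <sys/select.h>

/* Hypothetical helper: returns 1 if fd can safely be stored in an fd_set.
 * Using FD_SET/FD_ISSET with fd >= FD_SETSIZE is undefined behaviour;
 * with _FORTIFY_SOURCE enabled, glibc aborts instead (the __fdelt_warn frame). */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: waits up to timeout_ms for fd to become readable.
 * Uses poll(), which takes the descriptor by value and so has no FD_SETSIZE limit.
 * Returns 1 if readable, 0 on timeout, -1 on error, hangup, or invalid descriptor. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0); /* poll() silently ignores negative descriptors */

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;   /* poll() itself failed, errno is set */
    if (rc == 0)
        return 0;    /* timed out with nothing to read */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A caller that must keep using `select()` could test `fd_fits_in_fd_set(fd)` first and report an error rather than abort; a caller free to change the wait primitive could use something like `wait_readable` for any descriptor number.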
nginx crashes when QAT is used
My environment: OS: CentOS 7.6, Driver: 4.7.0, OpenSSL: 1.1.1c, QAT Engine: v0.5.42, nginx-1.15.8 with the asynch_mode_nginx patch applied: https://github.com/intel/asynch_mode_nginx
crash info: