intel / QAT_Engine

Intel QuickAssist Technology (QAT) OpenSSL Engine (an OpenSSL plug-in engine), which provides cryptographic acceleration for both hardware and optimized software using Intel QuickAssist Technology-enabled Intel platforms. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License

nginx crashes when QAT is used #175

Status: Open. lizj3624 opened this issue 3 years ago.

lizj3624 commented 3 years ago

nginx crashes when QAT is used. My environment:

- OS: CentOS 7.6
- Driver: 4.7.0
- OpenSSL: 1.1.1c
- QAT Engine: v0.5.42
- nginx: 1.15.8, with the asynch_mode_nginx patch applied: https://github.com/intel/asynch_mode_nginx

Crash info:

```
[New LWP 12825]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `nginx: worker process                                                         '.
Program terminated with signal 6, Aborted.
#0  0x00007fd051f3e207 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.172-2.el7.x86_64 elfutils-libs-0.172-2.el7.x86_64 glibc-2.17-260.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 sssd-client-1.16.2-13.el7.x86_64 systemd-libs-219-78.el7_9.2.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x00007fd051f3e207 in raise () from /lib64/libc.so.6
#1  0x00007fd051f3f8f8 in abort () from /lib64/libc.so.6
#2  0x00007fd051f80d27 in __libc_message () from /lib64/libc.so.6
#3  0x00007fd05201f9e7 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007fd05201db62 in __chk_fail () from /lib64/libc.so.6
#5  0x00007fd05201f947 in __fdelt_warn () from /lib64/libc.so.6
#6  0x00007fd051215fd1 in adf_proxy_poll_event () from my-qat/qat-src/QAT-1.7/build/libqat_s.so
#7  0x00007fd051217da1 in icp_adf_poll_device_events () from my-qat/qat-src/QAT-1.7/build/libqat_s.so
#8  0x00007fd051214235 in icp_sal_poll_device_events () at my-qat/qat-src/QAT-1.7/quickassist/lookaside/access_layer/src/user/sal_user.c:289
#9  0x00007fd0514a35d5 in poll_heartbeat () at qat_polling.c:357
#10 0x00007fd0514a2906 in qat_engine_ctrl (e=<optimized out>, cmd=<optimized out>, i=<optimized out>, p=0x7ffcbf20e08c, f=<optimized out>) at qat_init.c:899
#11 0x00007fd05242c769 in ENGINE_ctrl_cmd () from my-qat/openssl_jfe/lib/libcrypto.so.1.1
#12 0x00007fd0516d9670 in qat_engine_heartbeat_poll (log=0x556346cab298) at /usr/src/debug/jfe-2.3-65.e7895ad/thirdparty/nginx_qat_module/ngx_ssl_engine_qat_module.c:533
#13 qat_engine_heartbeat_poll_handler (ev=0x7fd0518db620 <qat_engine_heartbeat_poll_event>)
    at /usr/src/debug/jfe-2.3-65.e7895ad/thirdparty/nginx_qat_module/ngx_ssl_engine_qat_module.c:541
#14 0x00005563460678bd in ngx_event_expire_timers () at src/event/ngx_event_timer.c:97
#15 0x00005563460674c6 in ngx_process_events_and_timers (cycle=cycle@entry=0x556346cab280) at src/event/ngx_event.c:272
#16 0x000055634606f714 in ngx_worker_process_cycle (cycle=cycle@entry=0x556346cab280, data=data@entry=0xf) at src/os/unix/ngx_process_cycle.c:838
#17 0x000055634606dbfb in ngx_spawn_process (cycle=cycle@entry=0x556346cab280, proc=proc@entry=0x55634606f670 <ngx_worker_process_cycle>, data=data@entry=0xf, 
    name=name@entry=0x5563461c9a99 "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#18 0x000055634606ede0 in ngx_start_worker_processes (cycle=cycle@entry=0x556346cab280, n=20, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:401
#19 0x0000556346070250 in ngx_master_process_cycle (cycle=cycle@entry=0x556346cab280) at src/os/unix/ngx_process_cycle.c:140
#20 0x0000556346046bf5 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:397
(gdb) 
```
Yogaraj-Alamenda commented 3 years ago

@lizj3624 Could you please try the latest versions of the driver and Nginx? Please check whether you can reproduce the crash with the latest version of everything, as a similar crash has been fixed in later releases of Nginx and the driver.

- Driver: 4.11
- Nginx: v0.4.3
- QAT Engine: v0.6.3
- OpenSSL: 1.1.1h

lizj3624 commented 3 years ago

I updated to the latest versions, but nginx still produces a similar crash when a large configuration is loaded: my nginx loads more than 20,000 server blocks, and each nginx worker uses about 1.8 GB of memory. nginx works normally when only a small configuration is loaded.

  1. Versions
     - CentOS: 7.6
     - Driver: qat1.7.l.4.11.0-00001.tar.gz
     - QAT_Engine: v0.6.3, https://github.com/intel/QAT_Engine
     - asynch_mode_nginx: 1.18, https://github.com/intel/asynch_mode_nginx
     - OpenSSL: 1.1.1g

  2. Build steps

    1) Build the driver

        cp -rf $QAT_ROOT/conf/qat-blacklist.conf /etc/modprobe.d
        ./configure --prefix=$QAT_DRIVER
        make install

    2) Build QAT_Engine

        cd $ICP_ROOT/QAT_Engine
        ./autogen.sh
        ./configure --with-qat_dir=$ICP_ROOT --with-openssl_dir=$OPENSSL_ROOT --with-openssl_install_dir=$OPENSSL_INSTALL
        make && make install

    3) Build nginx (`nginx -V` output)

        nginx version: nginx/1.18.0
        built by gcc 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)
        built with OpenSSL 1.1.1g  21 Apr 2020
        TLS SNI support enabled
        configure arguments: --prefix=/my/servers/asynch-ngx-1.18 --with-http_ssl_module --with-http_stub_status_module --with-http_v2_module --with-stream --with-stream_ssl_module --with-pcre=/root/my-qat/pcre-8.40 --with-pcre-jit --with-debug --with-pcre-opt='-g -Ofast -fPIC -m64 -march=native -fstack-protector-strong -D_FORTIFY_SOURCE=2' --add-dynamic-module=modules/nginx_qat_module --with-cc-opt=' -fPIC -DNGX_SECURE_MEM -I/my/servers/my-openssl/include -Wno-error=deprecated-declarations' --with-ld-opt='-Wl,-rpath=/my/servers/my-openssl/lib -L/my/servers/my-openssl/lib'
  3. The OpenSSL engine test passes:

    ./bin/openssl engine qatengine -t -vvv
    (qatengine) Reference implementation of QAT crypto engine v0.6.3
     [ available ]
     ENABLE_EXTERNAL_POLLING: Enables the external polling interface to the engine.
          (input flags): NO_INPUT
     POLL: Polls the engine for any completed requests
          (input flags): NO_INPUT
     SET_INSTANCE_FOR_THREAD: Set instance to be used by this thread
          (input flags): NUMERIC
     GET_NUM_OP_RETRIES: Get number of retries
          (input flags): NO_INPUT
     SET_MAX_RETRY_COUNT: Set maximum retry count
          (input flags): NUMERIC
     SET_INTERNAL_POLL_INTERVAL: Set internal polling interval
          (input flags): NUMERIC
     GET_EXTERNAL_POLLING_FD: Returns non blocking fd for crypto engine
          (input flags): NO_INPUT
     ENABLE_EVENT_DRIVEN_POLLING_MODE: Set event driven polling mode
          (input flags): NO_INPUT
     GET_NUM_CRYPTO_INSTANCES: Get the number of crypto instances
          (input flags): NO_INPUT
     DISABLE_EVENT_DRIVEN_POLLING_MODE: Unset event driven polling mode
          (input flags): NO_INPUT
     SET_EPOLL_TIMEOUT: Set epoll_wait timeout
          (input flags): NUMERIC
     SET_CRYPTO_SMALL_PACKET_OFFLOAD_THRESHOLD: Set QAT small packet threshold
          (input flags): STRING
     ENABLE_INLINE_POLLING: Enables the inline polling mode.
          (input flags): NO_INPUT
     ENABLE_HEURISTIC_POLLING: Enable the heuristic polling mode
          (input flags): NO_INPUT
     GET_NUM_REQUESTS_IN_FLIGHT: Get the number of in-flight requests
          (input flags): NUMERIC
     INIT_ENGINE: Initializes the engine if not already initialized
          (input flags): NO_INPUT
     SET_CONFIGURATION_SECTION_NAME: Set the configuration section to use in QAT driver configuration file
          (input flags): STRING
     ENABLE_SW_FALLBACK: Enables the fallback to SW if the acceleration devices go offline
          (input flags): NO_INPUT
     HEARTBEAT_POLL: Check the acceleration devices are still functioning
          (input flags): NO_INPUT
     DISABLE_QAT_OFFLOAD: Perform crypto operations on core
          (input flags): NO_INPUT
  4. nginx.conf

    worker_processes    16;
    worker_cpu_affinity 01111111111111111100000000000000;

    user root;

    error_log logs/error.log error;

    pid nginx.pid;

    load_module modules/ngx_ssl_engine_qat_module.so;

    events {
        use epoll;
        multi_accept on;
        worker_connections 102400;
    }

    ssl_engine {
        use_engine qatengine;
        default_algorithms ALL;
        qat_engine {
            qat_offload_mode async;
            qat_notify_mode poll;
            qat_poll_mode heuristic;
            qat_sw_fallback on;

            qat_heuristic_poll_asym_threshold 48;

            #qat_heuristic_poll_sym_threshold 24;
        }
    }

nginx still crashes when a large configuration is loaded:
```
gdb sbin/nginx core.26511
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /my/servers/asynch-ngx-1.18/sbin/nginx...done.
[New LWP 26511]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `nginx: worker process                                                         '.
Program terminated with signal 6, Aborted.
#0  0x00007fc1dbfab207 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.172-2.el7.x86_64 elfutils-libs-0.172-2.el7.x86_64 glibc-2.17-260.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 sssd-client-1.16.2-13.el7.x86_64 systemd-libs-219-78.el7_9.2.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x00007fc1dbfab207 in raise () from /lib64/libc.so.6
#1  0x00007fc1dbfac8f8 in abort () from /lib64/libc.so.6
#2  0x00007fc1dbfedd27 in __libc_message () from /lib64/libc.so.6
#3  0x00007fc1dc08c9e7 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007fc1dc08ab62 in __chk_fail () from /lib64/libc.so.6
#5  0x00007fc1dc08c947 in __fdelt_warn () from /lib64/libc.so.6
#6  0x00007fc1db497c91 in adf_proxy_poll_event () from /my/servers/qat-src/QAT-1.7/build/libqat_s.so
#7  0x00007fc1db499b31 in icp_adf_poll_device_events () from /my/servers/qat-src/QAT-1.7/build/libqat_s.so
#8  0x00007fc1db495e75 in icp_sal_poll_device_events ()
    at /my/servers/qat-src/QAT-1.7/quickassist/lookaside/access_layer/src/user/sal_user.c:305
#9  0x00007fc1db725005 in poll_heartbeat () at qat_polling.c:386
#10 0x00007fc1db7229a6 in qat_engine_ctrl (e=<optimized out>, cmd=<optimized out>, i=<optimized out>, p=0x7ffd1d03b55c,
    f=<optimized out>) at e_qat.c:758
#11 0x00007fc1dc6af769 in ENGINE_ctrl_cmd () from /my/servers/my-openssl/lib/libcrypto.so.1.1
#12 0x00007fc1db95ba60 in qat_engine_heartbeat_poll (log=0x55f832f9d378)
    at modules/nginx_qat_module/ngx_ssl_engine_qat_module.c:555
#13 qat_engine_heartbeat_poll_handler (ev=0x7fc1dbb5e6a0 <qat_engine_heartbeat_poll_event>)
    at modules/nginx_qat_module/ngx_ssl_engine_qat_module.c:563
#14 0x000055f8312abd9e in ngx_event_expire_timers () at src/event/ngx_event_timer.c:94
#15 0x000055f8312ab9f5 in ngx_process_events_and_timers (cycle=cycle@entry=0x55f832f9d360) at src/event/ngx_event.c:266
#16 0x000055f8312b51c8 in ngx_worker_process_cycle (cycle=cycle@entry=0x55f832f9d360, data=data@entry=0x2)
    at src/os/unix/ngx_process_cycle.c:769
#17 0x000055f8312b36af in ngx_spawn_process (cycle=cycle@entry=0x55f832f9d360,
    proc=proc@entry=0x55f8312b5170 <ngx_worker_process_cycle>, data=data@entry=0x2,
    name=name@entry=0x55f831383b8b "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#18 0x000055f8312b4900 in ngx_start_worker_processes (cycle=cycle@entry=0x55f832f9d360, n=16, type=type@entry=-3)
    at src/os/unix/ngx_process_cycle.c:363
#19 0x000055f8312b5eff in ngx_master_process_cycle (cycle=cycle@entry=0x55f832f9d360) at src/os/unix/ngx_process_cycle.c:133
#20 0x000055f8312890c5 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:389
(gdb)
```
Yogaraj-Alamenda commented 3 years ago

@lizj3624 In nginx.conf, could you please set `multi_accept` to `off` (or remove it, since the default is `off`) and check whether you can still reproduce the issue?

lizj3624 commented 3 years ago

The same error still happens with `multi_accept` set to `off`.

Yogaraj-Alamenda commented 3 years ago

@lizj3624 Thanks. We will look into this issue, get back to you, and let you know if we need any more information. The backtrace points to a QAT driver function that is called only when `qat_sw_fallback` is `on` in the nginx configuration.