intel / QAT_Engine

Intel QuickAssist Technology (QAT) OpenSSL Engine (an OpenSSL Plug-In Engine), which provides cryptographic acceleration for both hardware and optimized software using Intel QuickAssist Technology enabled Intel platforms. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License

nginx worker crashed in ASYNC_WAIT_CTX_get_fd #143

Open zspirate opened 4 years ago

zspirate commented 4 years ago

Hi, I'm running nginx with OpenSSL 1.1.0e and QAT driver version 1.7. The nginx worker occasionally crashes in the function ASYNC_WAIT_CTX_get_fd. It seems that the async job allocated by OpenSSL has already been released when the QAT engine tries to wake it up. Here is the core dump stack:

0 0x00007efff7328f18 in ASYNC_WAIT_CTX_get_fd () from /export/openssl/libcrypto.so.1.1

1 0x00007efff665e1bf in qat_wake_job (job=, jobStatus=) at qat_events.c:289

2 0x00007efff63d75d7 in adf_user_notify_msgs_poll () from /usr/local/lib/libqat_s.so

3 0x00007efff63d31b8 in adf_pollRing () from /usr/local/lib/libqat_s.so

4 0x00007efff63d355f in icp_adf_pollInstance () from /usr/local/lib/libqat_s.so

5 0x00007efff63cc5b9 in icp_sal_CyPollInstance () from /usr/local/lib/libqat_s.so

6 0x00007efff665e4ce in poll_instances () at qat_polling.c:328

7 0x00007efff665d7d6 in qat_engine_ctrl (e=, cmd=, i=, p=0x7ffc6ac74cac, f=) at e_qat.c:835

8 0x00007efff73da889 in ENGINE_ctrl_cmd () from /export/openssl/libcrypto.so.1.1

9 0x00000000004a5e61 in qat_engine_poll (log=0x64c9d80) at modules/nginx_qat_module/ngx_ssl_engine_qat_module.c:542

10 ngx_ssl_engine_qat_heuristic_poll (log=0x64c9d80) at modules/nginx_qat_module/ngx_ssl_engine_qat_module.c:666

11 0x000000000044d1d6 in ngx_http_close_connection (c=c@entry=0x7effb4758700) at src/http/ngx_http_request.c:3782

12 0x000000000045038c in ngx_http_ssl_handshake_handler (c=0x7effb4758700) at src/http/ngx_http_request.c:876

13 0x000000000043d367 in ngx_ssl_handshake_handler (ev=) at src/event/ngx_event_openssl.c:2114

14 0x000000000043a27d in ngx_ssl_empty_handler (ev=) at src/event/ngx_event_openssl.c:162

15 0x000000000043095d in ngx_event_expire_timers () at src/event/ngx_event_timer.c:94

16 0x00000000004305ca in ngx_process_events_and_timers (cycle=cycle@entry=0x2e374f0) at src/event/ngx_event.c:264

17 0x00000000004376e2 in ngx_worker_process_cycle (cycle=0x2e374f0, data=) at src/os/unix/ngx_process_cycle.c:771

18 0x0000000000435da2 in ngx_spawn_process (cycle=cycle@entry=0x2e374f0, proc=proc@entry=0x43764f , data=data@entry=0x1d, name=name@entry=0x4aac95 "worker process", respawn=respawn@entry=-4) at src/os/unix/ngx_process.c:199

19 0x000000000043692b in ngx_start_worker_processes (cycle=cycle@entry=0x2e374f0, n=32, type=type@entry=-4) at src/os/unix/ngx_process_cycle.c:362

20 0x000000000043840a in ngx_master_process_cycle (cycle=0x2e374f0, cycle@entry=0x729090) at src/os/unix/ngx_process_cycle.c:247

21 0x000000000041271d in main (argc=, argv=) at src/core/nginx.c:397

This situation may happen when the SSL connection cannot be established while QAT is still working on the crypto steps. If the async timer expires, nginx calls ngx_ssl_shutdown() anyway to stop the session, but the async job cannot be released, so QAT can still wake up the job when it finishes its work. How can this be solved? It seems that the OpenSSL library does not provide an API for user applications to release the async job.
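To make the suspected race concrete, here is a minimal sketch of one general way to keep a late completion callback from touching state that the connection side has already torn down: a shared record with a reference count and a cancellation flag. This is not how QAT_Engine, OpenSSL, or nginx handle it. Every name below (`demo_request`, `demo_connection_close`, `demo_on_completion`) is hypothetical, and incrementing `wake_count` only stands in for waking the paused job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-request record shared by the connection and the
 * in-flight crypto operation. Not a QAT_Engine or OpenSSL type. */
typedef struct {
    int  refs;        /* one reference per owner (connection, in-flight op) */
    bool cancelled;   /* set when the connection is closed early */
    int  wake_count;  /* stands in for "the paused job was woken" */
} demo_request;

/* Create a record owned by both the connection and the pending operation. */
demo_request *demo_request_new(void)
{
    demo_request *r = calloc(1, sizeof(*r));
    if (r != NULL)
        r->refs = 2;
    return r;
}

/* Drop one reference; returns true if the record was freed. */
bool demo_request_unref(demo_request *r)
{
    assert(r->refs > 0);
    if (--r->refs > 0)
        return false;
    free(r);
    return true;
}

/* Connection side: e.g. the handshake timer expired. The record is only
 * marked cancelled; it is freed once the pending operation also lets go. */
bool demo_connection_close(demo_request *r)
{
    r->cancelled = true;
    return demo_request_unref(r);
}

/* Completion side: called from the polling loop when the operation ends.
 * The wake-up is skipped if the connection has already gone away. */
bool demo_on_completion(demo_request *r)
{
    if (!r->cancelled)
        r->wake_count++;
    return demo_request_unref(r);
}
```

With this shape, closing the connection first leaves the record alive until the completion arrives, so the callback never dereferences freed memory. Whether something similar can be applied here depends on who owns the ASYNC_WAIT_CTX, which this sketch does not model.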

Yogaraj-Alamenda commented 4 years ago

@zspirate Thanks for the information. We will check and come back on this. BTW, which version of QAT Engine are you using?

paulturx commented 4 years ago

@zspirate In order to help us recreate your problem and to check whether your version of QAT engine is missing any fixes in this area in later releases of QAT engine, could you provide complete version info for QAT Engine, nginx and QAT driver? Thanks.

Yogaraj-Alamenda commented 4 years ago

@zspirate Also, can you please use the latest OpenSSL version (1.1.0l or 1.1.1f)? A similar issue is fixed in this commit 6038.

zspirate commented 4 years ago

I use the latest version of QAT Engine; OpenSSL is 1.1.1b, nginx is 1.16 (with the Intel QAT async patch), and the driver is 1.7.0.

paulturx commented 4 years ago

Hi @zspirate We are currently looking into this. Please could you forward the QAT driver config files you used when this core dump was created together with the nginx.conf file (as an attachment). Many thanks in advance. paulturx

zspirate commented 4 years ago

Here are the conf files:

归档.zip

paulturx commented 4 years ago

Hi @zspirate We would be very interested to see whether you are able to reproduce the problem on your set-up with the nginx.conf parameter 'multi_accept' set to 'off' (or else not specifically set at all since the default is 'off') and get back to us with the results. Thanks in advance, paulturx
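For reference, a minimal sketch of the setting in question, assuming it is placed in the `events` block of nginx.conf (the only context where nginx accepts `multi_accept`); the rest of the configuration is omitted:

```nginx
events {
    # Accept one new connection per event-loop notification (nginx default).
    multi_accept off;
}
```

Leaving the directive out entirely has the same effect, since 'off' is the default.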

zspirate commented 3 years ago

It works! No more core dumps. But I still don't understand why the 'multi_accept' parameter has this effect.

ipuustin commented 3 years ago

I would also like to understand this better. Is there a limit to the number of connections accepted? Is this related to how qat_pause_job()/qat_wake_job() behave?

zspirate commented 3 years ago

This problem has occurred again. I'm confused.