Closed pracj3am closed 5 years ago
Thank you for sharing! It seems to me in stress mode, when one connect got async-timeout event, it will never be shutdown and connection leak happened. I want to reproduce this in my setup and verify the fix. How do you trigger the async time out event?
I was able to trigger an async time out thanks to #12 with big latency/very low bandwidth connection.
Thank you for the information, the error introduced by wrong async status of the connection in poor network condition and changed NGX_ASYNC_EVENT_TIMEOUT to 1ms. Need to figure out why SSL_want_async(c->ssl->connection) always return true in that condition.
SSL_want_async(c->ssl->connection)
returns true, because the last SSL_get_error
returned WANT ASYNC, and since then SSL_do_handshake
has not been called. You are in the loop ngx_ssl_handshake_async_handler
-> ngx_http_close_connection
-> ngx_ssl_shutdown
, which sets timer and returns NGX_AGAIN....
Yes, there is a loop between ngx_http_close_connection and ngx_ssl_shutdown to finish all the in-flight async requests before connect closed. When setting NGX_ASYNC_EVENT_TIMEOUT == 1s, the async timeout happened before the async requests returns. While, in the ngx_ssl_handshake_async_handler, it always handle timeout event before async request event. OTAH, the sync request event returned only once, so if missed, the connection handler goes to the loop and the connection never closed.
ngx_ssl_handshake_async_handler(ngx_event_t *aev)
{
...
if (aev->timedout) {
c->ssl->handler(c);
return;
}
...
if (ngx_ssl_handshake(c) == NGX_AGAIN) {
return;
}
...
}
To fix this, we add the async event flag check, once async event return process the async event 1st. Here's the fix:
ngx_ssl_handshake_async_handler(ngx_event_t *aev)
{
...
if ( !aev->ready && aev->timedout) {
c->ssl->handler(c);
return;
}
...
}
Please let me know if you find any issue with it. Thank you for reporting this!
This fixes problem outlined in #12 (but it seems to me more like workaround than proper fix, c->async->timedout
should not be set, if the async request returned before time out.)
But I cannot see how it fixed if a real async time out happens, i.e. when you wait for event on async file descriptor for more than NGX_ASYNC_EVENT_TIMEOUT
seconds, connection is going to be closed and you get to the same loop as reported.
No, it does fix the issue.
When real async time out happens, the loop between ngx_http_close_connection
and ngx_ssl_shutdown
are designed to waiting for the async event (return of async request).
And the async event are still polled by Nginx epoll_wait
call.
So once async request returns, the SSL_want_async(c->ssl->connection)
call in ngx_ssl_shutdown
will return false and the loop will be broke.
This issue happened, because the only async event happened at the same time as async timeout event.
The async event triggered ngx_ssl_handshake_async_handler
call goes into the timeout loop instead of async job status update, so the async event is missing.
Can I have few questions? :)
ngx_ssl_shutdown
? SSL_shutdown
won't be called anyway in this case, so you can continue with closing the connection. And what if async request will never return, even if it should?There is still the problem described in #12 caused by not deleting nginx timers for async socket. Because if nginx waits for event on tcp socket for more than NGX_ASYNC_EVENT_TIMEOUT
seconds, async socket is marked as timedout, which causes, that the flag timedout
is meaningless.
For 1st question, Nginx using fds to wake and pause async jobs in OpenSSL.
The ngx_ssl_shutdown call will delete the connection as well as fds associated to the connection.
If we delete those fds before the async event returns, the fds will be invalid or reassign to other new connections.
Later, when the async event returns which will trigger a callback and try to write the fd which has been deleted by timeout event, it will access an invalid fd or writing to a fd belong to a new connection.
So, it will introduce more errors.
I see, thanks. But waiting for the async timeout should be imo limited by a time out, otherwise you can get socket leaking easily in case of bugs in OpenSSL or QAT engine and driver. What is the point of introducing of NGX_ASYNC_EVENT_TIMEOUT
, if you wait for async event in shutdown anyway?
If async request does not return in 10 seconds, it smells like a bug. Closing event fd and freeing related ctx structures is responsibility of QAT SSL engine, so if you call SSL_free
in nginx code, and later async events are causing segfaults, then it is imho responsibility of the engine to handle it (check if waitctx was already freed) and I am prepare to file a bug, if it isn't so :-)
Yes, it makes sense. Besides the async event flag check in async handler, we should close connection by closing all event fd and related ctx with SSL_free.
If SSL async time out happens (i.e.
c->async->timedout
is set) during SSL handshake, then when handling async event inngx_ssl_handshake_async_handler
thec->ssl->handler
is called, which isngx_http_ssl_handshake_handler
, which callsngx_http_close_connection
, which callsngx_ssl_shutdown
. This seems like correct behavior.But in the
ngx_ssl_shutdown
function it goes through this branch of codeSSL_want_async
return 1, since lastSSL_do_handshake
&SSL_get_error
returned WANT ASYNC,ngx_ssl_async_process_fds
does nothing and timer is set. After 10 seconds (NGX_ASYNC_EVENT_TIMEOUT
) the same happen again. And again and again…In nginx error log there are the following lines every 10 seconds
I think that in above snippet you should change the third if to