intel / asynch_mode_nginx

Other
210 stars 61 forks source link

SSL async timeout results in infinite loop in SSL shutdown #11

Closed pracj3am closed 5 years ago

pracj3am commented 5 years ago

If SSL async time out happens (i.e. c->async->timedout is set) during SSL handshake, then when handling async event in ngx_ssl_handshake_async_handler the c->ssl->handler is called, which is ngx_http_ssl_handshake_handler, which calls ngx_http_close_connection, which calls ngx_ssl_shutdown. This seems like correct behavior.

But in the ngx_ssl_shutdown function it goes through this branch of code

    if (SSL_in_init(c->ssl->connection)) {
        if(c->asynch) {
            /* Check if there is inflight request.
             * Wait till async job becomes finished.
             */
            if (SSL_want_async(c->ssl->connection)) {
                ngx_ssl_async_process_fds(c);
                if(!c->async->timer_set)
                    ngx_add_timer(c->async, NGX_ASYNC_EVENT_TIMEOUT);
                return NGX_AGAIN;
            }

SSL_want_async return 1, since last SSL_do_handshake & SSL_get_error returned WANT ASYNC, ngx_ssl_async_process_fds does nothing and timer is set. After 10 seconds (NGX_ASYNC_EVENT_TIMEOUT) the same happen again. And again and again…

In nginx error log there are the following lines every 10 seconds

2018/12/18 17:07:30 [debug] 14094#0: timer delta: 148
2018/12/18 17:07:30 [debug] 14094#0: *22834 event timer del: 75: 3755265
2018/12/18 17:07:30 [debug] 14094#0: *22834 SSL handshake async handler
2018/12/18 17:07:30 [debug] 14094#0: *22834 close http connection: 75
2018/12/18 17:07:30 [debug] 14094#0: *22834 ngx_ssl_async_process_fds called
2018/12/18 17:07:30 [debug] 14094#0: *22834 event timer add: 75: 10000:3765265
2018/12/18 17:07:30 [debug] 14094#0: worker cycle
2018/12/18 17:07:30 [debug] 14094#0: epoll timer: 92

I think that in above snippet you should change the third if to

if (!c->async->timedout && SSL_want_async(c->ssl->connection)) {
daweiq commented 5 years ago

Thank you for sharing! It seems to me in stress mode, when one connect got async-timeout event, it will never be shutdown and connection leak happened. I want to reproduce this in my setup and verify the fix. How do you trigger the async time out event?

pracj3am commented 5 years ago

I was able to trigger an async time out thanks to #12 with big latency/very low bandwidth connection.

daweiq commented 5 years ago

Thank you for the information, the error introduced by wrong async status of the connection in poor network condition and changed NGX_ASYNC_EVENT_TIMEOUT to 1ms. Need to figure out why SSL_want_async(c->ssl->connection) always return true in that condition.

pracj3am commented 5 years ago

SSL_want_async(c->ssl->connection) returns true, because the last SSL_get_error returned WANT ASYNC, and since then SSL_do_handshake has not been called. You are in the loop ngx_ssl_handshake_async_handler -> ngx_http_close_connection -> ngx_ssl_shutdown, which sets timer and returns NGX_AGAIN....

daweiq commented 5 years ago

Yes, there is a loop between ngx_http_close_connection and ngx_ssl_shutdown to finish all the in-flight async requests before connect closed. When setting NGX_ASYNC_EVENT_TIMEOUT == 1s, the async timeout happened before the async requests returns. While, in the ngx_ssl_handshake_async_handler, it always handle timeout event before async request event. OTAH, the sync request event returned only once, so if missed, the connection handler goes to the loop and the connection never closed.

ngx_ssl_handshake_async_handler(ngx_event_t *aev)
{
     ...
    if (aev->timedout) {
        c->ssl->handler(c);
        return;
    }

   ...

    if (ngx_ssl_handshake(c) == NGX_AGAIN) {
        return;
    }
    ...
}

To fix this, we add the async event flag check, once async event return process the async event 1st. Here's the fix:

ngx_ssl_handshake_async_handler(ngx_event_t *aev)
{
    ...

    if ( !aev->ready && aev->timedout) {
        c->ssl->handler(c);
        return;
    }
    ...
}

Please let me know if you find any issue with it. Thank you for reporting this!

pracj3am commented 5 years ago

This fixes problem outlined in #12 (but it seems to me more like workaround than proper fix, c->async->timedout should not be set, if the async request returned before time out.)

But I cannot see how it fixed if a real async time out happens, i.e. when you wait for event on async file descriptor for more than NGX_ASYNC_EVENT_TIMEOUT seconds, connection is going to be closed and you get to the same loop as reported.

daweiq commented 5 years ago

No, it does fix the issue. When real async time out happens, the loop between ngx_http_close_connection and ngx_ssl_shutdown are designed to waiting for the async event (return of async request). And the async event are still polled by Nginx epoll_wait call. So once async request returns, the SSL_want_async(c->ssl->connection) call in ngx_ssl_shutdown will return false and the loop will be broke.

This issue happened, because the only async event happened at the same time as async timeout event. The async event triggered ngx_ssl_handshake_async_handler call goes into the timeout loop instead of async job status update, so the async event is missing.

pracj3am commented 5 years ago

Can I have few questions? :)

  1. Why do you want to wait indefinitely for an async event in ngx_ssl_shutdown? SSL_shutdown won't be called anyway in this case, so you can continue with closing the connection. And what if async request will never return, even if it should?
  2. How is it possible that async event happens at the same time as async timeout event? There is just one process with one thread, so the async event or the async timeout event will be handled first and than the other.

There is still the problem described in #12 caused by not deleting nginx timers for async socket. Because if nginx waits for event on tcp socket for more than NGX_ASYNC_EVENT_TIMEOUT seconds, async socket is marked as timedout, which causes, that the flag timedout is meaningless.

daweiq commented 5 years ago

For 1st question, Nginx using fds to wake and pause async jobs in OpenSSL. The ngx_ssl_shutdown call will delete the connection as well as fds associated to the connection.
If we delete those fds before the async event returns, the fds will be invalid or reassign to other new connections.
Later, when the async event returns which will trigger a callback and try to write the fd which has been deleted by timeout event, it will access an invalid fd or writing to a fd belong to a new connection. So, it will introduce more errors.

pracj3am commented 5 years ago

I see, thanks. But waiting for the async timeout should be imo limited by a time out, otherwise you can get socket leaking easily in case of bugs in OpenSSL or QAT engine and driver. What is the point of introducing of NGX_ASYNC_EVENT_TIMEOUT, if you wait for async event in shutdown anyway?

If async request does not return in 10 seconds, it smells like a bug. Closing event fd and freeing related ctx structures is responsibility of QAT SSL engine, so if you call SSL_free in nginx code, and later async events are causing segfaults, then it is imho responsibility of the engine to handle it (check if waitctx was already freed) and I am prepare to file a bug, if it isn't so :-)

daweiq commented 5 years ago

Yes, it makes sense. Besides the async event flag check in async handler, we should close connection by closing all event fd and related ctx with SSL_free.