libstrophe crashes when internet connection fails

slowCoder72 commented 5 years ago

In order to check the connection status in my application I activated the keepalive option. The connection I generated is secured using the XMPP_CONN_FLAG_MANDATORY_TLS flag. For testing I interrupted the internet connection by shutting down the respective device using ifconfig down. After KA_TIMEOUT seconds the application crashes. The DEBUG out gives: "tls DEBUG error=5 errno=110". What happens is (as I conclude): event.c gets an error and tries to disconnect the connection using conn.c:

void conn_disconnect(xmpp_conn_t * const conn) 
{
    xmpp_debug(conn->ctx, "xmpp", "Closing socket.");
    conn->state = XMPP_STATE_DISCONNECTED;
    if (conn->tls) {
    tls_stop(conn->tls);
    tls_free(conn->tls);
    conn->tls = NULL;
    }
    sock_close(conn->sock);

    /* fire off connection handler */
    conn->conn_handler(conn, XMPP_CONN_DISCONNECT, conn->error,
               conn->stream_error, conn->userdata);
}

This however leads to a problem with the OpenSSL lib because the "tls_stop(conn->tls);" call tries to call SSL_shutdown(tls-ssl) in tls_openssl.c and this is not recommended on a connection which has an error: "Note that SSL_shutdown() must not be called if a previous fatal error has occurred on a connection i.e. if SSL_get_error() has returned SSL_ERROR_SYSCALL or SSL_ERROR_SSL." (see https://www.openssl.org/docs/man1.1.1/man3/SSL_shutdown.html). So what i did to avoid the crash is to add an if around tls_stop to check if a previous error is present:

 {
    xmpp_debug(conn->ctx, "xmpp", "Closing socket.");
    conn->state = XMPP_STATE_DISCONNECTED;
    if (conn->tls) {
        _if (conn->error == 0) {
            tls_stop(conn->tls);
        }_
    tls_free(conn->tls);
    conn->tls = NULL;
    }
    sock_close(conn->sock);

    /* fire off connection handler */
    conn->conn_handler(conn, XMPP_CONN_DISCONNECT, conn->error,
               conn->stream_error, conn->userdata);
}

This works fine for my application and I would like to contribute here if the solution seems reasonable to you guys. So I might try to create a pull request. -- any comments, hints are welcome!

pasis commented 5 years ago

Nice research! Thanks for the post. I've made code formatting in your message, hope you're ok with it :)

Regarding SSL_shutdown(), I would suggest to check if it should be called inside tls_stop(). This is because we have multiple TLS modules and maybe some of them needs to run some "stop" API even after fatal errors. Personally I see solution as to cache state that openssl tls object got fatal error (maybe we already have this in tls->error, need to check) and tls_stop() needs to call SSL_shutdown() only when state is OK.

pasis commented 5 years ago

And yes, you're welcome to create pull request

slowCoder72 commented 5 years ago

Thanx for the nice feedback and formatting :). The hint to do it inside tls_stop() for the sake of compatibility is a sensible one! I will dig into that and maybe tls->lasterror is a reasonable candidate for checking the state.

pasis commented 5 years ago

Yes, looks like tls->lasterror is the right value. Just keep in mind, that documentation says only about error values SSL_ERROR_SYSCALL and SSL_ERROR_SSL. So check it instead of tls->lasterror != 0.

slowCoder72 commented 5 years ago

Considering SSL_ERROR_SYSCAL and SSL_ERROR_SSL deliberately seems really mandatory - thank you again! I did this and initialized ret with 1 so that the procedure is not retried if one of the two errors is present. Here is my take on tls_stop() in tls_openssl.c:


int tls_stop(tls_t *tls)
{
    int retries = 0;
    int error;
    int ret = 1;

    while (1) {
        ++retries;
        if ((tls->lasterror != SSL_ERROR_SYSCALL) && (tls->lasterror != SSL_ERROR_SSL)) {
            ret = SSL_shutdown(tls->ssl);
        }
        error = ret < 0 ? SSL_get_error(tls->ssl, ret) : 0;
        if (ret == 1 || !tls_is_recoverable(error) ||
            retries >= TLS_SHUTDOWN_MAX_RETRIES) {
            break;
        }
        _tls_sock_wait(tls, error);
    }
    if (error == SSL_ERROR_SYSCALL && errno == 0) {
        /*
         * Handle special case when peer closes connection instead of
         * proper shutdown.
         */
        error = 0;
        ret = 1;
    }
    _tls_set_error(tls, error);

    return ret <= 0 ? 0 : 1;
}

It works for my application and I am curious to get any kind of feedback, hints etc. In case the change makes sense - how can I push it?

pasis commented 5 years ago

Personally I would write it in the following way:

     int error;
     int ret;

+    /* Comment here. */
+    if (tls->lasterror == SSL_ERROR_SYSCALL || tls->lasterror == SSL_ERROR_SSL)
+        return 1;
+
     while (1) {
         ++retries;
         ret = SSL_shutdown(tls->ssl);

In this case, main logic with calling SSL_shutdown() in the loop is left untouched and is easier to read. Drawback of my solution is adding additional return point, but I believe there are multiple such places in libstrophe, so this is not prohibited.

Anyway, I would prefer to see a comment, that briefly describes why we omit SSL_shutdown(). Something like "According to SSL_shutdown(3), we must not call the function if a previous fatal error has occurred on a connection.".

Regarding coding style, line shouldn't exceed 80 symbols. For example, line with my if statement is exactly 80 symbols long and this is ok. But in your change it is 89 which is not ok :). In such a situation, you can simply break your condition into 2 lines:

        if ((tls->lasterror != SSL_ERROR_SYSCALL) &&
            (tls->lasterror != SSL_ERROR_SSL)) {
            ret = SSL_shutdown(tls->ssl);
        }

Now, how to create a pull request. Standard workflow is to fork libstrophe, make working copy on your computer (cloning your new fork is one of multiple ways), create new branch, commit the fix, push the brach to your github repository and github will propose you to create a pull request. Well, maybe it sounds a bit complicated for the start, so I suggest you find an article which describes all these steps, because it's hard to explain within a comment.

slowCoder72 commented 5 years ago

again your code is cleaner than my proposal :+1: so we should go for that. the 89 characters -> thanks again - it was not on my radar :( pull request: tried the above in the process except I did not fork - not required in our internal workflows -> seems that eventually I will have to pay a fee for all the lessons...thanks for the patience

So shall I create the pull with the risk of iterations (fine for me) or do you want to do it (also fine)?

pasis commented 5 years ago

I would suggest you to create pull request and go through the process. Just for experience. This patch is simple and strait-forward, so it's a good example for learning the process.

slowCoder72 commented 5 years ago

thank you for the opportunity and the hints! Hope my commit proves valuable :)

pasis commented 5 years ago

No worries. Your commit fixes a bug, so it's valuable indeed :)

pasis commented 5 years ago

Fixed via #137.

strophe / libstrophe

libstrophe crashes when internet connection fails #136