meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0
8.11k stars 2.46k forks source link

The janus_http_msg_destroy() in janus_http.c should take care of free "response" member #1665

Closed lqp276 closed 4 years ago

lqp276 commented 5 years ago

As per the code handle POST request in janus_http_handler:

        /* Suspend the connection and pass the ball to the core */
        JANUS_LOG(LOG_HUGE, "Forwarding request to the core (%p)\n", ts);
        gateway->incoming_request(&janus_http_transport, ts, ts, FALSE, root, &error);
    }

    /* Wait for a response (but not forever) */
#ifndef USE_PTHREAD_MUTEX
    gint64 wakeup = janus_get_monotonic_time() + 10*G_TIME_SPAN_SECOND;
    janus_mutex_lock(&msg->wait_mutex);
    while(!msg->got_response) {
        int res = janus_condition_wait_until(&msg->wait_cond, &msg->wait_mutex, wakeup);
        if(msg->got_response || !res)  <--------here break out by: !res
            break;
    }
    janus_mutex_unlock(&msg->wait_mutex);
#else

The http transport just pass the reqeust to core and wait for the response. But here may be a corner case, that the while(!msg->got_response) loop may break out by condition wait timeout, and in the meantime before janus_http_request_completed called, the core call janus_http_send_message to send mesasge to this transport session, the json message will be set to the field response of janus_transport_session instance and no one will take care of it thereafter.

The code of janus_http_send_message:

/* We have a response */
janus_transport_session *session = (janus_transport_session *)transport;
janus_mutex_lock(&messages_mutex);
if(g_hash_table_lookup(messages, session) == NULL) {
    janus_mutex_unlock(&messages_mutex);
    JANUS_LOG(LOG_ERR, "Invalid HTTP connection...\n");
    json_decref(message);
    return -1;
}
janus_http_msg *msg = (janus_http_msg *)transport->transport_p;
janus_mutex_unlock(&messages_mutex);
if(!msg->connection) {
    JANUS_LOG(LOG_ERR, "Invalid HTTP connection...\n");
    json_decref(message);
    return -1;
}
janus_mutex_lock(&msg->wait_mutex);
msg->response = message;   <---------here set the response when no one waiting
msg->got_response = TRUE;
janus_condition_signal(&msg->wait_cond);
janus_mutex_unlock(&msg->wait_mutex);

It could be manually produce by reduce the janus_condition_wait_until time and add some delay before janus_http_handler return. when janus_http_msg_destroy is called, the response just sit there and leaked. Since there is no sync between the action and we cant assume how long the core will take to process the request, There is a chance to happen, though rarely, rarely...

lminiero commented 5 years ago

Thanks for spotting this!

No synchronous request should ever take more than 10s to answer: the core definitely doesn't, and plugins should be written so that, if they know something will take longer, they should send an ACK back and reply with an asynchronous message instead. When a request takes longer than 10s, either something's wrong (deadlocks? machine so loaded it takes forever?) or the plugin the API is talking to is written poorly. As such, I agree it is a very rare occurrence, and so a bit of a low priority for me honestly.

That said, I guess that one way to address this might be adding an atomic property to janus_http_msg that defines the state: if we timeout, we set the atomic property accordingly, so that when send_message is called, that's a further check we can make to free resources before actually answering. Just thinking out loud, though, and as I said, not very high priority for me at the moment, since I have many other things to deal with. I'll keep track of this, but I don't know when I'll have time to work on it; pull requests are always welcome, of course!

lqp276 commented 5 years ago

Hi lminiero, I totally agree with your timing analysis, I hesitated for a while before write this issue, while I was reading the http transport code and reasoning for some running case in mind, this issue is more 'logically' than 'practically'. For disscuss only, maybe we need only check NULL for the response property as:

static void janus_http_msg_destroy(void *msg) {
      ....
       if(msg->response != NULL) {
        json_decref(msg->response);
        msg->response = NULL;
      }

    g_free(request);
}

Since if the core send message to the transport session and consumed, it will be set to NULL by http transport. So the only way for msg->response to not be NULL is that when core send the response, and missed the waiting period.

lminiero commented 5 years ago

Oh, no need to close, I think it's a good idea to keep this open as a memo that this needs fixing sooner or later. As to your suggestion, I don't think that would work: msg->response is NULL by default: so checking it in janus_http_send_message will be useless, because that's where we assign it in the first place.

lqp276 commented 5 years ago

Ok, Let's just leave it open. Recently I am integrating janus with our private RTC system to add Web support, thank janus for saving us a lot of work!!!

lminiero commented 5 years ago

That's cool to know, and thanks for the kind words! It might be one reason to come visit us at JanusCon then :smile:

lqp276 commented 5 years ago

Oh, lminiero, thank you for the invitation! I wish I could be there, but, for so many reasons like, the distance(I lived in ShangHai, China), language, family etc... Thanks to github that we can together to make janus better, and I will continue watching janus :laughing:

lminiero commented 4 years ago

It took a while, but I finally found some time to replicate the issue and work on a fix.