sergey-dryabzhinsky / nginx-rtmp-module

NGINX-based Media Streaming Server
http://nginx-rtmp.blogspot.com
BSD 2-Clause "Simplified" License
1.02k stars 216 forks source link

Infinite loop during session close #145

Open heftig opened 8 years ago

heftig commented 8 years ago

The worker occasionally gets stuck closing a session, spinning forever. GDB obtained this backtrace:

#0  0x00000000004c8717 in ngx_rtmp_free_shared_chain (cscf=cscf@entry=0x2640bd8, in=0x329be24) at /tmp/buildd/nginx-1.10.0/debian/modules/nginx-rtmp/ngx_rtmp_shared.c:69
        cl = 0x2fd2d54
#1  0x00000000004c2cb3 in ngx_rtmp_close_session_handler (e=<optimized out>) at /tmp/buildd/nginx-1.10.0/debian/modules/nginx-rtmp/ngx_rtmp_init.c:300
        s = 0x3721430
        c = 0x7f5cf5633af0
        cscf = 0x2640bd8
#2  0x0000000000435043 in ngx_event_process_posted (cycle=cycle@entry=0x263cff0, posted=0xe87c30 <ngx_posted_events>) at src/event/ngx_event_posted.c:33
        q = 0x3721488
        ev = 0x3721438
#3  0x0000000000434b4b in ngx_process_events_and_timers (cycle=cycle@entry=0x263cff0) at src/event/ngx_event.c:259
        flags = <optimized out>
        timer = <optimized out>
        delta = <optimized out>
#4  0x000000000043c475 in ngx_worker_process_cycle (cycle=cycle@entry=0x263cff0, data=data@entry=0x0) at src/os/unix/ngx_process_cycle.c:753
        worker = 0
#5  0x000000000043af30 in ngx_spawn_process (cycle=cycle@entry=0x263cff0, proc=proc@entry=0x43c430 <ngx_worker_process_cycle>, data=data@entry=0x0, name=name@entry=0x4fc122 "worker process", respawn=respawn@entry=-3)
    at src/os/unix/ngx_process.c:198
        on = 1
        pid = 0
        s = 0
#6  0x000000000043c6e4 in ngx_start_worker_processes (cycle=cycle@entry=0x263cff0, n=1, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:358
        i = 0
        ch = {command = 1, pid = 0, slot = 0, fd = 0}
#7  0x000000000043d41f in ngx_master_process_cycle (cycle=cycle@entry=0x263cff0) at src/os/unix/ngx_process_cycle.c:130
        title = 0x269f55c "master process /usr/sbin/nginx -g daemon off;"
        p = <optimized out>
        size = <optimized out>
        i = <optimized out>
        n = <optimized out>
        sigio = <optimized out>
        set = {__val = {0 <repeats 16 times>}}
        itv = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}
        live = <optimized out>
        delay = <optimized out>
        ls = <optimized out>
        ccf = 0x263ed98
#8  0x00000000004161ef in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:367
        b = <optimized out>
        log = 0x760bc0 <ngx_log>
        i = <optimized out>
        cycle = 0x263cff0
        init_cycle = {conf_ctx = 0x0, pool = 0x263c980, log = 0x760bc0 <ngx_log>, new_log = {log_level = 0, file = 0x0, connection = 0, disk_full_time = 0, handler = 0x0, data = 0x0, writer = 0x0, wdata = 0x0, action = 0x0, next = 0x0}, 
          log_use_stderr = 0, files = 0x0, free_connections = 0x0, free_connection_n = 0, modules = 0x0, modules_n = 0, modules_used = 0, reusable_connections_queue = {prev = 0x0, next = 0x0}, listening = {elts = 0x0, nelts = 0, 
            size = 0, nalloc = 0, pool = 0x0}, paths = {elts = 0x0, nelts = 0, size = 0, nalloc = 0, pool = 0x0}, config_dump = {elts = 0x0, nelts = 0, size = 0, nalloc = 0, pool = 0x0}, open_files = {last = 0x0, part = {elts = 0x0, 
              nelts = 0, next = 0x0}, size = 0, nalloc = 0, pool = 0x0}, shared_memory = {last = 0x0, part = {elts = 0x0, nelts = 0, next = 0x0}, size = 0, nalloc = 0, pool = 0x0}, connection_n = 0, files_n = 0, connections = 0x0, 
          read_events = 0x0, write_events = 0x0, old_cycle = 0x0, conf_file = {len = 21, data = 0x4f6a7f "/etc/nginx/nginx.conf"}, conf_param = {len = 11, data = 0x7ffef77acef4 "ss"}, conf_prefix = {len = 11, 
            data = 0x4f6a7f "/etc/nginx/nginx.conf"}, prefix = {len = 17, data = 0x4f6a6d "/usr/share/nginx/"}, lock_file = {len = 0, data = 0x0}, hostname = {len = 0, data = 0x0}}
        cd = <optimized out>
        ccf = <optimized out>

rtmp config:

rtmp {
    server {
        listen 1935 so_keepalive=30s:10s:5;
        ping 6s;
        ping_timeout 4s;
        timeout 15;
        drop_idle_publisher 8s;
        max_message 2M;
        chunk_size 8192;
        meta on;
        wait_key on;
        wait_video on;
        interleave on;
        idle_streams off;
        play_restart on;

        # Avoid buffering; at least out_cork messages get buffered if buflen >0
        buflen 0;
        out_cork 0;

        on_connect http://127.0.0.1:8080/on_connect;

        application foobar {
            live on;

            on_publish http://127.0.0.1:8080/on_publish;
            on_publish_done http://127.0.0.1:8080/on_publish_done;

            notify_update_timeout 5;
            notify_update_strict on;
            on_update http://127.0.0.1:8080/on_update;

            hls on;
            hls_path /run/nginx/hls;
            hls_fragment 2s;
            hls_playlist_length 40s;
            hls_nested on;
            hls_cleanup on;
            hls_fragment_naming system;
            hls_fragment_naming_granularity 1000;

            dash on;
            dash_path /run/nginx/dash;
            dash_fragment 2s;
            dash_playlist_length 40s;
            dash_nested on;
            dash_cleanup on;
        }
    }
}

nginx 1.10.0, nginx-rtmp-module c3237ae747. I still have a core dump of the worker.

heftig commented 8 years ago

Seems the ngx_chain_t at 0x329be24 (s->out[221]) is a circular list with 617 elements.

heftig commented 8 years ago

I've tried to hack around this; pretty much still untested.

diff --git a/ngx_rtmp_shared.c b/ngx_rtmp_shared.c
index 6f6e4e8de7587d54..3bdc5beeef6edef4 100644
--- a/ngx_rtmp_shared.c
+++ b/ngx_rtmp_shared.c
@@ -66,7 +66,8 @@ ngx_rtmp_free_shared_chain(ngx_rtmp_core_srv_conf_t *cscf, ngx_chain_t *in)
     }

     for (cl = in; ; cl = cl->next) {
-        if (cl->next == NULL) {
+        /* FIXME: Don't create circular chains in the first place */
+        if (cl->next == NULL || cl->next == in) {
             cl->next = cscf->free;
             cscf->free = in;
             return;
sullrich commented 8 years ago

Ran into this same issue and your patch resolved it. Thanks!

sergey-dryabzhinsky commented 8 years ago

Thanks @heftig Merged into dev

Stvad commented 8 years ago

I can reliably produce the loop -_-. I do that by trying to access the stream that includes onTextData packets from the remote host (that's an important detail, because when I do that with a local client it's not happening). The client will crash (because nginx will start sending garbage to it) and nginx will freeze in this cleaning loop. Though in my case, it seems the loop does not include the beginning of the list. Also hacking around and breaking the loop won't solve the problem for me (as Nginx will still be sending the garbage data to the client).

Stvad commented 8 years ago

Works for me if I use metadata support as implemented here: https://github.com/arut/nginx-rtmp-module/pull/708 and not a version merged in this fork.

Stvad commented 8 years ago

Hey @sergey-dryabzhinsky, I wanted to ask you if you would be open to the idea of accepting the metadata implementation change (replacing the current one that leads to this bug with the one referenced above)? (I plan to do that for my own purposes, as I was not able to locate an error in current implementation that leads to described behaviour)