ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License

Edge: Crashed in remote mode + low_latency settings #1761

Open vladimir131313 opened 4 years ago

vladimir131313 commented 4 years ago

**Description**

I am using SRS as an Edge cluster with the low-latency settings from the wiki (https://github.com/ossrs/srs/wiki/v3_EN_LowLatency). I noticed that some players make hundreds of short HTTP connections to SRS, and it crashes. I investigated and found that I can reproduce the crash from my terminal: running short curl requests (with a one-second timeout) against the FLV stream in a loop crashes SRS, usually after about ten requests.

For example:

    for ((i=1;i<50;i++)); do echo -n $i; curl -v -s -o /dev/null -m 1 http://127.0.0.1:20180/test/test.flv 2>&1 | grep -E '(Failed)|(HTTP)' | grep -v GET; done
    1< HTTP/1.1 200 OK
    2< HTTP/1.1 200 OK
    3< HTTP/1.1 200 OK
    4< HTTP/1.1 200 OK
    5< HTTP/1.1 200 OK
    6< HTTP/1.1 200 OK
    78* Failed to connect to 127.0.0.1 port 20180: Connection refused
    9* Failed to connect to 127.0.0.1 port 20180: Connection refused
    10* Failed to connect to 127.0.0.1 port 20180: Connection refused
    ...

*** abrt[13425]: Saved core dump of pid 13240 (srs/trunk/objs/srs) to /var/spool/abrt/ccpp-2020-05-13-16:44:11-13240 (2400256 bytes)
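
To see where the process died, the core dump that abrt saved can be opened with gdb. This is a minimal sketch, assuming gdb is installed, that abrt stores the core as a file named `coredump` inside the problem directory (its usual layout, but worth checking on your system), and that the binary on disk is still the one that crashed. `core_backtrace` is a hypothetical helper name, not something shipped with SRS.

```shell
# core_backtrace BINARY ABRT_DIR
# Prints the backtrace of the crashing thread from an abrt problem directory.
# Assumption: abrt keeps the core in ABRT_DIR/coredump.
core_backtrace() {
    binary=$1
    core=$2/coredump
    if [ ! -r "$core" ]; then
        echo "no readable core file at $core" >&2
        return 1
    fi
    # -batch runs the given commands and exits; bt prints the call stack.
    gdb -batch -ex bt "$binary" "$core"
}
```

For the dump mentioned above this would be called as `core_backtrace ./objs/srs /var/spool/abrt/ccpp-2020-05-13-16:44:11-13240`.
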

After some testing, I found that the problem is related to the `mw_latency` directive, which I use for low-latency streaming:

    play {
        gop_cache       off;
        queue_length    10;
        mw_latency      100;
    }

When I commented it out, my test succeeded:

    play {
        gop_cache       off;
        queue_length    10;
       # mw_latency      100;
    }

    for ((i=1;i<50;i++)); do echo -n $i; curl -v -s -o /dev/null -m 1 http://127.0.0.1:20180/test/test.flv 2>&1 | grep -E '(Failed)|(HTTP)' | grep -v GET; done
    1< HTTP/1.1 200 OK
    2< HTTP/1.1 200 OK
    3< HTTP/1.1 200 OK
    4< HTTP/1.1 200 OK
    ...
    48< HTTP/1.1 200 OK
    49< HTTP/1.1 200 OK

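
The workaround above amounts to commenting out `mw_latency` in the `play` block. A small sketch of doing that with sed, so the change is easy to apply and to diff against the original file. The function name `comment_out_mw_latency` and the output file name in the usage line are only illustrative.

```shell
# comment_out_mw_latency
# Reads an SRS config on stdin and writes it to stdout with every active
# mw_latency directive commented out. Lines that are already commented
# (they start with '#') do not match and are left untouched.
comment_out_mw_latency() {
    sed -E 's/^([[:space:]]*)(mw_latency[[:space:]])/\1# \2/'
}
```

Usage: `comment_out_mw_latency < conf/srs.conf > conf/srs-no-mw.conf`, then start SRS with the new file.
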

1. SRS version: 3.0.140
2. The log of SRS is as follows:

    [2020-05-13 17:01:48.531][Trace][24091][522] HTTP client ip=127.0.0.1, request=0, to=15000ms
    [2020-05-13 17:01:48.531][Trace][24091][522] HTTP GET http://127.0.0.1:20180/test/test.flv, content-length=-1
    [2020-05-13 17:01:48.531][Trace][24091][522] http: mount flv stream for sid=/test/test, mount=/test/test.flv
    [2020-05-13 17:01:48.531][Trace][24091][522] flv: source url=/test/test, is_edge=1, source_id=-1[-1]
    [2020-05-13 17:01:48.531][Trace][24091][522] create consumer, active=0, queue_size=0.00, jitter=10000000
    [2020-05-13 17:01:48.531][Trace][24091][522] ignore disabled exec for vhost=__defaultVhost__
    [2020-05-13 17:01:48.532][Trace][24091][522] set fd=11 TCP_NODELAY 0=>1
    [2020-05-13 17:01:48.532][Trace][24091][522] set fd=11, SO_SNDBUF=660150=>50000, buffer=100ms
    [2020-05-13 17:01:48.532][Trace][24091][522] FLV /test/test.flv, encoder=FLV, nodelay=1, mw_sleep=100ms, cache=0, msgs=128
    [2020-05-13 17:01:48.532][Trace][24091][522] update source_id=522[522]
    [2020-05-13 17:01:48.540][Trace][24091][524] complex handshake success.
    [2020-05-13 17:01:48.540][Trace][24091][524] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
    [2020-05-13 17:01:48.620][Trace][24091][524] connected, version=3.0.140.0, ip=127.0.0.1, pid=11749, id=1181, dsu=1
    [2020-05-13 17:01:48.620][Trace][24091][524] edge change from 100 to state 101 (pull).
    [2020-05-13 17:01:48.621][Trace][24091][524] got metadata, width=424, height=240, vcodec=7, acodec=10
    [2020-05-13 17:01:48.621][Trace][24091][524] 4B audio sh, codec(10, profile=LC, 2channels, 0kbps, 48000HZ), flv(16bits, 2channels, 44100HZ)
    [2020-05-13 17:01:48.621][Trace][24091][524] 43B video sh, codec(7, profile=Baseline, level=3, 432x240, 0kbps, 0.0fps, 0.0s)
    [2020-05-13 17:01:48.631][Trace][24091][522] update source_id=524[524]
    [2020-05-13 17:01:48.631][Trace][24091][522] FLV: write header audio=1, video=1
    [2020-05-13 17:01:49.533][Warn][24091][524][4] origin disconnected, retry, error code=1007 : recv message : recv interlaced message : read basic header : basic header requires 1 bytes : read bytes : read
    thread [24091][524]: ingest() [src/app/srs_app_edge.cpp:333][errno=4]
    thread [24091][524]: recv_message() [src/protocol/srs_rtmp_stack.cpp:389][errno=4]
    thread [24091][524]: recv_interlaced_message() [src/protocol/srs_rtmp_stack.cpp:871][errno=4]
    thread [24091][524]: read_basic_header() [src/protocol/srs_rtmp_stack.cpp:966][errno=4]
    thread [24091][524]: grow() [src/protocol/srs_protocol_stream.cpp:179][errno=4]
    thread [24091][524]: read() [src/service/srs_service_st.cpp:490][errno=4]
    [2020-05-13 17:01:49.539][Trace][24091][525] HTTP client ip=127.0.0.1, request=0, to=15000ms
    [2020-05-13 17:01:49.539][Trace][24091][525] HTTP GET http://127.0.0.1:20180/test/test.flv, content-length=-1
    [2020-05-13 17:01:49.539][Trace][24091][525] dispatch cached gop success. count=49, duration=716
    [2020-05-13 17:01:49.539][Trace][24091][525] create consumer, active=1, queue_size=0.00, jitter=10000000
    [2020-05-13 17:01:49.539][Trace][24091][525] set fd=13 TCP_NODELAY 0=>1
    [2020-05-13 17:01:49.539][Trace][24091][525] set fd=13, SO_SNDBUF=660150=>50000, buffer=100ms
    [2020-05-13 17:01:49.540][Trace][24091][525] FLV /test/test.flv, encoder=FLV, nodelay=1, mw_sleep=100ms, cache=0, msgs=128
    [2020-05-13 17:01:49.540][Trace][24091][525] FLV: write header audio=1, video=1
    [2020-05-13 17:01:50.541][Trace][24091][525] cleanup when unpublish
    [2020-05-13 17:01:50.541][Trace][24091][525] edge change from 101 to state 0 (init).
    [2020-05-13 17:01:50.541][Warn][24091][525][4] client disconnect peer. ret=1007
    [2020-05-13 17:01:50.548][Trace][24091][526] HTTP client ip=127.0.0.1, request=0, to=15000ms
    [2020-05-13 17:01:50.548][Trace][24091][526] HTTP GET http://127.0.0.1:20180/test/test.flv, content-length=-1
    [2020-05-13 17:01:50.549][Trace][24091][526] flv: source url=/test/test, is_edge=1, source_id=-1[-1]
    [2020-05-13 17:01:50.549][Trace][24091][526] create consumer, active=0, queue_size=0.00, jitter=10000000
    [2020-05-13 17:01:50.549][Trace][24091][526] ignore disabled exec for vhost=__defaultVhost__
    [2020-05-13 17:01:50.549][Trace][24091][526] set fd=12 TCP_NODELAY 0=>1
    [2020-05-13 17:01:50.549][Trace][24091][526] set fd=12, SO_SNDBUF=660150=>50000, buffer=100ms
    [2020-05-13 17:01:50.549][Trace][24091][526] FLV /test/test.flv, encoder=FLV, nodelay=1, mw_sleep=100ms, cache=0, msgs=128
    [2020-05-13 17:01:50.549][Trace][24091][526] update source_id=526[526]
    [2020-05-13 17:01:50.557][Trace][24091][527] complex handshake success.
    [2020-05-13 17:01:50.557][Trace][24091][527] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
    [2020-05-13 17:01:50.638][Trace][24091][527] connected, version=3.0.140.0, ip=127.0.0.1, pid=11749, id=1182, dsu=1
    [2020-05-13 17:01:50.638][Trace][24091][527] edge change from 100 to state 101 (pull).
    [2020-05-13 17:01:50.638][Trace][24091][527] got metadata, width=424, height=240, vcodec=7, acodec=10
    [2020-05-13 17:01:50.638][Trace][24091][527] 4B audio sh, codec(10, profile=LC, 2channels, 0kbps, 48000HZ), flv(16bits, 2channels, 44100HZ)
    [2020-05-13 17:01:50.638][Trace][24091][527] 43B video sh, codec(7, profile=Baseline, level=3, 432x240, 0kbps, 0.0fps, 0.0s)
    [2020-05-13 17:01:50.649][Trace][24091][526] update source_id=527[527]
    [2020-05-13 17:01:50.649][Trace][24091][526] FLV: write header audio=1, video=1
    [2020-05-13 17:01:51.550][Warn][24091][527][4] origin disconnected, retry, error code=1007 : recv message : recv interlaced message : read basic header : basic header requires 1 bytes : read bytes : read
    thread [24091][527]: ingest() [src/app/srs_app_edge.cpp:333][errno=4]
    thread [24091][527]: recv_message() [src/protocol/srs_rtmp_stack.cpp:389][errno=4]
    thread [24091][527]: recv_interlaced_message() [src/protocol/srs_rtmp_stack.cpp:871][errno=4]
    thread [24091][527]: read_basic_header() [src/protocol/srs_rtmp_stack.cpp:966][errno=4]
    thread [24091][527]: grow() [src/protocol/srs_protocol_stream.cpp:179][errno=4]
    thread [24091][527]: read() [src/service/srs_service_st.cpp:490][errno=4]
    [2020-05-13 17:01:51.556][Trace][24091][528] HTTP client ip=127.0.0.1, request=0, to=15000ms
    [2020-05-13 17:01:51.557][Trace][24091][528] HTTP GET http://127.0.0.1:20180/test/test.flv, content-length=-1
    [2020-05-13 17:01:51.557][Trace][24091][528] dispatch cached gop success. count=62, duration=906
    [2020-05-13 17:01:51.557][Trace][24091][528] create consumer, active=1, queue_size=0.00, jitter=10000000
    [2020-05-13 17:01:51.557][Trace][24091][528] set fd=14 TCP_NODELAY 0=>1
    [2020-05-13 17:01:51.557][Trace][24091][528] set fd=14, SO_SNDBUF=660150=>50000, buffer=100ms
    [2020-05-13 17:01:51.557][Trace][24091][528] FLV /test/test.flv, encoder=FLV, nodelay=1, mw_sleep=100ms, cache=0, msgs=128
    [2020-05-13 17:01:51.557][Trace][24091][528] FLV: write header audio=1, video=1
    [2020-05-13 17:01:52.533][Trace][24091][522] cleanup when unpublish
    [2020-05-13 17:01:52.533][Trace][24091][522] edge change from 101 to state 0 (init).
    [2020-05-13 17:01:52.533][Warn][24091][522][4] client disconnect peer. ret=1007
    [2020-05-13 17:01:52.558][Warn][24091][528][104] server disconnect. ret=4040
    [2020-05-13 17:01:52.565][Trace][24091][529] HTTP client ip=127.0.0.1, request=0, to=15000ms
    [2020-05-13 17:01:52.565][Trace][24091][529] HTTP GET http://127.0.0.1:20180/test/test.flv, content-length=-1
    [2020-05-13 17:01:52.565][Trace][24091][529] flv: source url=/test/test, is_edge=1, source_id=-1[-1]
    [2020-05-13 17:01:52.565][Trace][24091][529] create consumer, active=0, queue_size=0.00, jitter=10000000
    [2020-05-13 17:01:52.565][Trace][24091][529] ignore disabled exec for vhost=__defaultVhost__
    [2020-05-13 17:01:52.565][Trace][24091][529] set fd=11 TCP_NODELAY 0=>1
    [2020-05-13 17:01:52.565][Trace][24091][529] set fd=11, SO_SNDBUF=660150=>50000, buffer=100ms
    [2020-05-13 17:01:52.565][Trace][24091][529] FLV /test/test.flv, encoder=FLV, nodelay=1, mw_sleep=100ms, cache=0, msgs=128
    [2020-05-13 17:01:52.565][Trace][24091][529] update source_id=529[529]
    [2020-05-13 17:01:52.574][Trace][24091][530] complex handshake success.
    [2020-05-13 17:01:52.574][Trace][24091][530] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
    [2020-05-13 17:01:52.655][Trace][24091][530] connected, version=3.0.140.0, ip=127.0.0.1, pid=11749, id=1183, dsu=1
    [2020-05-13 17:01:52.655][Trace][24091][530] edge change from 100 to state 101 (pull).
    [2020-05-13 17:01:52.655][Trace][24091][530] got metadata, width=424, height=240, vcodec=7, acodec=10
    [2020-05-13 17:01:52.655][Trace][24091][530] 4B audio sh, codec(10, profile=LC, 2channels, 0kbps, 48000HZ), flv(16bits, 2channels, 44100HZ)
    [2020-05-13 17:01:52.655][Trace][24091][530] 43B video sh, codec(7, profile=Baseline, level=3, 432x240, 0kbps, 0.0fps, 0.0s)
    [2020-05-13 17:01:52.665][Trace][24091][529] update source_id=530[530]
    [2020-05-13 17:01:52.665][Trace][24091][529] FLV: write header audio=1, video=1
    [2020-05-13 17:01:53.574][Trace][24091][531] HTTP client ip=127.0.0.1, request=0, to=15000ms
    [2020-05-13 17:01:53.574][Trace][24091][531] HTTP GET http://127.0.0.1:20180/test/test.flv, content-length=-1
    [2020-05-13 17:01:53.574][Trace][24091][531] dispatch cached gop success. count=24, duration=369
    [2020-05-13 17:01:53.574][Trace][24091][531] create consumer, active=1, queue_size=0.00, jitter=10000000
    [2020-05-13 17:01:53.574][Trace][24091][531] set fd=14 TCP_NODELAY 0=>1
    [2020-05-13 17:01:53.574][Trace][24091][531] set fd=14, SO_SNDBUF=660150=>50000, buffer=100ms
    [2020-05-13 17:01:53.574][Trace][24091][531] FLV /test/test.flv, encoder=FLV, nodelay=1, mw_sleep=100ms, cache=0, msgs=128
    [2020-05-13 17:01:53.574][Trace][24091][531] FLV: write header audio=1, video=1
    [2020-05-13 17:01:53.666][Warn][24091][529][11] client disconnect peer. ret=1007

3. The configuration of SRS is as follows:

`cat conf/srs.conf`

    listen 20135;
    max_connections 10000;
    srs_log_tank file;
    srs_log_file objs/logs/srs.log;
    http_api {
        enabled on;
        listen 20185;
    }
    http_server {
        enabled on;
        listen 20180;
        dir objs/nginx/html;
    }
    stats {
        network 0;
    }

    vhost __defaultVhost__ {
        tcp_nodelay on min_latency on;

        play {
            gop_cache       off;
            queue_length    10;
            mw_latency      100;
        }

        publish {
            mr off;
        }
        cluster {
            mode remote;
            origin 127.0.0.1:40135;
        }
        http_remux {
            enabled     on;
            mount       [vhost]/[app]/[stream].flv;
            hstrs       on;
        }
    }

`cat conf/srs-pub.conf`

    # main config for srs.
    # @see full.conf for detail config.

    listen 40135;
    pid objs/srs-pub.pid;
    max_connections 10000;
    srs_log_tank file;
    srs_log_file objs/logs/srs-pub.log;
    http_api {
        enabled on;
        listen 40185;
    }
    stats {
        network 0;
    }

    vhost __defaultVhost__ {
        tcp_nodelay on min_latency on;

        play {
            gop_cache       off;
            queue_length    10;
            mw_latency      100;
        }

        publish {
            mr off;
        }
    }



**Replay**

1. `./objs/srs -c conf/srs-pub.conf  >/dev/null 2>&1`
1. `./objs/srs -c conf/srs.conf  >/dev/null 2>&1`
1. `ffmpeg -i INPUT -vcodec copy -acodec copy -f flv rtmp://127.0.0.1:40135/test/test`
1. `for ((i=1;i<50;i++)); do echo -n $i; curl -v -s -o /dev/null -m 1 http://127.0.0.1:20180/test/test.flv 2>&1 | grep -E '(Failed)|(HTTP)' | grep -v GET; done`
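
The loop in the last step can be wrapped so that it stops at the first refused connection and reports which request that was. This is a sketch, not part of SRS: `classify_probe` and `probe_edge` are hypothetical helper names, and the matching relies on the `curl -v` output format shown earlier in this report.

```shell
# classify_probe TRANSCRIPT
# Classifies the combined output of one `curl -v` run as ok, refused or other.
classify_probe() {
    if printf '%s\n' "$1" | grep -q 'Failed to connect'; then
        echo refused
    elif printf '%s\n' "$1" | grep -Eq '^< HTTP/[0-9.]+ 200'; then
        echo ok
    else
        echo other
    fi
}

# probe_edge URL [COUNT]
# Sends COUNT one-second requests (default 49, as in the loop above) and
# returns 1 at the first refused connection.
probe_edge() {
    url=$1
    count=${2:-49}
    i=1
    while [ "$i" -le "$count" ]; do
        # curl exits non-zero on timeout, which is expected for a live stream.
        out=$(curl -v -s -o /dev/null -m 1 "$url" 2>&1) || true
        case $(classify_probe "$out") in
            refused)
                echo "request $i: connection refused, SRS is probably down"
                return 1
                ;;
            ok) echo "request $i: 200 OK" ;;
            *) echo "request $i: no HTTP status seen" ;;
        esac
        i=$((i + 1))
    done
}
```

Usage: `probe_edge http://127.0.0.1:20180/test/test.flv 49`. A non-zero exit status means the edge stopped accepting connections before the loop finished.
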

**Expect**
Is it possible to fix this?
Thanks.

winlinvip commented 3 years ago

Check it.

winlinvip commented 2 months ago

When a viewer starts and stops pulling a stream at very short intervals, it can trigger coroutine switches while the Edge server is reconnecting to the origin, which may lead to problems such as using a freed object. We need to reproduce this issue.

winlinvip commented 2 months ago

For a similar issue, see https://github.com/ossrs/srs/issues/1829