Business Scenario Supplement:
This looks like a retry-strategy problem, a boundary condition that needs optimizing. You can shorten the retry interval a bit.
However, no matter what, when an edge pulls from the origin it has to consider tearing down that back-to-origin stream, to avoid a large number of streams being pulled from the origin. So there will always be a time window in which playback is refused, and it cannot be eliminated no matter how much we optimize.
A better way to work around the issue is to have the client support multiple edges and retry several times on failure. That approach solves the problem effectively.
Of course, the retry interval on the edge is still worth optimizing.
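For illustration only, here is a minimal standalone C++ sketch of the client-side idea suggested above: keep a list of edge addresses and retry across them with a short backoff. The `connect_and_play` function and the edge URLs are hypothetical placeholders, not part of SRS or any client SDK.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical placeholder: a real client would open the RTMP/FLV stream
// here and return false if the edge rejects or drops the connection.
static bool connect_and_play(const std::string& edge_url) {
    std::cout << "trying " << edge_url << std::endl;
    return false; // pretend the edge is still in its "playback refused" window
}

// Try every edge, several rounds, with a short pause between attempts,
// so a single edge's refusal window does not block the player.
static bool play_with_failover(const std::vector<std::string>& edges,
                               int rounds, std::chrono::milliseconds backoff) {
    for (int i = 0; i < rounds; i++) {
        for (const std::string& edge : edges) {
            if (connect_and_play(edge)) return true;
            std::this_thread::sleep_for(backoff);
        }
    }
    return false;
}

int main() {
    // Hypothetical edge addresses, for illustration only.
    std::vector<std::string> edges = {
        "rtmp://edge-a.example.com/live/stream",
        "rtmp://edge-b.example.com/live/stream",
    };
    bool ok = play_with_failover(edges, 3, std::chrono::milliseconds(500));
    std::cout << (ok ? "playing" : "all edges failed") << std::endl;
    return 0;
}
```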
First of all, thank you for patiently answering. Seeing this issue tagged with the "5.0 milestone" instantly made me feel that a fix is still far away.
Today I followed your suggestion and tried flv.js. Since it only lets you listen for events and then trigger a reconnect, that is what I did. There were no problems during testing; I will keep observing once it is deployed to production.
I have also encountered this. It seems to be a 3-second delay added for safety. You can modify the code; ideally the 3-second delay would be made configurable.
https://github.com/ossrs/srs/issues/2215#issuecomment-901682601
By the way, this issue is becoming a frequently asked question.
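As a rough illustration of what "make the delay configurable" could look like, here is a tiny standalone C++ sketch that reads the interval from an environment variable instead of hard-coding it. The variable name EDGE_RETRY_MS is invented for this example; SRS would normally expose such a value through its own config file rather than an environment variable.

```cpp
#include <chrono>
#include <cstdlib>
#include <iostream>

// Read the retry interval from an environment variable so it can be tuned
// without recompiling. EDGE_RETRY_MS is purely illustrative.
static std::chrono::milliseconds edge_retry_interval() {
    if (const char* v = std::getenv("EDGE_RETRY_MS")) {
        return std::chrono::milliseconds(std::atol(v));
    }
    return std::chrono::milliseconds(3000); // default: the current 3s behavior
}

int main() {
    std::cout << "retry interval: " << edge_retry_interval().count() << "ms" << std::endl;
    return 0;
}
```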
Thank you. I have read #2215 and also #2707. Indeed, judging from the discussion, this issue comes up every few months. Modifying the code, recompiling, and repackaging is fairly costly for those of us who work in other languages.
For now we can only work around it with the "listen and reconnect" approach mentioned above; we will switch when there is a better solution. https://github.com/ossrs/srs/issues/2901#issuecomment-1031491071
Well, client-side reconnection is definitely necessary; don't rely solely on server-side improvements. Stability is a systems-engineering problem, and of course the client has to be part of it.
In general there is always a way to work around a common problem; there are always more solutions than problems.
I added an event for notification and waiting. I am also considering a configuration option to enable or disable this mode. We have already updated our own server with this change, and so far there have been no problems. @winlinvip
srs_error_t SrsPlayEdge::on_client_play()
{
    srs_error_t err = srs_success;

    // Start ingesting when in the init state.
    if (state == SrsEdgeStateInit) {
        state = SrsEdgeStatePlay;
        err = ingester->start();
    } else if (state == SrsEdgeStateIngestStopping) {
        // A previous ingest is still stopping: wait until on_all_client_stop() signals us,
        // then re-check the state instead of failing the new player immediately.
        srs_cond_wait(ingester_wait);
        srs_trace("on_client_play srs_cond_wait.");
        if (state == SrsEdgeStateInit) {
            state = SrsEdgeStatePlay;
            err = ingester->start();
        } else if (state == SrsEdgeStateIngestStopping) {
            return srs_error_new(ERROR_RTMP_EDGE_PLAY_STATE, "state is stopping 1.");
        } else {
            return srs_error_new(ERROR_RTMP_EDGE_PLAY_STATE, "state is stopping 2.");
        }
    }

    return err;
}

void SrsPlayEdge::on_all_client_stop()
{
    // When all clients have disconnected and the edge is ingesting
    // the origin stream, abort the ingest.
    if (state == SrsEdgeStatePlay || state == SrsEdgeStateIngestConnected) {
        SrsEdgeState pstate = state;
        state = SrsEdgeStateIngestStopping;
        srs_trace("on_all_client_stop begin.");

        ingester->stop();

        state = SrsEdgeStateInit;
        srs_trace("edge change from %d to %d then %d (init).", pstate, SrsEdgeStateIngestStopping, state);

        // Wake up any player blocked in on_client_play() waiting for the stop to finish.
        srs_cond_signal(ingester_wait);
        srs_trace("on_all_client_stop srs_cond_signal");
        return;
    }
}
Sorry for keeping everyone waiting so long; I have fixed this in 4.0.267.
The wait-for-notification solution is good (although there is an even simpler one).
Hahaha, everyone actually contributed quite a few solutions, but the best place to fix it is still SRS itself, so nobody needs to keep carrying the same private patch around anymore.
I checked, and the retry time is about 3 seconds. The reason is that when the last player stops, the edge stops pulling the stream, and stopping the ingest coroutine takes a noticeable amount of time; the other operations all complete in milliseconds.
[2022-10-10 08:01:45.207] Debug: Start to stop edge coroutine
[2022-10-10 08:01:48.207] Debug: Start to stop edge upstream
[2022-10-10 08:01:48.208] Debug: Start to unpublish source
[2022-10-10 08:01:48.209] Debug: Edge stopped
Looking at the code, it is waiting for a fixed 3 seconds, so the mechanism here is not reasonable.
void SrsPlayEdge::on_all_client_stop() {
    if (state == SrsEdgeStatePlay || state == SrsEdgeStateIngestConnected) {
        state = SrsEdgeStateIngestStopping;
        ingester->stop();
    }
}

void SrsEdgeIngester::stop() {
    trd->stop();
    upstream->close();
    source->on_unpublish();
}

srs_error_t SrsEdgeIngester::cycle() {
    while (true) {
        do_cycle(); // Interrupted by stop().
        srs_usleep(SRS_EDGE_INGESTER_CIMS); // Fixed 3s sleep, not interrupted by stop().
    }
}
A more reasonable approach is to improve the coroutine stopping mechanism. The problem is that stopping the coroutine interrupts do_cycle(), but the outer sleep is not interrupted (after the interruption it simply goes back to sleep). We just need to solve that in a simple way, e.g. by not going back to sleep once a stop has been requested.
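To illustrate the general pattern only (this is not the actual SRS patch), below is a self-contained C++ sketch in which the retry loop waits on a condition variable with a timeout instead of an uninterruptible sleep, so stop() wakes it immediately and the loop exits without burning the full interval. All class and function names here are invented for the example.

```cpp
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

class Ingester {
public:
    // Retry loop: do one cycle of work, then wait up to 3s before retrying.
    // The wait is interruptible, so stop() does not cost a full interval.
    void cycle() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopped_) {
            lock.unlock();
            do_cycle();
            lock.lock();
            // Wake up early if stop() is called; otherwise retry after 3s.
            cond_.wait_for(lock, std::chrono::seconds(3), [this] { return stopped_; });
        }
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cond_.notify_all(); // Interrupts the wait in cycle() immediately.
    }

private:
    void do_cycle() { std::cout << "pull from origin (one attempt)" << std::endl; }

    std::mutex mutex_;
    std::condition_variable cond_;
    bool stopped_ = false;
};

int main() {
    Ingester ingester;
    std::thread worker([&] { ingester.cycle(); });

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    auto t0 = std::chrono::steady_clock::now();
    ingester.stop();
    worker.join();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - t0).count();
    std::cout << "stopped in " << ms << "ms" << std::endl; // milliseconds, not 3s
    return 0;
}
```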
After the improvement, the stop completes within about 2ms, a window so short that normal user operations will generally not run into it.
[2022-10-10 08:07:07.455] Debug: Start to stop edge coroutine
[2022-10-10 08:07:07.457] Debug: Edge stopped
Although a condition variable or delayed release would also solve the problem, releasing and recreating the object (somewhat like a restart) is the simplest solution, and it keeps the object's state very clear and simple, which minimizes the problem.
Note: some friends hoped we would implement delayed release of the stream, but in testing there is no perceptible difference compared with simply re-pulling it. Delayed release also requires a timer, which adds complexity, and it keeps pulling the stream even when nobody is playing it, which can itself cause problems in some scenarios.
@winlinvip This change is relatively simple, but if the edge nodes are attacked, it could bring down the origin server. Doesn't that defeat the purpose of putting edges in front of the origin?
Preventing attacks is a separate solution, and this retry interval is not designed for preventing attacks.
Description
origin.cluster.serverA.conf
origin.cluster.serverB.conf
origin.cluster.serverC.conf
origin.cluster.edge.conf
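For context, a minimal edge config for this kind of origin-cluster setup looks roughly like the sketch below; the listen port and origin addresses are illustrative placeholders, not the contents of the files listed above (the real examples live in the conf/ directory of the SRS repo).

```
# Illustrative edge config: on a cache miss, pull the stream from the origin servers.
listen              1935;
vhost __defaultVhost__ {
    cluster {
        mode        remote;
        origin      127.0.0.1:19350 127.0.0.1:19351;
    }
}
```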
Expected Behavior
In the origin-cluster environment, I expect the edge nodes to pull streams successfully, just as the nodes in the previous standalone deployment did, no matter when the pull happens (assuming the stream is being published).