ossrs / srs

SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License
25.37k stars 5.34k forks source link

RTC2RTMP: Crash when converting RTC to RTMP. SRS crash issue, core file and log attached below. #3034

Open shawnlihst opened 2 years ago

shawnlihst commented 2 years ago

Note: Please read FAQ before file an issue, see 2716

Note: Before asking a question, please refer to the FAQ, specifically 2716.

Description (描述)

Please describe your issue here (描述你遇到了什么问题)
Make sure to maintain the markdown structure.

srsCrash crashed at 11:13 AM on May 13, 2022.
Make sure to maintain the markdown structure.

  1. SRS Version: SRS/4.0.245
    Make sure to maintain the markdown structure.

  2. SRS Log:
    Make sure to maintain the markdown structure.

The log files and core files are too large, attached in the download link. The crash occurred on May 13, 2022, at 11:13.
  1. SRS Config (Configuration)
listen              29711;
max_connections     1000;
# For docker, please use docker logs to manage the logs of SRS.
# See https://docs.docker.com/config/containers/logging/
daemon              off;
srs_log_tank        file;
srs_log_file        ./objs/srsmeeting.log;
pid                 objs/meeting.pid;

http_api {
    enabled         on;
    listen          8029;

}
http_server {
    enabled         on;
    listen          29713;
    dir             ./objs/nginx/html;
}
rtc_server {
    enabled on;
    listen 29715; # UDP port
    # @see https://github.com/ossrs/srs/wiki/v4_CN_WebRTC#config-candidate
    candidate server public IP, not convenient to display;
}
stats {
    network         0;
    disk            sda sdb xvda xvdb;
}

vhost __defaultVhost__ {
     enabled         on;
     min_latency     on;
     tcp_nodelay     on;
rtc {
     enabled     on;
     rtmp_to_rtc on;
     rtc_to_rtmp on;
     # The PLI interval in seconds, for RTC to RTMP.
     # Note the available range is [0.5, 30]
     # Default: 6.0
     # pli_for_rtmp 6.0;
    }
play {
        # set the MW(merged-write) latency in ms.
        #         # SRS always set mw on, so we just set the latency value.
        # the latency of stream >= mw_latency + mr_latency
        # the value recomment is [300, 1800]
        queue_length    10;
        gop_cache       off;
        mw_latency      100;
      }
}

Replay (重现)

Please describe how to replay the bug? (重现Bug的步骤)

No information available at the moment. If reproduced, it will be added later.

Expect (Expected Behavior)

Please describe your expectation (Describe what you expect to happen)

TRANS_BY_GPT3

winlinvip commented 2 years ago

Download the coredump and SRS binary, then run gdb:

docker run --rm -it -v `pwd`:/srs -w /srs registry.cn-hangzhou.aliyuncs.com/ossrs/srs:dev gdb srs -c core*

The stack of coredump:

[root@afa08885f337 srs]# gdb srs -c core.21038

Core was generated by `./objs/srs -c conf/ucpmeeting.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000618d54 in SrsRtpPacket::get_avsync_time (this=0x0) at src/kernel/srs_kernel_rtc_rtp.hpp:335

warning: Source file is more recent than executable.
335     int64_t get_avsync_time() const { return avsync_time_; }
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64
(gdb) bt
#0  0x0000000000618d54 in SrsRtpPacket::get_avsync_time (this=0x0) at src/kernel/srs_kernel_rtc_rtp.hpp:335
#1  0x000000000061361f in SrsRtmpFromRtcBridger::packet_video_rtmp (this=0x10ddc420, start=0, end=22704)
    at src/app/srs_app_rtc_source.cpp:1614
#2  0x000000000061277f in SrsRtmpFromRtcBridger::packet_video (this=0x10ddc420, src=0x103c28e0)
    at src/app/srs_app_rtc_source.cpp:1451
#3  0x0000000000611f34 in SrsRtmpFromRtcBridger::on_rtp (this=0x10ddc420, pkt=0x103c28e0)
    at src/app/srs_app_rtc_source.cpp:1347
#4  0x000000000060f48e in SrsRtcSource::on_rtp (this=0x61cf3c0, pkt=0x103c28e0) at src/app/srs_app_rtc_source.cpp:632
#5  0x0000000000617c03 in SrsRtcVideoRecvTrack::on_rtp (this=0xe2db520, source=0x61cf3c0, pkt=0x103c28e0)
    at src/app/srs_app_rtc_source.cpp:2520
#6  0x00000000005d22cb in SrsRtcPublishStream::do_on_rtp_plaintext (this=0xf11f660, pkt=@0x7fc966db7758: 0x103c28e0, 
    buf=0x7fc966db7740) at src/app/srs_app_rtc_conn.cpp:1451
#7  0x00000000005d2086 in SrsRtcPublishStream::on_rtp_plaintext (this=0xf11f660, 
    plaintext=0x2572200 "\220\375X\261T\331\341x\336\201\267x\276\336", nb_plaintext=31) at src/app/srs_app_rtc_conn.cpp:1419
#8  0x00000000005d1db8 in SrsRtcPublishStream::on_rtp (this=0xf11f660, 
    data=0x2572200 "\220\375X\261T\331\341x\336\201\267x\276\336", nb_data=41) at src/app/srs_app_rtc_conn.cpp:1386
#9  0x00000000005d6316 in SrsRtcConnection::on_rtp (this=0x3fcc690, 
    data=0x2572200 "\220\375X\261T\331\341x\336\201\267x\276\336", nb_data=41) at src/app/srs_app_rtc_conn.cpp:2272
#10 0x000000000060914a in SrsRtcServer::on_udp_packet (this=0x2438ab0, skt=0x7fc966db7b20)
    at src/app/srs_app_rtc_server.cpp:462
#11 0x00000000005a8e25 in SrsUdpMuxListener::cycle (this=0x254ef80) at src/app/srs_app_listener.cpp:620
#12 0x0000000000525a5d in SrsFastCoroutine::cycle (this=0x255f0c0) at src/app/srs_app_st.cpp:272
#13 0x0000000000525ae0 in SrsFastCoroutine::pfn (arg=0x255f0c0) at src/app/srs_app_st.cpp:287
#14 0x0000000000652d3b in _st_thread_main () at sched.c:363
#15 0x00000000006535ae in st_thread_create (start=0x7fc965cd0760 <main_arena>, arg=0x38, joinable=32713, stk_size=1704519420)
    at sched.c:694

This is the packet retrieved from the cache when converting from RTC to RTMP, and it is NULL.

    SrsCommonMessage rtmp;
    SrsRtpPacket* pkt = cache_video_pkts_[cache_index(start)].pkt;
    rtmp.header.initialize_video(nb_payload, pkt->get_avsync_time(), 1);

This leads to a null pointer reference.

Note: If you are not using RTC to RTMP, you can disable this feature rtc_to_rtmp to bypass it.

The information of the cached packets is 'cache_video_pkts_'.

(gdb) p cache_video_pkts_
$1 = {{in_use = false, sn = 0, ts = 0, rtp_ts = 0, pkt = 0x0} <repeats 166 times>, {in_use = true, sn = 22694, 
    ts = 224828526, rtp_ts = 1423527736, pkt = 0x48205b0}, {in_use = true, sn = 22695, ts = 224828558, rtp_ts = 1423530616, 
    pkt = 0x75a3110}, {in_use = true, sn = 22696, ts = 224828590, rtp_ts = 1423533496, pkt = 0x58f4930}, {in_use = true, 
    sn = 22697, ts = 224828638, rtp_ts = 1423537816, pkt = 0x855c6d0}, {in_use = true, sn = 22698, ts = 224828670, 
    rtp_ts = 1423540696, pkt = 0xc528c10}, {in_use = true, sn = 22699, ts = 224828718, rtp_ts = 1423545016, pkt = 0x3eac500}, 
  {in_use = true, sn = 22700, ts = 224828750, rtp_ts = 1423547896, pkt = 0x2b841b0}, {in_use = true, sn = 22701, 
    ts = 224828782, rtp_ts = 1423550776, pkt = 0x4850700}, {in_use = true, sn = 22702, ts = 224828830, rtp_ts = 1423555096, 
    pkt = 0x6756ed0}, {in_use = true, sn = 22703, ts = 224828862, rtp_ts = 1423557976, pkt = 0xbfe3d90}, {in_use = true, 
    sn = 22704, ts = 224828894, rtp_ts = 1423560856, pkt = 0x30e0a20}, {in_use = true, sn = 22705, ts = 224828942, 
    rtp_ts = 1423565176, pkt = 0x6a31100}, {in_use = false, sn = 0, ts = 0, rtp_ts = 0, pkt = 0x0} <repeats 334 times>}

The information of 'bridger' is:

(gdb) p *this
$3 = {<ISrsRtcSourceBridger> = {_vptr.ISrsRtcSourceBridger = 0x9e5070 <vtable for SrsRtmpFromRtcBridger+16>}, source_ = 
    0xb4072e0, codec_ = 0xb0776e0, is_first_audio = true, is_first_video = true, format = 0x9898070, 
  static s_cache_size = 512, cache_video_pkts_ = {{in_use = false, sn = 0, ts = 0, rtp_ts = 0, 
      pkt = 0x0} <repeats 166 times>, {in_use = true, sn = 22694, ts = 224828526, rtp_ts = 1423527736, pkt = 0x48205b0}, {
      in_use = true, sn = 22695, ts = 224828558, rtp_ts = 1423530616, pkt = 0x75a3110}, {in_use = true, sn = 22696, 
      ts = 224828590, rtp_ts = 1423533496, pkt = 0x58f4930}, {in_use = true, sn = 22697, ts = 224828638, rtp_ts = 1423537816, 
      pkt = 0x855c6d0}, {in_use = true, sn = 22698, ts = 224828670, rtp_ts = 1423540696, pkt = 0xc528c10}, {in_use = true, 
      sn = 22699, ts = 224828718, rtp_ts = 1423545016, pkt = 0x3eac500}, {in_use = true, sn = 22700, ts = 224828750, 
      rtp_ts = 1423547896, pkt = 0x2b841b0}, {in_use = true, sn = 22701, ts = 224828782, rtp_ts = 1423550776, 
      pkt = 0x4850700}, {in_use = true, sn = 22702, ts = 224828830, rtp_ts = 1423555096, pkt = 0x6756ed0}, {in_use = true, 
      sn = 22703, ts = 224828862, rtp_ts = 1423557976, pkt = 0xbfe3d90}, {in_use = true, sn = 22704, ts = 224828894, 
      rtp_ts = 1423560856, pkt = 0x30e0a20}, {in_use = true, sn = 22705, ts = 224828942, rtp_ts = 1423565176, 
      pkt = 0x6a31100}, {in_use = false, sn = 0, ts = 0, rtp_ts = 0, pkt = 0x0} <repeats 334 times>}, header_sn_ = 0, 
  lost_sn_ = 22705, rtp_key_frame_ts_ = -1}

The information of the caller's SN is:

(gdb) f 2
#2  0x000000000061277f in SrsRtmpFromRtcBridger::packet_video (this=0x10ddc420, src=0x103c28e0)
    at src/app/srs_app_rtc_source.cpp:1451
1451                    if ((err = packet_video_rtmp(header_sn_, tail_sn)) != srs_success) {
(gdb) p header_sn_
$4 = 0
(gdb) p tail_sn
$5 = 22704

The log at the time of the crash:

[2022-05-13 11:13:47.383][Trace][21038][q38p5iuu] RTC: DTLS handshake done.
[2022-05-13 11:13:47.383][Trace][21038][q38p5iuu] RTC: session pub=1, sub=0, to=30000ms connection established
[2022-05-13 11:13:47.383][Trace][21038][q38p5iuu] RTC: Publisher url=/meeting/100000000036783-436778429 established
[2022-05-13 11:13:47.383][Trace][21038][q38p5iuu] ignore disabled exec for vhost=__defaultVhost__
[2022-05-13 11:13:47.563][Trace][21038][8i0q72f4] NACK: update seq=22652, nack range [22651, 22652]
[2022-05-13 11:13:47.621][Trace][21038][4mt73s5j] time diff to large=1099, next out=227400, new pkt=228499, set to new pkt
[2022-05-13 11:13:48.081][Trace][21038][3q11ude1] time diff to large=1200, next out=474480, new pkt=475680, set to new pkt
[2022-05-13 11:13:48.122][Trace][21038][941q820f] RTC: Need PLI ssrc=717477422, play=[941q820f], publish=[8i0q72f4], count=43/43
[2022-05-13 11:13:48.122][Trace][21038][8i0q72f4] RTC: Request PLI ssrc=717477422, play=[941q820f], count=43/43, bytes=12B
[2022-05-13 11:13:48.195][Trace][21038][919d8dz3] 38B video sh,  codec(7, profile=Baseline, level=3.1, 640x360, 0kbps, 0.0fps, 0.0s)
[2022-05-13 11:13:48.195][Trace][21038][919d8dz3] set ts=3591688447, header=25326, lost=25327
[2022-05-13 11:13:48.962][Trace][21038][4mt73s5j] time diff to large=1001, next out=228839, new pkt=229840, set to new pkt
[2022-05-13 11:13:49.378][Trace][21038][941q820f] Hybrid cpu=7.99%,336MB, cid=10,34, timer=61,24,396, clock=0,42,5,0,0,0,0,1,0, free=1, objs=(pkt:814,raw:394,fua:418,msg:1178,oth:2,buf:255)
[2022-05-13 11:13:49.378][Trace][21038][941q820f] RTC: Server conns=9, rpkts=(245,rtp:213,stun:3,rtcp:28), spkts=(431,rtp:396,stun:3,rtcp:52), rtcp=(pli:1,twcc:21,rr:2), snk=(214,a:107,v:107,h:0), fid=(id:1,fid:245,ffid:0,addr:3,faddr:245)

TRANS_BY_GPT3

anjisuan783 commented 2 years ago
  1. It can be seen from SrsRtmpFromRtcBridger::packet_video_rtmp (this=0x10ddc420, start=0, end=22704) that there is a fault, under normal circumstances it is not possible to have so many packet losses (distance(0, 22704));
  2. Continuing to look at headersn=0, rtp_key_framets=-1, it is speculated that no key_frame has been received, which can be proven from the logs, filtering condition ContextId=q38p5iuu);
  3. It can be inferred that lostsn has not been initialized (random value 22705), coincidentally pkt->header.get_sequence is also 22705, and the program can reach line 1451 in srs_app_rtc_source.cpp. In the end, it crashes. Conclusion: The tragedy caused by lostsn not being initialized. The starting sequence of webrtc is a random value, as long as the first frame is not a key_frame, the sequence within a GOP can increase to lostsn and cause a crash.

TRANS_BY_GPT3