ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License
24.75k stars 5.28k forks

SRS crashes on frequent TCP connects, WebRTC publish/play, and disconnects. #3864

Closed. sandro-qiang closed this issue 2 months ago.

sandro-qiang commented 8 months ago

Describe the bug: A video chat web app uses the SRS JS SDK. When four, five, or more people publish and play over TCP and someone refreshes the page, the SRS server crashes.

Version: 6

Additional context: crash log

=================================================================
==1==ERROR: AddressSanitizer: heap-use-after-free on address 0x6040012d2dd8 at pc 0x55faa2b08e65 bp 0x7f6acf07ed60 sp 0x7f6acf07ed50
READ of size 8 at 0x6040012d2dd8 thread T1 (srs-hybrid-2)
    #0 0x55faa2b08e64 in SrsRtcTcpNetwork::write(void*, unsigned long, long*) src/app/srs_app_rtc_network.cpp:672
    #1 0x55faa2a93f41 in SrsRtcConnection::do_send_packet(SrsRtpPacket*) src/app/srs_app_rtc_conn.cpp:2491
    #2 0x55faa2b475aa in SrsRtcAudioSendTrack::on_rtp(SrsRtpPacket*) src/app/srs_app_rtc_source.cpp:2793
    #3 0x55faa2a7dfcb in SrsRtcPlayStream::send_packet(SrsRtpPacket*&) src/app/srs_app_rtc_conn.cpp:735
    #4 0x55faa2a7cdbe in SrsRtcPlayStream::cycle() src/app/srs_app_rtc_conn.cpp:670
    #5 0x55faa27f7459 in SrsFastCoroutine::cycle() src/app/srs_app_st.cpp:285
    #6 0x55faa27f75a9 in SrsFastCoroutine::pfn(void*) src/app/srs_app_st.cpp:300
    #7 0x55faa2bb3520 in _st_thread_main /srs/trunk/objs/Platform-SRS6-Linux-5.15.0-GCC9.4.0-x86_64/st-srs/sched.c:380
    #8 0x55faa2bb3e46 in st_thread_create /srs/trunk/objs/Platform-SRS6-Linux-5.15.0-GCC9.4.0-x86_64/st-srs/sched.c:666

0x6040012d2dd8 is located 8 bytes inside of 48-byte region [0x6040012d2dd0,0x6040012d2e00)
freed by thread T1 (srs-hybrid-2) here:
    #0 0x55faa24b67af in operator delete(void*) (/usr/local/srs/objs/srs+0x4d37af)

previously allocated by thread T1 (srs-hybrid-2) here:
    #0 0x55faa24b5817 in operator new(unsigned long) (/usr/local/srs/objs/srs+0x4d2817)

Thread T1 (srs-hybrid-2) created by T0 here:
    #0 0x55faa23e0ad5 in __interceptor_pthread_create (/usr/local/srs/objs/srs+0x3fdad5)
    #1 0x55faa2a39df3 in SrsThreadPool::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, SrsCplxError* (*)(void*), void*) src/app/srs_app_threads.cpp:676
    #2 0x55faa2bb1e55 in run_in_thread_pool() src/main/srs_main_server.cpp:517
    #3 0x55faa2bb18bd in run_directly_or_daemon() src/main/srs_main_server.cpp:456
    #4 0x55faa2bae98e in do_main(int, char**, char**) src/main/srs_main_server.cpp:245
    #5 0x55faa2baeca9 in main src/main/srs_main_server.cpp:256
    #6 0x7f6ad4b86082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)

SUMMARY: AddressSanitizer: heap-use-after-free src/app/srs_app_rtc_network.cpp:672 in SrsRtcTcpNetwork::write(void*, unsigned long, long*)
Shadow bytes around the buggy address:
  0x0c0880252560: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c0880252570: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fd
  0x0c0880252580: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c0880252590: fa fa 00 00 00 00 00 04 fa fa fd fd fd fd fd fa
  0x0c08802525a0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
=>0x0c08802525b0: fa fa fd fd fd fd fd fd fa fa fd[fd]fd fd fd fd
  0x0c08802525c0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c08802525d0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c08802525e0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c08802525f0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fd
  0x0c0880252600: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc

sandro-qiang commented 8 months ago

Some more information.

Before the crash:

[2023-11-06 13:10:57.366][WARN][1][j498ff67][0] DTLS: SSL3 alert method=read type=warning, desc=CN(close notify), where=16388, ret=256, r1=0
[2023-11-06 13:10:57.366][INFO][1][j498ff67] RTC: session destroy by DTLS alert(warning CN), username=0r8phna1:Tplr
[2023-11-06 13:10:57.366][INFO][1][j498ff67] RTC: before dispose resource(RtcConn)(0x61d000010e80), conns=33, zombies=0, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.366][INFO][1][j498ff67] RTC: session detach from j498ff67, disposing=1
[2023-11-06 13:10:57.366][INFO][1][j498ff67] RTC: tcp conn diposing, because of rtc connection
[2023-11-06 13:10:57.366][INFO][1][7f099h12] TCP: before dispose resource(Tcp)(0x60c000022600), conns=33, zombies=0, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.366][ERROR][1][j498ff67][0] serve error code=1070(StThreadInterrupt)(ST thread is interrupted) : rtc tcp conn : interrupted
thread [1][j498ff67]: do_cycle() [./src/app/srs_app_rtc_network.cpp:811][errno=0]
thread [1][j498ff67]: interrupt() [./src/app/srs_app_st.cpp:257][errno=0]
[2023-11-06 13:10:57.366][INFO][1][p3792my3] RTC: clear zombies=1 resources, conns=33, removing=0, unsubs=9
[2023-11-06 13:10:57.366][INFO][1][j498ff67] RTC: disposing #0 resource(RtcConn)(0x61d000010e80), conns=33, disposing=1, zombies=0
[2023-11-06 13:10:57.366][INFO][1][2l979959] TCP: clear zombies=1 resources, conns=33, removing=0, unsubs=0
[2023-11-06 13:10:57.366][INFO][1][7f099h12] TCP: disposing #0 resource(Tcp)(0x60c000022600), conns=33, disposing=1, zombies=0
[2023-11-06 13:10:57.367][WARN][1][j498ff67][4][DTLS_HANG] DTLS: Hang, done=0, version=-1, arq=0
[2023-11-06 13:10:57.371][INFO][1][8864b01p] DTLS: After done, got 39 bytes
[2023-11-06 13:10:57.371][INFO][1][8864b01p] DTLS: State Passive RECV, done=1, arq=1, r0=39, len=39, cnt=21, size=26, hs=0
[2023-11-06 13:10:57.371][WARN][1][8864b01p][0] DTLS: SSL3 alert method=read type=warning, desc=CN(close notify), where=16388, ret=256, r1=0
[2023-11-06 13:10:57.371][INFO][1][8864b01p] RTC: session destroy by DTLS alert(warning CN), username=z86v9855:rwLw
[2023-11-06 13:10:57.371][INFO][1][8864b01p] RTC: before dispose resource(RtcConn)(0x61d000038e80), conns=32, zombies=0, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.371][INFO][1][8864b01p] RTC: session detach from 8864b01p, disposing=1
[2023-11-06 13:10:57.371][INFO][1][8864b01p] RTC: tcp conn diposing, because of rtc connection
[2023-11-06 13:10:57.371][INFO][1][7f099h12] TCP: before dispose resource(Tcp)(0x60c00003d780), conns=32, zombies=0, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.371][ERROR][1][8864b01p][0] serve error code=1070(StThreadInterrupt)(ST thread is interrupted) : rtc tcp conn : interrupted
thread [1][8864b01p]: do_cycle() [./src/app/srs_app_rtc_network.cpp:811][errno=0]
thread [1][8864b01p]: interrupt() [./src/app/srs_app_st.cpp:257][errno=0]
[2023-11-06 13:10:57.371][INFO][1][p3792my3] RTC: clear zombies=1 resources, conns=32, removing=0, unsubs=2
[2023-11-06 13:10:57.371][INFO][1][8864b01p] RTC: disposing #0 resource(RtcConn)(0x61d000038e80), conns=32, disposing=1, zombies=0
[2023-11-06 13:10:57.371][INFO][1][2l979959] TCP: clear zombies=1 resources, conns=32, removing=0, unsubs=0
[2023-11-06 13:10:57.371][INFO][1][7f099h12] TCP: disposing #0 resource(Tcp)(0x60c00003d780), conns=32, disposing=1, zombies=0
[2023-11-06 13:10:57.373][WARN][1][8864b01p][4][DTLS_HANG] DTLS: Hang, done=0, version=-1, arq=0
[2023-11-06 13:10:57.376][INFO][1][31913446] DTLS: After done, got 39 bytes
[2023-11-06 13:10:57.376][INFO][1][31913446] DTLS: State Passive RECV, done=1, arq=1, r0=39, len=39, cnt=21, size=26, hs=0
[2023-11-06 13:10:57.376][WARN][1][31913446][0] DTLS: SSL3 alert method=read type=warning, desc=CN(close notify), where=16388, ret=256, r1=0
[2023-11-06 13:10:57.376][INFO][1][31913446] RTC: session destroy by DTLS alert(warning CN), username=vq652o10:miUl
[2023-11-06 13:10:57.376][INFO][1][31913446] RTC: before dispose resource(RtcConn)(0x61d000077680), conns=31, zombies=0, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.376][INFO][1][31913446] RTC: session detach from 31913446, disposing=1
[2023-11-06 13:10:57.376][INFO][1][31913446] RTC: tcp conn diposing, because of rtc connection
[2023-11-06 13:10:57.376][INFO][1][7f099h12] TCP: before dispose resource(Tcp)(0x60c0000a8580), conns=31, zombies=0, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.376][ERROR][1][31913446][0] serve error code=1070(StThreadInterrupt)(ST thread is interrupted) : rtc tcp conn : interrupted
thread [1][31913446]: do_cycle() [./src/app/srs_app_rtc_network.cpp:811][errno=0]
thread [1][31913446]: interrupt() [./src/app/srs_app_st.cpp:257][errno=0]
[2023-11-06 13:10:57.376][INFO][1][p3792my3] RTC: clear zombies=1 resources, conns=31, removing=0, unsubs=2
[2023-11-06 13:10:57.376][INFO][1][31913446] RTC: disposing #0 resource(RtcConn)(0x61d000077680), conns=31, disposing=1, zombies=0
[2023-11-06 13:10:57.376][INFO][1][2l979959] TCP: clear zombies=1 resources, conns=31, removing=0, unsubs=0
[2023-11-06 13:10:57.376][INFO][1][7f099h12] TCP: disposing #0 resource(Tcp)(0x60c0000a8580), conns=31, disposing=1, zombies=0
[2023-11-06 13:10:57.377][WARN][1][31913446][4][DTLS_HANG] DTLS: Hang, done=0, version=-1, arq=0
[2023-11-06 13:10:57.390][INFO][1][1652x2oq] DTLS: After done, got 39 bytes
[2023-11-06 13:10:57.390][INFO][1][1652x2oq] DTLS: State Passive RECV, done=1, arq=1, r0=39, len=39, cnt=21, size=26, hs=0
[2023-11-06 13:10:57.390][WARN][1][1652x2oq][0] DTLS: SSL3 alert method=read type=warning, desc=CN(close notify), where=16388, ret=256, r1=0
[2023-11-06 13:10:57.390][INFO][1][1652x2oq] RTC: session destroy by DTLS alert(warning CN), username=539xdlx8:0oGi
[2023-11-06 13:10:57.390][INFO][1][1652x2oq] RTC: before dispose resource(RtcConn)(0x61d0000adc80), conns=30, zombies=0, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.390][INFO][1][1652x2oq] RTC: session detach from 1652x2oq, disposing=1
[2023-11-06 13:10:57.390][INFO][1][1652x2oq] RTC: tcp conn diposing, because of rtc connection
[2023-11-06 13:10:57.390][INFO][1][7f099h12] TCP: before dispose resource(Tcp)(0x60c0000f1300), conns=30, zombies=0, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.390][ERROR][1][1652x2oq][0] serve error code=1070(StThreadInterrupt)(ST thread is interrupted) : rtc tcp conn : interrupted
thread [1][1652x2oq]: do_cycle() [./src/app/srs_app_rtc_network.cpp:811][errno=0]
thread [1][1652x2oq]: interrupt() [./src/app/srs_app_st.cpp:257][errno=0]
[2023-11-06 13:10:57.390][INFO][1][iha8sd52] DTLS: After done, got 39 bytes
[2023-11-06 13:10:57.390][INFO][1][iha8sd52] DTLS: State Passive RECV, done=1, arq=1, r0=39, len=39, cnt=21, size=26, hs=0
[2023-11-06 13:10:57.390][WARN][1][iha8sd52][0] DTLS: SSL3 alert method=read type=warning, desc=CN(close notify), where=16388, ret=256, r1=0
[2023-11-06 13:10:57.390][INFO][1][iha8sd52] RTC: session destroy by DTLS alert(warning CN), username=039m6t1a:P189
[2023-11-06 13:10:57.390][INFO][1][iha8sd52] RTC: before dispose resource(RtcConn)(0x61d0000ec480), conns=30, zombies=1, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.390][INFO][1][iha8sd52] RTC: session detach from iha8sd52, disposing=1
[2023-11-06 13:10:57.390][INFO][1][iha8sd52] RTC: tcp conn diposing, because of rtc connection
[2023-11-06 13:10:57.390][INFO][1][7f099h12] TCP: before dispose resource(Tcp)(0x60c000139000), conns=30, zombies=1, ign=0, inz=0, ind=0
[2023-11-06 13:10:57.390][ERROR][1][iha8sd52][0] serve error code=1070(StThreadInterrupt)(ST thread is interrupted) : rtc tcp conn : interrupted
thread [1][iha8sd52]: do_cycle() [./src/app/srs_app_rtc_network.cpp:811][errno=0]
thread [1][iha8sd52]: interrupt() [./src/app/srs_app_st.cpp:257][errno=0]
[2023-11-06 13:10:57.391][INFO][1][p3792my3] RTC: clear zombies=2 resources, conns=30, removing=0, unsubs=2
[2023-11-06 13:10:57.391][INFO][1][1652x2oq] RTC: disposing #0 resource(RtcConn)(0x61d0000adc80), conns=30, disposing=2, zombies=0
[2023-11-06 13:10:57.391][INFO][1][iha8sd52] RTC: disposing #1 resource(RtcConn)(0x61d0000ec480), conns=29, disposing=2, zombies=0
[2023-11-06 13:10:57.391][INFO][1][2l979959] TCP: clear zombies=2 resources, conns=30, removing=0, unsubs=0
[2023-11-06 13:10:57.391][INFO][1][7f099h12] TCP: disposing #0 resource(Tcp)(0x60c0000f1300), conns=30, disposing=2, zombies=0
[2023-11-06 13:10:57.391][INFO][1][7f099h12] TCP: disposing #1 resource(Tcp)(0x60c000139000), conns=29, disposing=2, zombies=0

sandro-qiang commented 8 months ago

Based on the SRS source, I suggest checking sendonly_skt_ != NULL in SrsRtcTcpNetwork::write and calling SrsRtcTcpNetwork::update_sendonly_socket(NULL) in SrsRtcTcpConn::on_before_dispose.
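A minimal sketch of this suggestion, using hypothetical, simplified stand-ins for the SRS classes (the real SrsRtcTcpNetwork::write takes a different argument list and returns srs_error_t). The network keeps a borrowed socket pointer and checks it for NULL before writing, and the connection clears that pointer in on_before_dispose() before it is freed:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for the SRS classes named above.
// An int status replaces srs_error_t to keep the sketch self-contained.
struct TcpSocket {
    std::size_t bytes_written = 0;
    int write(const void* buf, std::size_t size) {
        assert(buf != nullptr);
        bytes_written += size;
        return 0;
    }
};

class RtcTcpNetwork {
public:
    void update_sendonly_socket(TcpSocket* skt) { sendonly_skt_ = skt; }

    // The suggested guard: refuse to write once the socket has been
    // detached, instead of dereferencing freed memory.
    int write(const void* buf, std::size_t size) {
        if (sendonly_skt_ == nullptr) {
            return -1;  // connection is being disposed; drop the packet
        }
        return sendonly_skt_->write(buf, size);
    }

private:
    TcpSocket* sendonly_skt_ = nullptr;  // borrowed, not owned
};

class RtcTcpConn {
public:
    explicit RtcTcpConn(RtcTcpNetwork* network) : network_(network) {
        network_->update_sendonly_socket(&skt_);
    }
    // Called before the resource manager frees this object.
    void on_before_dispose() { network_->update_sendonly_socket(nullptr); }
    const TcpSocket& socket() const { return skt_; }

private:
    RtcTcpNetwork* network_;
    TcpSocket skt_;
};
```

The check only helps if every path that frees the socket clears the pointer first; a coroutine that has already loaded the raw pointer and then yields inside the write can still touch freed memory.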

sandro-qiang commented 8 months ago

Another crash:

[2023-11-06 14:13:14.550][INFO][1][hi69qz68] RTC: disposing #0 resource(RtcConn)(0x61d000002880), conns=49, disposing=4, zombies=0
[2023-11-06 14:13:14.550][INFO][1][098og308] RTC: disposing #1 resource(RtcConn)(0x61d0000f0080), conns=48, disposing=4, zombies=0
[2023-11-06 14:13:14.550][INFO][1][aui07865] RTC: disposing #2 resource(RtcConn)(0x61d0000dc080), conns=47, disposing=4, zombies=0
[2023-11-06 14:13:14.550][INFO][1][g2x0l56t] RTC: disposing #3 resource(RtcConn)(0x61d00010d680), conns=46, disposing=4, zombies=0
[2023-11-06 14:13:14.550][INFO][1][k0e5707f] TCP: before dispose resource(Tcp)(0x60c000159ac0), conns=49, zombies=1, ign=0, inz=0, ind=0
[2023-11-06 14:13:14.550][WARN][1][098og308][4] client disconnect peer. ret=1007
[2023-11-06 14:13:14.550][INFO][1][k0e5707f] TCP: before dispose resource(Tcp)(0x60c0000fdfc0), conns=49, zombies=2, ign=0, inz=0, ind=0
[2023-11-06 14:13:14.550][WARN][1][aui07865][4] client disconnect peer. ret=1007
[2023-11-06 14:13:14.550][INFO][1][k0e5707f] TCP: before dispose resource(Tcp)(0x60c00016b4c0), conns=49, zombies=3, ign=0, inz=0, ind=0
[2023-11-06 14:13:14.550][WARN][1][g2x0l56t][4] client disconnect peer. ret=1007
[2023-11-06 14:13:14.550][INFO][1][2480z443] TCP: clear zombies=4 resources, conns=49, removing=0, unsubs=0
[2023-11-06 14:13:14.550][INFO][1][k0e5707f] TCP: disposing #0 resource(Tcp)(0x60c00013d440), conns=49, disposing=4, zombies=0
[2023-11-06 14:13:14.550][INFO][1][k0e5707f] TCP: disposing #1 resource(Tcp)(0x60c000159ac0), conns=48, disposing=4, zombies=0
[2023-11-06 14:13:14.550][INFO][1][k0e5707f] TCP: disposing #2 resource(Tcp)(0x60c0000fdfc0), conns=47, disposing=4, zombies=0
[2023-11-06 14:13:14.550][INFO][1][k0e5707f] TCP: disposing #3 resource(Tcp)(0x60c00016b4c0), conns=46, disposing=4, zombies=0
[2023-11-06 14:13:14.583][ERROR][1][k0e5707f][0] backtrace 15 frames of ./objs/srs SRS/6.0.59(Bee)
sh: 1: addr2line: not found
[2023-11-06 14:13:14.585][ERROR][1][k0e5707f][0] #0 0x430000 0 ./objs/srs(+0x430000) [0x5617dc8e1000]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.586][ERROR][1][k0e5707f][0] #1 0x516cd0 0 ./objs/srs(_ZN12SrsCplxError10srs_assertEb+0xd4) [0x5617dc9c7cd0]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.587][ERROR][1][k0e5707f][0] #2 0x71bba0 0 ./objs/srs(_Z14srs_close_stfdRPv+0xb4) [0x5617dcbccba0]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.588][ERROR][1][k0e5707f][0] #3 0x7715e6 0 ./objs/srs(_ZN16SrsTcpConnectionD1Ev+0x19c) [0x5617dcc225e6]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.589][ERROR][1][k0e5707f][0] #4 0x771656 0 ./objs/srs(_ZN16SrsTcpConnectionD0Ev+0x1c) [0x5617dcc22656]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.590][ERROR][1][k0e5707f][0] #5 0xb26e46 0 ./objs/srs(_ZN13SrsRtcTcpConnD1Ev+0x3fa) [0x5617dcfd7e46]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.591][ERROR][1][k0e5707f][0] #6 0xb26f0a 0 ./objs/srs(_ZN13SrsRtcTcpConnD0Ev+0x1c) [0x5617dcfd7f0a]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.592][ERROR][1][k0e5707f][0] #7 0x76fa25 0 ./objs/srs(_ZN18SrsResourceManager8do_clearEv+0x66d) [0x5617dcc20a25]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.593][ERROR][1][k0e5707f][0] #8 0x76f2bc 0 ./objs/srs(_ZN18SrsResourceManager5clearEv+0x498) [0x5617dcc202bc]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.594][ERROR][1][k0e5707f][0] #9 0x76bfa8 0 ./objs/srs(_ZN18SrsResourceManager5cycleEv+0x1d2) [0x5617dcc1cfa8]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.595][ERROR][1][k0e5707f][0] #10 0x81445a 0 ./objs/srs(_ZN16SrsFastCoroutine5cycleEv+0x348) [0x5617dccc545a]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.596][ERROR][1][k0e5707f][0] #11 0x8145aa 0 ./objs/srs(_ZN16SrsFastCoroutine3pfnEPv+0x24) [0x5617dccc55aa]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.597][ERROR][1][k0e5707f][0] #12 0xbd0521 0 ./objs/srs(_st_thread_main+0x2e) [0x5617dd081521]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.598][ERROR][1][k0e5707f][0] #13 0xbd0e47 0 ./objs/srs(st_thread_create+0x132) [0x5617dd081e47]
sh: 1: addr2line: not found
[2023-11-06 14:13:14.599][ERROR][1][k0e5707f][0] #14 (nil) 0 [0x7fb9fc466220]
==1==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fb9fc466d40; bottom 0x7fb9fcc94000; size: 0xffffffffff7d2d40 (-8573632)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
srs: ./src/kernel/srs_kernel_error.cpp:446: static void SrsCplxError::srs_assert(bool): Assertion `expression' failed.
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fb9ffd88941 bp 0x7fb9fff1e588 sp 0x7fb9fcc95b80 T1)
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
    #0 0x7fb9ffd88940 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22940)
    #1 0x7fb9ffd88728  (/lib/x86_64-linux-gnu/libc.so.6+0x22728)
    #2 0x7fb9ffd99fd5 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x33fd5)
    #3 0x5617dc9c8051 in SrsCplxError::srs_assert(bool) src/kernel/srs_kernel_error.cpp:446
    #4 0x5617dcbccb9f in srs_close_stfd(void*&) src/protocol/srs_protocol_st.cpp:91
    #5 0x5617dcc225e5 in SrsTcpConnection::~SrsTcpConnection() src/app/srs_app_conn.cpp:455
    #6 0x5617dcc22655 in SrsTcpConnection::~SrsTcpConnection() src/app/srs_app_conn.cpp:456
    #7 0x5617dcfd7e45 in SrsRtcTcpConn::~SrsRtcTcpConn() src/app/srs_app_rtc_network.cpp:719
    #8 0x5617dcfd7f09 in SrsRtcTcpConn::~SrsRtcTcpConn() src/app/srs_app_rtc_network.cpp:720
    #9 0x5617dcc20a24 in SrsResourceManager::do_clear() src/app/srs_app_conn.cpp:351
    #10 0x5617dcc202bb in SrsResourceManager::clear() src/app/srs_app_conn.cpp:317
    #11 0x5617dcc1cfa7 in SrsResourceManager::cycle() src/app/srs_app_conn.cpp:110
    #12 0x5617dccc5459 in SrsFastCoroutine::cycle() src/app/srs_app_st.cpp:285
    #13 0x5617dccc55a9 in SrsFastCoroutine::pfn(void*) src/app/srs_app_st.cpp:300
    #14 0x5617dd081520 in _st_thread_main /srs/trunk/objs/Platform-SRS6-Linux-5.15.0-GCC9.4.0-x86_64/st-srs/sched.c:380
    #15 0x5617dd081e46 in st_thread_create /srs/trunk/objs/Platform-SRS6-Linux-5.15.0-GCC9.4.0-x86_64/st-srs/sched.c:666
    #16 0x7fb9fc46621f  (<unknown module>)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x22940) in abort
Thread T1 (srs-hybrid-2) created by T0 here:
    #0 0x5617dc8aead5 in __interceptor_pthread_create (/usr/local/srs/objs/srs+0x3fdad5)
    #1 0x5617dcf07df3 in SrsThreadPool::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, SrsCplxError* (*)(void*), void*) src/app/srs_app_threads.cpp:676
    #2 0x5617dd07fe55 in run_in_thread_pool() src/main/srs_main_server.cpp:517
    #3 0x5617dd07f8bd in run_directly_or_daemon() src/main/srs_main_server.cpp:456
    #4 0x5617dd07c98e in do_main(int, char**, char**) src/main/srs_main_server.cpp:245
    #5 0x5617dd07cca9 in main src/main/srs_main_server.cpp:256
    #6 0x7fb9ffd8a082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)

[2023-11-06 14:13:14.717][ERROR][1][k0e5707f][0] ==1==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fb9fc466d40; bottom 0x7fb9fcc94000; size: 0xffffffffff7d2d40 (-8573632)
[2023-11-06 14:13:14.717][ERROR][1][k0e5707f][0] False positive error reports may follow
[2023-11-06 14:13:14.717][ERROR][1][k0e5707f][0] For details see https://github.com/google/sanitizers/issues/189
[2023-11-06 14:13:14.717][ERROR][1][k0e5707f][0] =================================================================
[2023-11-06 14:13:14.717][ERROR][1][k0e5707f][0] ==1==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fb9ffd88941 bp 0x7fb9fff1e588 sp 0x7fb9fcc95b80 T1)
[2023-11-06 14:13:14.717][ERROR][1][k0e5707f][0] ==1==The signal is caused by a READ memory access.
[2023-11-06 14:13:14.717][ERROR][1][k0e5707f][0] ==1==Hint: address points to the zero page.
sh: 1: addr2line: not found
[2023-11-06 14:13:14.717][ERROR][1][k0e5707f][0]     #0 0x7fb9ffd88940 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22940), r0=1094
sh: 1: addr2line: not found
[2023-11-06 14:13:14.718][ERROR][1][k0e5707f][0]     #1 0x7fb9ffd88728  (/lib/x86_64-linux-gnu/libc.so.6+0x22728), r0=1094
sh: 1: addr2line: not found
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #2 0x7fb9ffd99fd5 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x33fd5), r0=1094
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #3 0x5617dc9c8051 in SrsCplxError::srs_assert(bool) src/kernel/srs_kernel_error.cpp:446, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #4 0x5617dcbccb9f in srs_close_stfd(void*&) src/protocol/srs_protocol_st.cpp:91, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #5 0x5617dcc225e5 in SrsTcpConnection::~SrsTcpConnection() src/app/srs_app_conn.cpp:455, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #6 0x5617dcc22655 in SrsTcpConnection::~SrsTcpConnection() src/app/srs_app_conn.cpp:456, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #7 0x5617dcfd7e45 in SrsRtcTcpConn::~SrsRtcTcpConn() src/app/srs_app_rtc_network.cpp:719, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #8 0x5617dcfd7f09 in SrsRtcTcpConn::~SrsRtcTcpConn() src/app/srs_app_rtc_network.cpp:720, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #9 0x5617dcc20a24 in SrsResourceManager::do_clear() src/app/srs_app_conn.cpp:351, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #10 0x5617dcc202bb in SrsResourceManager::clear() src/app/srs_app_conn.cpp:317, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #11 0x5617dcc1cfa7 in SrsResourceManager::cycle() src/app/srs_app_conn.cpp:110, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #12 0x5617dccc5459 in SrsFastCoroutine::cycle() src/app/srs_app_st.cpp:285, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #13 0x5617dccc55a9 in SrsFastCoroutine::pfn(void*) src/app/srs_app_st.cpp:300, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #14 0x5617dd081520 in _st_thread_main /srs/trunk/objs/Platform-SRS6-Linux-5.15.0-GCC9.4.0-x86_64/st-srs/sched.c:380, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #15 0x5617dd081e46 in st_thread_create /srs/trunk/objs/Platform-SRS6-Linux-5.15.0-GCC9.4.0-x86_64/st-srs/sched.c:666, r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0]     #16 0x7fb9fc46621f  (<unknown module>), r0=1093
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0] AddressSanitizer can not provide additional info.
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0] SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x22940) in abort
[2023-11-06 14:13:14.719][ERROR][1][k0e5707f][0] Thread T1 (srs-hybrid-2) created by T0 here:
sh: 1: addr2line: not found
[2023-11-06 14:13:14.720][ERROR][1][k0e5707f][0]     #0 0x5617dc8aead5 in __interceptor_pthread_create (/usr/local/srs/objs/srs+0x3fdad5), r0=1094
[2023-11-06 14:13:14.720][ERROR][1][k0e5707f][0]     #1 0x5617dcf07df3 in SrsThreadPool::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, SrsCplxError* (*)(void*), void*) src/app/srs_app_threads.cpp:676, r0=1093
[2023-11-06 14:13:14.720][ERROR][1][k0e5707f][0]     #2 0x5617dd07fe55 in run_in_thread_pool() src/main/srs_main_server.cpp:517, r0=1093
[2023-11-06 14:13:14.720][ERROR][1][k0e5707f][0]     #3 0x5617dd07f8bd in run_directly_or_daemon() src/main/srs_main_server.cpp:456, r0=1093
[2023-11-06 14:13:14.720][ERROR][1][k0e5707f][0]     #4 0x5617dd07c98e in do_main(int, char**, char**) src/main/srs_main_server.cpp:245, r0=1093
[2023-11-06 14:13:14.720][ERROR][1][k0e5707f][0]     #5 0x5617dd07cca9 in main src/main/srs_main_server.cpp:256, r0=1093
sh: 1: addr2line: not found
[2023-11-06 14:13:14.721][ERROR][1][k0e5707f][0]     #6 0x7fb9ffd8a082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082), r0=1094
==1==ABORTING

So I think making skt_ reference-counted, and sharing it among the coroutines that use it, may be a better idea.
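A minimal sketch of the reference-counting idea, assuming std::shared_ptr as the counting mechanism (this is an illustration, not how SRS manages its sockets). Every coroutine-like user holds its own strong reference, and the socket is closed in its destructor, which runs only when the last owner releases it. All class names here are hypothetical:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical socket wrapper: closing happens in the destructor, which
// runs only when the last shared_ptr owner releases it.
class SharedSocket {
public:
    explicit SharedSocket(std::vector<std::string>* log) : log_(log) {
        assert(log_ != nullptr);
    }
    ~SharedSocket() { log_->push_back("closed"); }
    void write(const std::string& data) { log_->push_back("write " + data); }

private:
    std::vector<std::string>* log_;  // records events for inspection
};

// Each coroutine-like user holds its own strong reference.
struct Reader {
    std::shared_ptr<SharedSocket> skt;
};

struct Writer {
    std::shared_ptr<SharedSocket> skt;
    void send(const std::string& data) { skt->write(data); }
};
```

The connection, reader, and writer can drop their references in any order; the close always happens after the last one, so no user can see a closed socket. The cost is that ownership becomes implicit, which makes the actual release point harder to reason about.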

winlinvip commented 8 months ago

This issue seems to occur because, when the TCP connection is closed, other coroutines are still reading from or writing to it, which causes the assertion in srs_close_stfd to fail.

Although reference counting or smart pointers can solve the problem, the best solution is to wait until the TCP connection is closed and there are no reading or writing coroutines before releasing the object.

In other words, there is no need for smart pointers here; instead, the object release process should be improved.
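A single-threaded sketch of the release process described here, with hypothetical names. Each reader or writer holds an RAII guard while it is doing I/O, dispose() only marks the connection as closing, and the actual release runs when the last guard is destroyed. In SRS the users would be ST coroutines and dispose() would also interrupt them so they exit promptly; that part is not modeled here:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model: defer releasing a connection until no coroutine
// is still inside a read or write on it.
class ManagedConn {
public:
    explicit ManagedConn(std::function<void()> on_free) : on_free_(std::move(on_free)) {}

    // Guard a coroutine holds for the duration of one read or write.
    class IoGuard {
    public:
        explicit IoGuard(ManagedConn* c) : c_(c) { ++c_->active_io_; }
        ~IoGuard() { c_->leave(); }
        IoGuard(const IoGuard&) = delete;
        IoGuard& operator=(const IoGuard&) = delete;

    private:
        ManagedConn* c_;
    };

    bool closing() const { return closing_; }
    int active_io() const { return active_io_; }

    // Request release; frees immediately only if nobody is doing I/O.
    void dispose() {
        closing_ = true;
        maybe_free();
    }

private:
    void leave() {
        assert(active_io_ > 0);
        --active_io_;
        maybe_free();
    }

    void maybe_free() {
        if (closing_ && active_io_ == 0 && !freed_) {
            freed_ = true;
            on_free_();  // close the fd and release resources only now
        }
    }

    std::function<void()> on_free_;
    int active_io_ = 0;
    bool closing_ = false;
    bool freed_ = false;
};
```

Unlike shared ownership, there is exactly one owner and one well-defined release point; the guards only delay that point until in-flight I/O has finished.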

sandro-qiang commented 6 months ago

One more crash:

[2023-12-21 02:33:06.636][WARN][1][l74f1r35][11] RTC: Drop for ssrc 3670442919 not found
[2023-12-21 02:33:06.636][WARN][1][eg12jp3i][11] RTC: Drop for ssrc 3670442919 not found
[2023-12-21 02:33:06.637][WARN][1][70q44376][11] RTC: Drop for ssrc 836709604 not found
[2023-12-21 02:33:06.647][INFO][1][e2619wte] Process: cpu=5.00%,573MB, threads=2
[2023-12-21 02:33:06.653][WARN][1][w8e6hf20][11] RTC: Drop for ssrc 1093448050 not found
[2023-12-21 02:33:06.653][WARN][1][70q44376][11] RTC: Drop for ssrc 836709604 not found
[2023-12-21 02:33:06.659][WARN][1][l74f1r35][11] RTC: Drop for ssrc 3670442919 not found
[2023-12-21 02:33:06.659][WARN][1][eg12jp3i][11] RTC: Drop for ssrc 3670442919 not found
=================================================================
==1==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0001a4b48 at pc 0x55c6ac1f1ed2 bp 0x7f1f764aa1a0 sp 0x7f1f764aa190
READ of size 8 at 0x60e0001a4b48 thread T1 (srs-hybrid-2)
    #0 0x55c6ac1f1ed1 in SrsRtcRecvTrack::send_rtcp_rr() src/app/srs_app_rtc_source.cpp:2411
    #1 0x55c6ac133198 in SrsRtcPublishStream::send_rtcp_rr() src/app/srs_app_rtc_conn.cpp:1284
    #2 0x55c6ac12da4b in SrsRtcPublishRtcpTimer::on_timer(long) src/app/srs_app_rtc_conn.cpp:953
    #3 0x55c6ac0af2c0 in SrsFastTimer::cycle() src/app/srs_app_hourglass.cpp:187
    #4 0x55c6abeabb35 in SrsFastCoroutine::cycle() src/app/srs_app_st.cpp:285
    #5 0x55c6abeabc85 in SrsFastCoroutine::pfn(void*) src/app/srs_app_st.cpp:300
    #6 0x55c6ac25fd70 in _st_thread_main /srs/trunk/objs/Platform-SRS5-Linux-4.4.0-GCC9.4.0-x86_64/st-srs/sched.c:380
    #7 0x55c6ac260696 in st_thread_create /srs/trunk/objs/Platform-SRS5-Linux-4.4.0-GCC9.4.0-x86_64/st-srs/sched.c:666
    #8 0x602000001e0f  (<unknown module>)

0x60e0001a4b48 is located 136 bytes inside of 152-byte region [0x60e0001a4ac0,0x60e0001a4b58)
freed by thread T1 (srs-hybrid-2) here:
    #0 0x55c6abb8fc1f in operator delete(void*) (/usr/local/srs/objs/srs+0x4bbc1f)

previously allocated by thread T1 (srs-hybrid-2) here:
    #0 0x55c6abb8ec87 in operator new(unsigned long) (/usr/local/srs/objs/srs+0x4bac87)

Thread T1 (srs-hybrid-2) created by T0 here:
    #0 0x55c6abab9f45 in pthread_create (/usr/local/srs/objs/srs+0x3e5f45)
    #1 0x55c6ac0ea3f3 in SrsThreadPool::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, SrsCplxError* (*)(void*), void*) src/app/srs_app_threads.cpp:679
    #2 0x55c6ac25e6a5 in run_in_thread_pool() src/main/srs_main_server.cpp:475
    #3 0x55c6ac25e10d in run_directly_or_daemon() src/main/srs_main_server.cpp:414
    #4 0x55c6ac25ba2f in do_main(int, char**, char**) src/main/srs_main_server.cpp:242
    #5 0x55c6ac25bd4a in main src/main/srs_main_server.cpp:253
    #6 0x7f1f7a4aa0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

SUMMARY: AddressSanitizer: heap-use-after-free src/app/srs_app_rtc_source.cpp:2411 in SrsRtcRecvTrack::send_rtcp_rr()
Shadow bytes around the buggy address:
  0x0c1c8002c910: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c8002c920: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1c8002c930: fd fd fd fd fa fa fa fa fa fa fa fa fd fd fd fd
  0x0c1c8002c940: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1c8002c950: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x0c1c8002c960: fd fd fd fd fd fd fd fd fd[fd]fd fa fa fa fa fa
  0x0c1c8002c970: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1c8002c980: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x0c1c8002c990: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1c8002c9a0: fd fd fd fd fa fa fa fa fa fa fa fa fd fd fd fd
  0x0c1c8002c9b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc

SrsRtcConnection has been freed, but it is still being used by the timer.
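One common way to avoid this class of bug, sketched with hypothetical classes rather than the SRS code: the timer keeps raw handler pointers, so the object that registers a handler must unregister it in its destructor, before its memory is freed. After that, a later tick cannot reach the deleted object:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical timer that calls on_timer() on registered raw pointers.
class TimerHandler {
public:
    virtual ~TimerHandler() = default;
    virtual void on_timer() = 0;
};

class FastTimer {
public:
    void subscribe(TimerHandler* h) {
        assert(h != nullptr);
        handlers_.push_back(h);
    }
    void unsubscribe(TimerHandler* h) {
        handlers_.erase(std::remove(handlers_.begin(), handlers_.end(), h), handlers_.end());
    }
    void tick() {
        // Iterate a copy so a handler can unsubscribe itself mid-tick.
        // This does not protect against one handler freeing another.
        std::vector<TimerHandler*> snapshot = handlers_;
        for (TimerHandler* h : snapshot) h->on_timer();
    }
    std::size_t size() const { return handlers_.size(); }

private:
    std::vector<TimerHandler*> handlers_;
};

// Hypothetical stand-in for a periodic RTCP RR sender owned by a stream.
class RtcpRrTimer : public TimerHandler {
public:
    RtcpRrTimer(FastTimer* timer, int* sent) : timer_(timer), sent_(sent) {
        timer_->subscribe(this);
    }
    // Unregister before the memory goes away, so the timer never calls
    // on_timer() on a freed object.
    ~RtcpRrTimer() override { timer_->unsubscribe(this); }
    void on_timer() override { ++*sent_; }

private:
    FastTimer* timer_;
    int* sent_;
};
```

This relies on every owner following the subscribe/unsubscribe pairing; any handler that is freed without unsubscribing reintroduces the use-after-free.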

winlinvip commented 2 months ago

Duplicate of https://github.com/ossrs/srs/issues/3784#issuecomment-2028500280