ossrs / srs

SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License

Cygwin: Build with SRT is ok, but crash when running. #3251

Open winlinvip opened 2 years ago

winlinvip commented 2 years ago

Workaround: SRT is now disabled by default on Cygwin.

The stack is below:

gdb: unknown target exception 0x20474343 at 0x7fff993c039c

Thread 1 "srs" received signal ?, Unknown signal.
0x00007fff993c039c in RaiseException () from /cygdrive/c/Windows/System32/KERNELBASE.dll
(gdb) bt
#0  0x00007fff993c039c in RaiseException () from /cygdrive/c/Windows/System32/KERNELBASE.dll
#1  0x00000003ffc3cca1 in cyggcc_s-seh-1!_Unwind_RaiseException () from /usr/bin/cyggcc_s-seh-1.dll
#2  0x00000003ff20819b in cygstdc++-6!.cxa_throw () from /usr/bin/cygstdc++-6.dll
#3  0x00000001008c86c7 in CUDTUnited::accept(int, sockaddr*, int*) ()
#4  0x00000001008d17c2 in CUDT::accept(int, sockaddr*, int*) ()
#5  0x0000000100491d4c in SrsSrtSocket::accept (this=0x8000ed540, client_srt_fd=0x6fffff0afcb4) at ./src/protocol/srs_protocol_srt.cpp:757
#6  0x0000000100580360 in SrsSrtListener::cycle (this=0x8000e77b0) at ./src/app/srs_app_srt_listener.cpp:87
#7  0x00000001004ca5a9 in SrsFastCoroutine::cycle (this=0x8000fd360) at ./src/app/srs_app_st.cpp:285
#8  0x00000001004ca638 in SrsFastCoroutine::pfn (arg=0x8000fd360) at ./src/app/srs_app_st.cpp:300
#9  0x00000001005c6573 in _st_thread_main () at sched.c:371
#10 0x00000001005c6ee8 in st_thread_create (start=0x1, arg=0x18017c85d <dlcalloc+109>, joinable=16, stk_size=-14592) at sched.c:657
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

winlinvip commented 8 months ago

Based on the work of @xiaozhihong, I took the time to examine this issue in detail:

  1. Multithreading itself is not a problem, as can be seen in the verification program #3989.
  2. C++ exceptions work on other platforms without issues, but fail on Cygwin, as shown by the verification program #3989.
  3. Attempts to fix the ST stack by copying the entire stack were unsuccessful, as noted in PR #3987.
  4. For research on exception handling mechanisms such as SEH, DWARF, and SJLJ, refer to the branch win-st-seh.
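
To illustrate the kind of check described in the list above, here is a minimal sketch that throws and catches a C++ exception on a heap-allocated stack. It is not the verification program from #3989, and it uses POSIX `ucontext` rather than ST's own context switching, so it only approximates what ST does. All names (`throw_on_heap_stack`, `coroutine_entry`, the globals) are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <ucontext.h>

// Hypothetical names; this is NOT the verification program from #3989.
static ucontext_t g_main_ctx;  // context of the OS-created (primordial) stack
static ucontext_t g_co_ctx;    // context that runs on the heap-allocated stack
static std::string g_result;   // message observed inside the coroutine

// Runs on the heap-allocated stack: throw and catch a C++ exception there.
static void coroutine_entry() {
    try {
        throw std::runtime_error("thrown on coroutine stack");
    } catch (const std::exception& e) {
        g_result = e.what();
    }
    // Returning from this function resumes uc_link (g_main_ctx).
}

// Returns the message caught inside the coroutine, or "" if setup failed.
std::string throw_on_heap_stack() {
    const size_t kStackSize = 256 * 1024;
    void* stack = std::malloc(kStackSize);
    if (stack == nullptr) return "";

    g_result.clear();
    if (getcontext(&g_co_ctx) != 0) {
        std::free(stack);
        return "";
    }
    g_co_ctx.uc_stack.ss_sp = stack;       // stack memory comes from the heap
    g_co_ctx.uc_stack.ss_size = kStackSize;
    g_co_ctx.uc_link = &g_main_ctx;        // where to go when the entry returns
    makecontext(&g_co_ctx, coroutine_entry, 0);

    swapcontext(&g_main_ctx, &g_co_ctx);   // switch to the coroutine and back
    std::free(stack);
    return g_result;
}
```

Calling `throw_on_heap_stack()` from `main` and comparing the result with the thrown message gives a quick pass/fail signal on a given platform. Whether this particular sketch reproduces the Cygwin crash is untested here; the thread only reports that the ST-based equivalent fails there.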

To summarize, here are several conclusions:

  1. The fundamental issue is that ST does not support Windows' SEH exception handling mechanism and will not support it in the future due to the high complexity involved. It would require hacking the entire SEH exception mechanism of Windows, which would significantly reduce maintainability and stability.
  2. The only possible solution is to rewrite the SRT protocol without using C++ exception handling to achieve better portability. This custom-implemented protocol stack could be enabled specifically for the Cygwin platform.
  3. Since SRS has its own implementation of protocols, adding an implementation for the SRT protocol would be in line with its conventions. Supporting general Windows platform users is considered worthwhile.

See also:

  1. Mixing Win32 SEH with heap-allocated stack frames
  2. SEH setup for fibers with exception chain validation (SEHOP) active


winlinvip commented 7 months ago

A coroutine with its own stack is not compatible with Windows SEH exceptions, which libsrt introduces. Therefore, I believe that if we call libsrt APIs in the main coroutine, we can bypass this issue. This works as long as the SEH exceptions are raised in the main coroutine, or in threads forked by the main coroutine. The main coroutine is actually the primordial main thread, whose stack is created by the OS rather than by us, so it should be compatible with SEH exceptions.

A possible solution is to run libsrt on the primordial coroutine. Since the primordial coroutine is a normal stack without any modifications, it should support SEH, but this requires further research.
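
As a rough sketch of the "threads forked by the main coroutine" variant of this idea, the code below runs a possibly-throwing call on a freshly created OS thread, whose stack is allocated by the OS, and hands back an error code instead of letting the exception reach a coroutine stack. `throwing_accept` is a hypothetical stand-in for a libsrt call, and `call_on_native_thread` is an illustrative helper, not an SRS or libsrt API. Whether this actually avoids the Cygwin crash is an assumption that still needs the research mentioned above.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a libsrt call that raises a C++ exception.
static int throwing_accept(bool fail) {
    if (fail) throw std::runtime_error("accept failed");
    return 42;  // pretend this is an accepted socket fd
}

// Illustrative helper: run fn on a new OS thread (OS-created stack) and
// convert any exception into the error code -1.
template <typename Fn>
int call_on_native_thread(Fn fn) {
    std::promise<int> done;
    std::future<int> result = done.get_future();

    std::thread worker([&done, fn]() {
        try {
            done.set_value(fn());   // exception, if any, unwinds on this thread
        } catch (...) {
            done.set_value(-1);     // never let it escape the native thread
        }
    });

    int value = result.get();       // blocks the caller until the worker is done
    worker.join();
    return value;
}
```

For example, `call_on_native_thread([] { return throwing_accept(true); })` yields `-1`, and the same call with `false` yields `42`. Note that `result.get()` blocks the calling thread, which would stall every coroutine if used as-is inside ST; a real integration would need a non-blocking handoff between the coroutine and the worker thread.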