ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License
24.74k stars 5.28k forks

FLV: Crash when switch between HTTP-FLV streams. #1941

Open freeman1974 opened 3 years ago

freeman1974 commented 3 years ago

Description: Using a single player, switch back and forth repeatedly between two HTTP-FLV streams. After about 5 rapid switches, the SRS process exits. The screenshot linked below shows the Linux core dump. The code involved appears to be:

void srs_close_stfd(srs_netfd_t& stfd)
{
    if (stfd) {
        // we must ensure the close is ok.
        int err = st_netfd_close((st_netfd_t)stfd);
        srs_assert(err != -1);      // This assertion fires and the process exits.
        stfd = NULL;
    }
}

The caller of this function is:

void SrsTcpClient::close()
{
    // Ignore when already closed.
    if (!io) {
        return;
    }

    srs_close_stfd(stfd);
}

The crash seems to be triggered by frequent calls to SrsTcpClient::close(), that is, by continuously closing and reopening the socket. The failing check inside ST is:

    if ((*_st_eventsys->fd_close)(fd->osfd) < 0)
        return -1;

This check is where the error originates. Is it because the global variable _st_eventsys is used without a lock?

  1. SRS version: 4.0.39 (#define SRS_VERSION4_REVISION 39)
  2. SRS log: see the attached screenshot at http://demo.fili58.com/media/bug/photo_2020-09-07_18-16-24.jpg

freeman1974 commented 3 years ago

One more note: if I switch between the two streams a little more slowly, the issue does not occur.

RossWang commented 3 years ago

May I ask, when you play http-flv or dash, does the server have high CPU usage? It seems that it doesn't happen with SRS3.

freeman1974 commented 3 years ago

I haven't looked into that. Do you have any quantitative data, specifically comparing SRS3 and SRS4?

RossWang commented 3 years ago

It seems you don't have this problem, so I checked and found that it was caused by a low mr_latency setting. Thank you for your help.

freeman1974 commented 3 years ago

I made some modifications of my own: limiting the streaming speed resolves this problem.

winlinvip commented 3 years ago

How fast do you switch before encountering problems?

freeman1974 commented 3 years ago

Within 1 second. Millisecond level.

winlinvip commented 2 years ago

This problem has lingered for many years, and the root cause is still unknown. It would be great if we could find it.

winlinvip commented 2 years ago

The st_netfd_close failure definitely means the fd is being closed while another coroutine is still reading from or writing to it.

So the key point is how to print out the coroutines that are accessing this fd, so that we can identify where the problem is.
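
One way to get that visibility is to record, per fd, which coroutines are currently blocked on it, and dump the list when a close fails. The following is only a minimal sketch of such bookkeeping: `FdWaiterRegistry`, `enter`, `leave`, `count`, and `dump` are hypothetical names, not ST or SRS APIs, and the coroutine names are whatever labels the caller chooses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical bookkeeping (not part of ST or SRS): remembers which
// coroutine, identified by a caller-chosen label, is blocked on which fd.
class FdWaiterRegistry {
public:
    // Call just before a coroutine blocks in a read or write on fd.
    void enter(int fd, const std::string& coroutine) {
        assert(!coroutine.empty());  // an empty label would make dump() useless
        waiters_[fd].insert(coroutine);
    }

    // Call as soon as the blocking call returns (success, error, or interrupt).
    void leave(int fd, const std::string& coroutine) {
        auto it = waiters_.find(fd);
        if (it == waiters_.end()) return;
        auto pos = it->second.find(coroutine);
        if (pos != it->second.end()) it->second.erase(pos);
        if (it->second.empty()) waiters_.erase(it);
    }

    // Number of coroutines still blocked on fd.
    std::size_t count(int fd) const {
        auto it = waiters_.find(fd);
        return it == waiters_.end() ? 0 : it->second.size();
    }

    // One-line summary suitable for a log message when a close fails.
    std::string dump(int fd) const {
        std::string out = "fd=" + std::to_string(fd) + " waiters: ";
        auto it = waiters_.find(fd);
        if (it == waiters_.end()) return out + "(none)";
        bool first = true;
        for (const std::string& name : it->second) {
            if (!first) out += ", ";
            out += name;
            first = false;
        }
        return out;
    }

private:
    // multiset: the same coroutine label may legitimately appear twice.
    std::map<int, std::multiset<std::string>> waiters_;
};
```

Logging `dump(fd)` right before the assertion would name the coroutine that still holds the fd, assuming every blocking call site is wrapped with `enter` and `leave`.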

Using an assert here is not the problem: if we don't exit at the point of failure, other issues will surface later, and they will be even stranger.

The relationship between ST threads (coroutines) and file descriptors (fds) is many-to-many. One coroutine can read and write multiple fds, and one fd can be read and written by multiple coroutines (e.g., one coroutine reading and another writing). This makes the underlying logic more complex: before closing an fd, we must ensure that no coroutine is still reading from or writing to it.
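
The rule above can be illustrated with a toy model. This is a sketch under stated assumptions, not ST's implementation: `ModelFd`, `model_close`, and `IoGuard` are hypothetical names, and the model simply assumes that a close is refused with `EBUSY` while any coroutine is still blocked on the descriptor.

```cpp
#include <cassert>
#include <cerrno>

// Toy model of a descriptor shared by several coroutines (hypothetical).
struct ModelFd {
    int readers = 0;      // coroutines currently blocked in a read
    int writers = 0;      // coroutines currently blocked in a write
    bool closed = false;
};

// Assumed behavior: refuse to close while any coroutine is still blocked.
// Returns 0 on success, -1 with errno set on failure.
int model_close(ModelFd& fd) {
    if (fd.closed) { errno = EBADF; return -1; }
    if (fd.readers > 0 || fd.writers > 0) { errno = EBUSY; return -1; }
    fd.closed = true;
    return 0;
}

// RAII guard a coroutine would hold for the duration of a blocking call,
// so the counters stay accurate even on early returns.
class IoGuard {
public:
    IoGuard(ModelFd& fd, bool reading) : fd_(fd), reading_(reading) {
        ++counter();
    }
    ~IoGuard() {
        assert(counter() > 0);  // every decrement must match an increment
        --counter();
    }
    IoGuard(const IoGuard&) = delete;
    IoGuard& operator=(const IoGuard&) = delete;

private:
    int& counter() { return reading_ ? fd_.readers : fd_.writers; }
    ModelFd& fd_;
    bool reading_;
};
```

In this model the owner of the fd has to stop or interrupt every coroutine holding an `IoGuard` and wait for the counters to reach zero before calling `model_close`. Closing earlier corresponds to the `-1` return that trips the assertion in `srs_close_stfd`.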

winlinvip commented 2 months ago

Similar one, see https://github.com/ossrs/srs/issues/3784#issuecomment-2028500280