ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License

HLS generated an extremely large ts file, 100-200G+ #3796

Open yangkang2021 opened 10 months ago

yangkang2021 commented 10 months ago

Identified a suspected SRS issue:

  1. SRS's HLS segmenter generated an extremely large TS file, 100-200 GB.
  2. This has happened twice this year in our cluster.
  3. Version: SRS/4.0.247(Leo)
  4. No TS files were saved for the last day, so the oversized TS file was probably being written for a long time.


winlinvip commented 10 months ago

This issue is related to HLS's error strategy, which needs improvement. The current strategy is controlled by one configuration item:

        # the error strategy. can be:
        #       ignore, disable the hls.
        #       disconnect, require encoder republish.
        #       continue, ignore failed try to continue output hls.
        # Overwrite by env SRS_VHOST_HLS_HLS_ON_ERROR for all vhosts.
        # default: continue
        hls_on_error continue;

When HLS returns an error, the strategy decides whether to ignore it or to interrupt the live stream:

    if ((err = hls->on_audio(msg, format)) != srs_success) {
        // apply the error strategy for hls.
        std::string hls_error_strategy = _srs_config->get_hls_on_error(req_->vhost);
        if (srs_config_hls_is_on_error_ignore(hls_error_strategy)) {
            srs_warn("hls: ignore audio error %s", srs_error_desc(err).c_str());
            hls->on_unpublish();
            srs_error_reset(err);
        } else if (srs_config_hls_is_on_error_continue(hls_error_strategy)) {
            if (srs_hls_can_continue(srs_error_code(err), source->meta->ash(), msg)) {
                srs_error_reset(err);
            } else {
                return srs_error_wrap(err, "hls: audio");
            }
        } else {
            return srs_error_wrap(err, "hls: audio");
        }
    }
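
To make the three branches easier to compare, here is a minimal standalone model of that decision. It is only a sketch, not SRS code: `HlsOutcome` and `apply_hls_error_strategy` are hypothetical names, the strategy is assumed to be compared as a plain string, and `can_continue` stands in for the result of `srs_hls_can_continue()`, whose internals are not shown in this thread.

```cpp
#include <string>

// Outcome of one failed HLS write, as seen by the publishing stream.
// Hypothetical type for illustration; it does not exist in SRS.
enum class HlsOutcome {
    HlsDisabled,   // HLS output is stopped, the error is cleared, the stream keeps publishing
    ErrorCleared,  // the error is cleared and HLS keeps muxing
    StreamFails    // the error is returned to the caller
};

// Simplified model of the branch above.
// `can_continue` is an assumed stand-in for srs_hls_can_continue().
HlsOutcome apply_hls_error_strategy(const std::string& strategy, bool can_continue) {
    if (strategy == "ignore") {
        return HlsOutcome::HlsDisabled;
    }
    if (strategy == "continue") {
        return can_continue ? HlsOutcome::ErrorCleared : HlsOutcome::StreamFails;
    }
    // "disconnect" and any other value fall through to returning the error.
    return HlsOutcome::StreamFails;
}
```

Note that in this model `continue` does not always swallow the error: when the error cannot be continued from, it behaves like `disconnect`.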

There are a few areas that need improvement:

  1. The problem here is an abnormally large segment, so the segment size should be checked.
  2. ignore turns HLS off; in that case it would be better to also add a callback that notifies the business system of the failure.
  3. Unit tests need to be improved to verify that these strategies actually take effect.
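
As a rough illustration of the first item, a size check could look like the sketch below. Everything in it is an assumption made for discussion: `SegmentAction`, `SegmentLimits`, `check_segment_size`, and the soft/hard limit split are hypothetical and do not correspond to existing SRS code or configuration.

```cpp
#include <cstdint>

// What the muxer should do after appending data to the current segment.
// Hypothetical names, not SRS identifiers.
enum class SegmentAction {
    Keep,  // segment is within limits, keep writing
    Reap,  // soft limit reached: close this segment and start a new one
    Fail   // hard limit reached: report an error to the hls_on_error strategy
};

// Hypothetical limits; no such settings appear in the configuration quoted above.
struct SegmentLimits {
    int64_t soft_bytes;  // force a new segment at or above this size
    int64_t hard_bytes;  // treat the segment as broken at or above this size
};

// Classify the current segment by its size in bytes.
SegmentAction check_segment_size(const SegmentLimits& limits, int64_t segment_bytes) {
    if (segment_bytes >= limits.hard_bytes) {
        return SegmentAction::Fail;
    }
    if (segment_bytes >= limits.soft_bytes) {
        return SegmentAction::Reap;
    }
    return SegmentAction::Keep;
}
```

A check like this is small enough to be covered directly by unit tests, which would also help with the third item.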
