ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License

RTC2RTMP: No sound in the audio for the first six seconds #4076

Open xuxiaoshuo opened 3 weeks ago

xuxiaoshuo commented 3 weeks ago

Describe the bug: With the rtc_to_rtmp feature enabled, we publish the stream via WebRTC and play it via RTMP, but there is no audio for the first six seconds.

Version: SRS 5.0

To Reproduce
Steps to reproduce the behavior:
1. From a computer, initiate a WebRTC stream.
2. On a mobile device, pull the stream via RTMP.
3. The video appears first; the audio starts playing only after a six-second delay.

Expected behavior: The video and audio should start simultaneously.

Screenshots: None

Additional context: The SRS configuration in use is shown below.

listen              1935;
max_connections     1000;
daemon              off;
srs_log_tank        console;

http_server {
    enabled         on;
    listen          8686;
    dir             ./objs/nginx/html;
}

http_api {
    enabled         on;
    listen          1985;
    auth {
        enabled on;
        username wp;
        password wp;
    }
}
stats {
    network         0;
}
rtc_server {
    enabled on;
    listen 8000; # UDP port
    # @see https://ossrs.net/lts/zh-cn/docs/v4/doc/webrtc#config-candidate
    candidate $CANDIDATE;
}
vhost __defaultVhost__ {
    rtc {
        enabled     off;
        # @see https://ossrs.net/lts/zh-cn/docs/v4/doc/webrtc#rtmp-to-rtc
        rtmp_to_rtc off;
        # @see https://ossrs.net/lts/zh-cn/docs/v4/doc/webrtc#rtc-to-rtmp
        rtc_to_rtmp off;
    }
}

vhost 192.168.10.13 {
    #dvr {
    #    enabled             on;
    #    dvr_path            ./objs/nginx/html/dvr/[app]/[stream]/[2006]/[01]/[02]/[15].[04].[05].[999].mp4;
    #    dvr_plan            session;
    #    dvr_wait_keyframe   on;
    #}
    rtc {
        enabled     on;
        # @see https://ossrs.net/lts/zh-cn/docs/v4/doc/webrtc#rtmp-to-rtc
        rtmp_to_rtc on;
        # @see https://ossrs.net/lts/zh-cn/docs/v4/doc/webrtc#rtc-to-rtmp
        rtc_to_rtmp on;

        pli_for_rtmp 0.5;
    }
    http_hooks {
        enabled         on;
        on_publish      http://172.24.144.1:8181/callback/srs/serve;
        on_unpublish    http://172.24.144.1:8181/callback/srs/serve;
        on_play         http://172.24.144.1:8181/callback/srs/serve;
        on_stop         http://172.24.144.1:8181/callback/srs/serve;
    }
}
winlinvip commented 3 weeks ago

When converting RTC to RTMP, there may be issues due to the differences between the two application scenarios.

For RTC, this might not be an issue, because video conferencing naturally includes a waiting period and a step to confirm whether the audio can be heard.

For live streaming, viewers generally do not expect to see content the moment the stream starts, because live broadcasts typically begin ahead of time and include a check that the stream is working properly.

I believe this is an area for improvement, but the benefit of optimizing it, namely the actual improvement to user experience, is not very significant.

TRANS_BY_GPT4

xuxiaoshuo commented 3 weeks ago

> When converting RTC to RTMP, there may be issues due to the differences between the two application scenarios. […]

It is indeed quite strange. Are there any temporary workarounds? I have observed that with Tencent Cloud's audio module, the audio comes through very quickly.

xuxiaoshuo commented 2 weeks ago

> When converting RTC to RTMP, there may be issues due to the differences between the two application scenarios. […]

I tried assigning the video's latest avsync_time to the audio packets' avsync_time, and the audio then starts together with the picture. However, this seems to cause a slight A/V desynchronization in the first few seconds, though that is still better than having no audio at all. There may also be an issue with pure-audio streams. Is it safe to assign the value directly like this?


srs_error_t SrsRtmpFromRtcBridge::on_rtp(SrsRtpPacket *pkt)
{
    srs_error_t err = srs_success;

    if (!pkt->payload()) {
        return err;
    }
    // xxs:fix.start
    // If an audio packet has no avsync_time yet but a video avsync_time is
    // already known, borrow the latest video timestamp so the audio packet
    // is not discarded by the no-sync check below.
    if (pkt->is_audio()) {
        last_audio_ts_ = pkt->get_avsync_time();
        if (last_audio_ts_ < 0 && last_video_ts_ > 0) {
            pkt->set_avsync_time(last_video_ts_);
        }
    } else {
        last_video_ts_ = pkt->get_avsync_time();
    }
    // xxs:fix.end
    // We have not received any sender report yet, so avsync_time cannot be
    // calculated; discard the packet to avoid timestamp problems in the live source.
    const SrsRtpHeader& h = pkt->header;
    if (pkt->get_avsync_time() <= 0) {
        if (sync_state_ < 0) {
            srs_trace("RTC: Discard no-sync %s, ssrc=%u, seq=%u, ts=%u, state=%d", pkt->is_audio() ? "Audio" : "Video",
                h.get_ssrc(), h.get_sequence(), h.get_timestamp(), sync_state_);
            sync_state_ = 0;
        }
        return err;
    }
    // ... (remainder of the function omitted in the original snippet)
}

xuxiaoshuo commented 2 weeks ago
void SrsRtcRecvTrack::update_send_report_time(const SrsNtp& ntp, uint32_t rtp_time)
{
    last_sender_report_ntp1_ = last_sender_report_ntp_;
    last_sender_report_rtp_time1_ = last_sender_report_rtp_time_;

    last_sender_report_ntp_ = ntp;
    last_sender_report_rtp_time_ = rtp_time;

    // TODO: FIXME: Use system wall clock.
    last_sender_report_sys_time_ = srs_update_system_time();

    if (last_sender_report_rtp_time1_ > 0) {
        // WebRTC uses sender reports to sync audio/video timestamps, because audio and
        // video have different timebases: typically Opus audio is 48000Hz and video is 90000Hz.
        // We use two sender report points to calculate the avsync timestamp (clock time)
        // for any given rtp timestamp. For example, given two historical audio sender reports:
        //   sender_report1: rtp_time1 = 10000, ntp_time1 = 40000
        //   sender_report : rtp_time  = 10960, ntp_time  = 40020
        //   (rtp_time - rtp_time1) / (ntp_time - ntp_time1) = 960 / 20 = 48,
        // Now we can calculate the ntp time (ntp_x) of any given rtp timestamp (rtp_x):
        //   (rtp_x - rtp_time) / (ntp_x - ntp_time) = 48   =>   ntp_x = (rtp_x - rtp_time) / 48 + ntp_time;
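        // Continuing the example above, for rtp_x = 12400:
        //   ntp_x = (12400 - 10960) / 48 + 40020 = 30 + 40020 = 40050.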
        double sys_time_elapsed = static_cast<double>(last_sender_report_ntp_.system_ms_) - static_cast<double>(last_sender_report_ntp1_.system_ms_);

        // Avoid division by zero when the two sender reports carry the same system time.
        if (fpclassify(sys_time_elapsed) == FP_ZERO) {
            return;
        }

        double rtp_time_elapsed = static_cast<double>(last_sender_report_rtp_time_) - static_cast<double>(last_sender_report_rtp_time1_);
        double rate = round(rtp_time_elapsed / sys_time_elapsed);

        srs_trace("SrsRtcRecvTrack::update_send_report_time rate=%.0f", rate);
        // TODO: FIXME: use the sample rate from sdp.
        if (rate > 0) {
            rate_ = rate;
        }
    } else {
        // xxs:fix:start
        // Before the second sender report arrives the rate is unknown; fall back
        // to the standard WebRTC clock rates: Opus audio 48000Hz (48 ticks/ms)
        // and video 90000Hz (90 ticks/ms).
        if (dynamic_cast<SrsRtcAudioRecvTrack*>(this)) {
            rate_ = 48;
        } else if (dynamic_cast<SrsRtcVideoRecvTrack*>(this)) {
            rate_ = 90;
        }
        // xxs:fix:end
    }
}

The issue was eventually traced to this function: the audio rate can only be calculated once update_send_report_time has been called a second time, i.e. after two sender reports have arrived. As a temporary workaround, I set the default rate to 48 (the Opus clock rate in ticks per millisecond); the measured value then corrects it on the second update_send_report_time call.
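
To illustrate why the default helps, below is a minimal self-contained sketch of the calculation (not SRS code; the SenderReport struct and rtp_to_avsync_ms function are hypothetical names). With a known rate, a single sender report is enough to map an RTP timestamp to wall-clock time, so audio packets no longer need to be discarded while waiting for the second report.

#include <cstdint>
#include <cstdio>

// Hypothetical, simplified model of a track's latest RTCP sender report.
struct SenderReport {
    uint32_t rtp_time; // RTP timestamp carried in the SR
    int64_t ntp_ms;    // wall-clock (NTP) time of the SR, in milliseconds
};

// Map an arbitrary RTP timestamp to wall-clock time (ms), given the latest
// sender report and a rate in RTP ticks per millisecond. This mirrors the
// formula in the SRS comment: ntp_x = (rtp_x - rtp_time) / rate + ntp_time.
int64_t rtp_to_avsync_ms(const SenderReport &sr, uint32_t rtp_x, double rate) {
    double ticks = static_cast<double>(rtp_x) - static_cast<double>(sr.rtp_time);
    return sr.ntp_ms + static_cast<int64_t>(ticks / rate);
}

int main() {
    // First sender report of an Opus audio track (48000Hz => 48 ticks/ms).
    SenderReport sr = {10000, 40000};

    // Without a default rate, every packet before the *second* SR gets
    // avsync_time <= 0 and is discarded by SrsRtmpFromRtcBridge::on_rtp.
    // With the default rate of 48 it can be timestamped right away:
    uint32_t rtp_x = 10960; // 960 ticks = 20 ms of Opus after the SR
    printf("avsync = %lld ms\n", (long long)rtp_to_avsync_ms(sr, rtp_x, 48.0)); // prints 40020
    return 0;
}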