Clarify "multiple sources of media stitched together" at Note describing replaceTrack

guest271314 commented 5 years ago

1) I am considering composing a PR for another W3C repository (MediaRecorder) which would use language contained in this W3C repository: the description of replaceTrack() https://w3c.github.io/webrtc-pc/#dom-rtcrtpsender-replacetrack, particularly

6.4.3 If sending is true, and withTrack is not null, have the sender switch seamlessly to transmitting withTrack instead of the sender's existing track.

2) When reading content in this specification relating to replaceTrack() there is a note at https://w3c.github.io/webrtc-pc/#dom-rtcrtpsender-replacetrack#issue-container-generatedID-29

NOTE

There is not an exact 1:1 correspondence between tracks sent by one RTCPeerConnection and received by the other. For one, IDs of tracks sent have no mapping to the IDs of tracks received. Also, replaceTrack changes the track sent by an RTCRtpSender without creating a new track on the receiver side; the corresponding RTCRtpReceiver will only have a single track, potentially representing multiple sources of media stitched together. Both addTransceiver and replaceTrack can be used to cause the same track to be sent multiple times, which will be observed on the receiver side as multiple receivers each with its own separate track. Thus it's more accurate to think of a 1:1 relationship between an RTCRtpSender on one side and an RTCRtpReceiver's track on the other side, matching senders and receivers using the RTCRtpTransceiver's mid if necessary. (emphasis added)

where the term "multiple sources of media stitched together" is used. The specification does not clarify that procedure. Is it OK to use some of the above language from this specification (see https://www.w3.org/Consortium/Legal/2015/doc-license) in a PR for another W3C repository/specification? If so, does this specification need to be cited?

To avoid confusion about precisely what the PR is describing, can this specification clarify what "stitched together" technically means, and whether the specification defines the procedure to stitch together "multiple sources of media" or leaves that procedure to implementers?
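The last sentence of the quoted note suggests matching an RTCRtpSender on one side with an RTCRtpReceiver's track on the other by the transceiver's mid. A minimal sketch of that pairing follows, assuming both transceiver lists are available in one place (for example, a loopback test with two connections in the same page). `TrackLike`, `TransceiverLike` and `pairByMid` are hypothetical names introduced for illustration; objects returned by `getTransceivers()` have the `mid`, `sender.track` and `receiver.track` properties this sketch reads.

```typescript
// Hypothetical structural types: only the properties the sketch needs.
interface TrackLike {
  id: string;
}

interface TransceiverLike {
  mid: string | null; // null until negotiation has assigned a mid
  sender: { track: TrackLike | null };
  receiver: { track: TrackLike };
}

interface MidPair {
  sentTrackId: string | null; // null when the sender currently has no track
  receivedTrackId: string;
}

// Pair each local sender with the remote receiver's track that shares its mid.
// Track IDs are reported side by side to show that they need not match.
function pairByMid(
  local: TransceiverLike[],
  remote: TransceiverLike[],
): Map<string, MidPair> {
  const remoteByMid = new Map<string, TransceiverLike>();
  for (const t of remote) {
    if (t.mid !== null) remoteByMid.set(t.mid, t);
  }

  const pairs = new Map<string, MidPair>();
  for (const t of local) {
    if (t.mid === null) continue; // not negotiated yet
    const peer = remoteByMid.get(t.mid);
    if (peer === undefined) continue; // no counterpart on the other side
    pairs.set(t.mid, {
      sentTrackId: t.sender.track ? t.sender.track.id : null,
      receivedTrackId: peer.receiver.track.id,
    });
  }
  return pairs;
}
```

After a replaceTrack() call on the sending side, `sentTrackId` for a given mid would change while `receivedTrackId` stays the same, which is the situation the note describes.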

fippo commented 5 years ago

"multiple sources of media stitched together" is probably describing the time behaviour, i.e. that RTCRtpSender is supposed to generate a continuous series of timestamps.

guest271314 commented 5 years ago

@fippo The Chromium source appears to use

std::unique_ptr<WebRtcMediaStreamTrackAdapterMap::AdapterRef> track_ref; in several places, particularly at https://github.com/chromium/chromium/blob/1dc524aa4005eb08c533f2a344d5812863ae60a5/content/renderer/media/webrtc/rtc_rtp_sender.cc#L192

    std::unique_ptr<WebRtcMediaStreamTrackAdapterMap::AdapterRef> track_ref;
    webrtc::MediaStreamTrackInterface* webrtc_track = nullptr;
    if (!with_track.IsNull()) {
      track_ref = track_map_->GetOrCreateLocalTrackAdapter(with_track);
      webrtc_track = track_ref->webrtc_track();
    }
    signaling_task_runner_->PostTask(
        FROM_HERE,
        base::BindOnce(
            &RTCRtpSender::RTCRtpSenderInternal::ReplaceTrackOnSignalingThread,
            this, std::move(track_ref), base::Unretained(webrtc_track),
            std::move(callback)));

Is that the basis of the code which performs the task of "multiple sources of media stitched together" in Chromium?

If yes, then what exactly does the code do, technically? That is, what is the flowchart for that procedure, outlined in the same manner that the procedure for replaceTrack() is outlined in the specification at the first link in the first comment?

aboba commented 5 years ago

@fippo Yes, this is referring to generating a single RTP stream, with continuous timestamps and a single sequence number space.
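To illustrate what "continuous timestamps and a single sequence number space" could mean in practice, here is a minimal sketch that rewrites packets from successive sources into one outgoing numbering. It is not taken from any WebRTC implementation: `SourcePacket`, `OutPacket`, `StreamStitcher` and the `tsGap` parameter are hypothetical names, and the sketch assumes 16-bit sequence numbers and 32-bit timestamps as in RTP.

```typescript
// Hypothetical packet shapes, for illustration only.
interface SourcePacket {
  sourceId: string; // which media source produced this packet
  seq: number; // source-local sequence number (16-bit)
  timestamp: number; // source-local timestamp (32-bit)
}

interface OutPacket {
  seq: number;
  timestamp: number;
}

class StreamStitcher {
  private currentSource: string | null = null;
  private seqOffset = 0;
  private tsOffset = 0;
  private lastOut: OutPacket | null = null;
  private readonly tsGap: number;

  // tsGap: timestamp ticks to leave between the last packet of the old
  // source and the first packet of the new one (assumed: one frame interval).
  constructor(tsGap: number) {
    this.tsGap = tsGap;
  }

  rewrite(p: SourcePacket): OutPacket {
    if (p.sourceId !== this.currentSource) {
      if (this.lastOut !== null) {
        // Recompute offsets so the new source continues where the old one ended.
        this.seqOffset = this.lastOut.seq + 1 - p.seq;
        this.tsOffset = this.lastOut.timestamp + this.tsGap - p.timestamp;
      }
      this.currentSource = p.sourceId;
    }
    const out: OutPacket = {
      seq: (p.seq + this.seqOffset) & 0xffff, // wrap to 16 bits
      timestamp: (p.timestamp + this.tsOffset) >>> 0, // wrap to 32 bits
    };
    this.lastOut = out;
    return out;
  }
}
```

With this approach the consumer of the output sees one unbroken numbering and has no way to tell from the numbers alone that the source changed.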

jan-ivar commented 5 years ago

It's stitched together over time. The purpose of this note is more to dispel the 1:1 myth of tracks. I think the vague language here is fine to convey that. I don't think this note counts as citeable.

guest271314 commented 5 years ago

@jan-ivar

> It's stitched together over time. The purpose of this note is more to dispel the 1:1 myth of tracks. I think the vague language here is fine to convey that. I don't think this note counts as citeable.

How can the functionality of RTCRtpSender.replaceTrack() be incorporated into a method of MediaStream and MediaRecorder without using RTCPeerConnection; e.g., if (currentVideo.currentTime > N) { await <mediaStreamInstance | mediaRecorderInstance>.replaceTrack(nextVideoStream.getVideoTracks()[0]) }?
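The condition in the example above can be written against a minimal interface so that it does not depend on which object ends up providing replaceTrack(). This is only a sketch: `TrackReplacer` and `replaceAfter` are hypothetical names, and no replaceTrack() method on MediaStream or MediaRecorder is assumed to exist. Today an RTCRtpSender satisfies the interface, since its replaceTrack() accepts a track or null and returns a promise.

```typescript
// Hypothetical interface: anything exposing a replaceTrack() with the
// same shape as RTCRtpSender.replaceTrack().
interface TrackReplacer<T> {
  replaceTrack(withTrack: T | null): Promise<void>;
}

// Swap to `next` once the clock has passed `threshold` seconds.
// Returns the promise from replaceTrack(), or null when no swap was attempted.
function replaceAfter<T>(
  clock: { currentTime: number }, // e.g. an HTMLMediaElement
  threshold: number,
  target: TrackReplacer<T>,
  next: T,
): Promise<void> | null {
  if (clock.currentTime <= threshold) return null;
  return target.replaceTrack(next);
}
```

A caller could write `await replaceAfter(currentVideo, N, sender, nextVideoStream.getVideoTracks()[0])`; awaiting the null result is harmless when the threshold has not been reached.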

> I don't think this note counts as citeable.

Which other portions of the specification are not citeable? Can the specification be edited to clearly indicate which language in the specification is citeable and is not citeable?

aboba commented 5 years ago

@jan-ivar I believe the text in the note is actually incorrect. When receiving simulcast, the RtpReceiver does need to combine two distinct RTP streams into a single track. This is non-trivial because re-ordering can cause the streams to be commingled, so that the RtpReceiver will need to maintain two jitter buffers and figure out when to switch between streams so that it can feed material to the decoder in a coherent way. This is not required when sender.replaceTrack() is called: the RtpReceiver only sees a single RTP stream, so it doesn't actually need to do anything special. Saying that the RtpReceiver "stitches together" is wrong here, as wrong as saying that the RtpReceiver has to do anything when an SFU (which receives simulcast) switches between incoming RTP streams and provides the receiver with a single RTP stream.

aboba commented 5 years ago

Closing - this issue should be handled in MediaStream Recording.

w3c / webrtc-pc

Clarify "multiple sources of media stitched together" at Note describing replaceTrack #2171