w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Specify options to get metadata and order of existing tracks in input and set track order of muxer and, or writer #11

Closed: guest271314 closed this issue 5 years ago

guest271314 commented 5 years ago

Background

Merging Matroska and WebM files requires at least

1. Metadata about the input (media stream or file): specifically, whether both A (audio) and V (video) tracks exist in the input media;
2. The order of the A and V tracks in the file, which could be AV or VA;
3. If either the A or V track does not exist in the input file, that track must be created (A can output silence; V can output #FFFFFF or #000000 frames) for the purpose of merging N Matroska or WebM files into a single file.

See

The WebM files output by the MediaRecorder implementations in Chromium and Chrome, and in Mozilla Firefox and Nightly, can have arbitrary AV track order, per the Media Capture and Streams specification https://www.w3.org/TR/mediacapture-streams/

The tracks of a MediaStream are stored in a track set. The track set MUST contain the MediaStreamTrack objects that correspond to the tracks of the stream. The relative order of the tracks in the set is User Agent defined and the API will never put any requirements on the order.
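The third requirement above could be sketched as a pure helper that, given already-parsed per-file metadata (the shape is an assumption here, modeled on `mkvmerge -J` output), reports which synthetic tracks would need to be created before merging; all names are hypothetical:

```javascript
// Hypothetical helper: given parsed per-file metadata, report which
// synthetic tracks (e.g. silent audio, #000000 video frames) must be
// created before N files can be merged into a single file.
function planSyntheticTracks(filesMetadata) {
  return filesMetadata.map(({ tracks }, index) => {
    const kinds = new Set(tracks.map(({ type }) => type));
    return {
      index,
      needsAudio: !kinds.has("audio"), // synthesize silence
      needsVideo: !kinds.has("video")  // synthesize solid-color frames
    };
  });
}
```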

Proposed solution

The Web Codecs specification should define a means to get input media track order and set output file track order.

pthatcherg commented 5 years ago

WebCodecs is (so far) only designed to work on MediaStreamTracks, not MediaStreams. You encode/decode the tracks separately and then can combine them into MediaStreams, if you like (or take them out of MediaStreams).

If one were to use WebCodecs as a replacement of MediaRecorder, they could choose how to containerize the tracks independent of how the browsers do it in MediaRecorder. In fact, such flexibility is one of the points of WebCodecs.

guest271314 commented 5 years ago

You encode/decode the tracks separately and then can combine them into MediaStreams

Track order is important when merging tracks. This API should consider providing a means to set track order, so that the developer controls the entire pipeline. Once N tracks are combined into a single MediaStream the order is, per the Media Capture and Streams specification, arbitrary. If you add N audio or video tracks to a MediaStream using addTrack(), as the specification is currently defined and implemented, the tracks could be in the order [A,A,A,V,V,V], or [A,V], or [V,A].

pthatcherg commented 5 years ago

Sorry, when I wrote "You encode/decode the tracks separately and then can combine them into MediaStreams", I should have written "You decode the tracks separately and then can combine them into MediaStreams". For encode, the MediaStreamTrack is the input, not the output.

The output is just a bunch of encoded frames. It's up to the JS/wasm to combine those frames into a muxed container, and to specify the order within that container when it does so.
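For instance, the JS-side combining step could interleave independently encoded audio and video chunks by timestamp before writing them into a container; the `{ timestamp, kind }` chunk shape here is an assumption for illustration, not WebCodecs API:

```javascript
// Hypothetical sketch: merge separately encoded audio and video chunk
// streams into one timestamp-ordered sequence for a JS/wasm muxer.
function interleaveChunks(audioChunks, videoChunks) {
  return [...audioChunks, ...videoChunks].sort(
    (a, b) => a.timestamp - b.timestamp
  );
}
```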

guest271314 commented 5 years ago

Yes, but that does not change what this issue is about. If MediaStream is involved in this specification and API, then the Media Capture and Streams specification is clear that track order is not mandated, as quoted in the OP. That means that to merge the MediaStreamTracks within a MediaStream, the MediaStream needs to be parsed to get the arbitrary track order.

Code to demonstrate the issue

const mediaStream = new MediaStream();
for (let i = 0; i < 5; i++) {
  const pc1 = new RTCPeerConnection();
  const tx1 = pc1.addTransceiver("audio");
  const audioTrack = tx1.receiver.track;
  mediaStream.addTrack(audioTrack);
  const pc2 = new RTCPeerConnection();
  const tx2 = pc2.addTransceiver("video");
  const videoTrack = tx2.receiver.track;
  mediaStream.addTrack(videoTrack);
}
console.log(mediaStream.getTracks());

/*
0: MediaStreamTrack {kind: "audio"…}
1: MediaStreamTrack {kind: "audio"…}
2: MediaStreamTrack {kind: "audio"…}
3: MediaStreamTrack {kind: "audio"…}
4: MediaStreamTrack {kind: "audio"…}
5: MediaStreamTrack {kind: "video"…}
6: MediaStreamTrack {kind: "video"…}
7: MediaStreamTrack {kind: "video"…}
8: MediaStreamTrack {kind: "video"…}
9: MediaStreamTrack {kind: "video"…}
*/

In the above example the developer could erroneously anticipate that audio tracks are always ordered before video tracks in the MediaStream, though that is not the case, particularly when there are only two tracks in the MediaStream: the order could be (arbitrarily) [A,V] or [V,A]. Therefore, to perform post-production on a media file whose source is a MediaStream, each individual file needs to be parsed to determine the track order, else errors will ensue from presuming any particular track order; e.g.,

    // https://www.reddit.com/r/mkvtoolnix/comments/cdi824/mkvmerge_prints_error_when_merging_webm_files/etwhugs/
    // 1. run `mkvmerge -J` file.webm on each input file
    // 2. determine which tracks go together
    // 3. create an appropriate --append-to argument from the algorithm in 2
    let filesMetadata = [];
    for (const fileName of fileNames) {
       await new Promise(resolve => {
         const getMetadata = ({body}) => {        
           port.onMessage.removeListener(getMetadata);
           filesMetadata.push(JSON.parse(body));
           resolve();
         };
         port.onMessage.addListener(getMetadata);
         port.postMessage({
           "message": "metadata",
           "body": `${cmd} ${metadata} ${fileName}`
         });
       });
    }
    // construct `--append-to` option for merging files where
    // tracks are not in consistent order; for example, WebM
    // files output by Chromium, Firefox MediaRecorder implementations 
    // Chromium => Opus: "id": 0, Firefox => Opus: "id": 1,
    // Chromium => VP8: "id": 1, Firefox => VP8: "id": 0 
    for (let i = 0; i < filesMetadata.length; i++) {
      const {tracks:currentTracks} = filesMetadata[i];
      const currentAudioTrack = getTrack(currentTracks, "audio").id;
      const currentVideoTrack = getTrack(currentTracks, "video").id;
      if (filesMetadata[i + 1]) {
        const {tracks:nextTracks} = filesMetadata[i + 1];
        const nextAudioTrack = getTrack(nextTracks, "audio").id;
        const nextVideoTrack = getTrack(nextTracks, "video").id;
        appendTo += `${i+1}:${nextAudioTrack}:${i}:${currentAudioTrack},${i+1}:${nextVideoTrack}:${i}:${currentVideoTrack},`;
      } 
      else {
        const {tracks:previousTracks} = filesMetadata[i - 1];
        const previousAudioTrack = getTrack(previousTracks, "audio").id;
        const previousVideoTrack = getTrack(previousTracks, "video").id;
        appendTo += `${i}:${currentAudioTrack}:${i-1}:${previousAudioTrack},${i}:${currentVideoTrack}:${i-1}:${previousVideoTrack}`;
      }
    }
    // check if tracks are ordered AV,AV...AV or arbitrarily AV,VA,AV,AV,VA...AV
    const orderedTracks = filesMetadata.map(({tracks}) => tracks).every(([{type}]) => type === "audio");
    console.log(JSON.stringify({filesMetadata, orderedTracks, appendTo}, null, 2));
    port.onMessage.addListener(onNativeMessage);
    const message = {
      "message": "write",
      // if tracks in files are not ordered use `--append-to` option, else do not
      "body": `${cmd} ${options} ${outputFileName} ${!orderedTracks ? appendTo : ""} '[' ${fileNames.join(" ")} ']'`
    };

at https://github.com/guest271314/native-messaging-mkvmerge/blob/master/app/native-messaging-mkvmerge.js#L186 through https://github.com/guest271314/native-messaging-mkvmerge/blob/master/app/native-messaging-mkvmerge.js#L235.
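The `--append-to` construction above can be distilled into a self-contained helper; `getTrack` is reimplemented here as a simple lookup, and the metadata shape follows the `mkvmerge -J` output used above:

```javascript
// Find the first track of a given type ("audio" or "video") in a
// parsed track list (shape modeled on `mkvmerge -J` output).
function getTrack(tracks, type) {
  return tracks.find((track) => track.type === type);
}

// Build an mkvmerge `--append-to` mapping that joins audio to audio and
// video to video across files whose track ids vary per file
// (e.g. Chromium: Opus id 0, VP8 id 1; Firefox: Opus id 1, VP8 id 0).
function buildAppendTo(filesMetadata) {
  let appendTo = "";
  for (let i = 1; i < filesMetadata.length; i++) {
    const prev = filesMetadata[i - 1].tracks;
    const curr = filesMetadata[i].tracks;
    appendTo +=
      `${i}:${getTrack(curr, "audio").id}:${i - 1}:${getTrack(prev, "audio").id},` +
      `${i}:${getTrack(curr, "video").id}:${i - 1}:${getTrack(prev, "video").id}`;
    if (i < filesMetadata.length - 1) appendTo += ",";
  }
  return appendTo;
}
```

With a Firefox-ordered file followed by a Chromium-ordered file this yields `1:0:0:1,1:1:0:0`, the same pairing shape as the mapping shown in the repository linked above.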

pthatcherg commented 5 years ago

But this specification doesn't deal with MediaStreams. It only deals with MediaStreamTracks. And the only thing it does with those is convert to/from ReadableStreams/WritableStreams.

If you don't like the behavior of MediaStreams, don't use them. There are very few reasons why anyone needs to use a MediaStream vs. MediaStreamTrack.

guest271314 commented 5 years ago

There are very few reasons why anyone needs to use a MediaStream vs. MediaStreamTrack.

How do you propose to record multiple audio and video MediaStreamTracks to a single media file without using a MediaStream?

pthatcherg commented 5 years ago

1. Encode the MediaStreamTracks using AudioEncoder/VideoEncoder (part of WebCodecs, done in browser).

2. Write the encoded media (output of AudioEncoder/VideoEncoder) to a container format (done in JS/wasm, not part of WebCodecs).
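The division of labor in these two steps could look like the following sketch, where a hypothetical JS-side muxer (not part of WebCodecs; all names assumed) decides the track order written into the container, regardless of how the browser enumerated the source tracks:

```javascript
// Hypothetical JS/wasm-side muxer sketch: the browser-side encoders emit
// chunks; this class owns the container and therefore the track order.
class SimpleMuxer {
  constructor(trackOrder) {
    this.trackOrder = trackOrder; // e.g. ["audio", "video"]
    this.chunks = [];
  }
  addChunk(kind, chunk) {
    this.chunks.push({ kind, chunk });
  }
  finalize() {
    // Order tracks in the container as the application chose,
    // not as the browser happened to enumerate them.
    return this.trackOrder.map((kind) =>
      this.chunks.filter((c) => c.kind === kind)
    );
  }
}
```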

guest271314 commented 5 years ago

Re 2. The Explainer currently includes code referencing a "muxer". If this API has nothing to do with containers then that should be removed, else it is misleading.

guest271314 commented 5 years ago

While 1. mentions MediaStreamTracks, MediaStream is also used in the Explainer. MediaStream and MediaStreamTrack have several limitations, some mentioned above. I initially had the impression that this API would be self-contained, without reliance on MediaStreamTrack or MediaStream. Does this API intend to define constraints for MediaStreamTrack specific to the WebCodecs specification?

pthatcherg commented 5 years ago

Re 2. The Explainer currently includes code referencing a "muxer". If this API has nothing to do with containers then that should be removed, else it is misleading.

This API, when combined with JS/wasm code that does containerization, can be used for various use cases, just as it can implement video streaming when combined with JS/wasm code that implements buffering. The examples in the explainer are meant to explain this.

However, I can see that it is a bit ambiguous that those parts are app-specific and not provided by the browser, so I have just added several comments to the explainer to make that more explicit and clear.

pthatcherg commented 5 years ago

While 1. mentions MediaStreamTracks, MediaStream is also used in the Explainer. MediaStream and MediaStreamTrack have several limitations, some mentioned above. I initially had the impression that this API would be self-contained, without reliance on MediaStreamTrack or MediaStream.

This API does not rely on MediaStreams or MediaStreamTracks. It can be used separately from them. The transcoding example in the explainer doesn't use them.

However, other web APIs, notably getUserMedia and HTMLMediaElement, work with MediaStreams, so in order to work with them one must go through MediaStreams. Perhaps that could one day change. For now, it's easier to define conversions between MediaStreamTracks and ReadableStreams/WritableStreams.

Does this API intend to define constraints for MediaStreamTrack specific to WebCodecs specification?

No.

guest271314 commented 5 years ago

To provide a concrete example of why I filed this issue, consider input media from arbitrary containers. Also consider the use case of merging specific fragments of those input media into a single file.

There is no way to determine the track order or if only an audio track or only a video track exists in the container without parsing the file.

In order to merge multiple tracks audio to audio, video to video, the next track needs to be appended to the current track.

For example, with an input list

https://upload.wikimedia.org/wikipedia/commons/6/6e/Micronesia_National_Anthem.ogg#t=0,2
https://upload.wikimedia.org/wikipedia/commons/a/a4/Xacti-AC8EX-Sample_video-001.ogv#t=0,4
https://mirrors.creativecommons.org/movingimages/webm/ScienceCommonsJesseDylan_240p.webm#t=10,20
https://mirrors.creativecommons.org/movingimages/webm/ASharedCulture_480p.webm#t=22,26
https://nickdesaulniers.github.io/netfix/demo/frag_bunny.mp4#t=55,60
https://raw.githubusercontent.com/w3c/web-platform-tests/master/media-source/mp4/test.mp4#t=0,5
https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerBlazes.mp4#t=0,5
https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerJoyrides.mp4#t=0,5
https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4#t=0,6

one or more of the media files could contain only an audio track or only a video track. To merge all of the tracks into a single file the corresponding audio or video track needs to be created.

When the WebM file is produced using MediaRecorder in the browser the track order is arbitrary, per the current specification. If MediaRecorder is not used to create the file the track order could still be arbitrary; that is, for the above input media list

[
  {
    "audio": 1,
    "video": 0
  },
  {
    "audio": 0,
    "video": 1
  },
  {
    "audio": 1,
    "video": 0
  },
  {
    "audio": 0,
    "video": 1
  },
  {
    "audio": 0,
    "video": 1
  },
  {
    "audio": 0,
    "video": 1
  },
  {
    "audio": 0,
    "video": 1
  },
  {
    "audio": 0,
    "video": 1
  },
  {
    "audio": 0,
    "video": 1
  }
]

though the order is subject to change even between separate runs of the same code when using MediaRecorder to get WebM file representations of the various input files, in this case .ogg, .ogv, .webm and .mp4.

Whether the parsing takes place in the WebCodecs realm or the "JS/wasm" realm, the track order needs to be known. That is, unless merging tracks into a single container is not a goal of this API, WebCodecs, though such a goal was merged into this repository at https://github.com/pthatcherg/web-codecs/pull/3.

Whether the media file is produced using MediaRecorder or libwebm, libvpx, libmatroska, openh264, etc., the tracks could be in an arbitrary order in the container. That means that when attempting to merge media fragments or specific time slices without knowledge of the track order, a developer could try to merge an audio track to a video track, which will result in an error.

For the above list of media files, and after creating the appropriate audio track or video track where none might exist in the original file, the --append-to mapping, in this case using mkvmerge to merge the list of media fragments, is

1:0:0:1,1:1:0:0,2:1:1:0,2:0:1:1,3:0:2:1,3:1:2:0,4:0:3:0,4:1:3:1,5:0:4:0,5:1:4:1,6:0:5:0,6:1:5:1,7:0:6:0,7:1:6:1,8:0:7:0,8:1:7:1.

Example of the resulting file from such mapping https://github.com/guest271314/native-messaging-mkvmerge/blob/master/native-messaging-mkvmerge-vp8.webm?raw=true, plnkr https://plnkr.co/edit/8J61Rw?p=preview.

Whether MediaStream, MediaStreamTrack are used or not, since one non-goal of this API includes

Write the encoded media (output of AudioEncoder/VideoEncoder) to a container format (done in JS/wasm, not part of WebCodecs)

ignoring track order in the WebCodecs realm merely farms out the task of parsing the media container, to determine the track order and whether only a video or only an audio track exists, to the "JS/wasm" realm. For uniform application of WebCodecs, from the perspective here, parsing the media file and determining track order in the container should be a joint effort between WebCodecs and the "JS/wasm" code that writes the encoded media (output of AudioEncoder/VideoEncoder) to a container format, for consistency, uniformity and interoperability. For example, the current state of the art is that WebM files produced by Chrome or Chromium cannot be merged with WebM files produced by Firefox or Nightly due to the vastly differing parameters used for the encoders, which will in fact crash the tab (see "Why does Firefox produce larger WebM video files compared with Chrome?").

If "JS/wasm" is a black box absolutely outside the scope of interoperability with WebCodecs, then "JS/wasm" authors are on their own re getting metadata and track order.

"combined with JS/wasm code" language is prominently used in this repository. Some consideration should be given to precisely how external code will interact with WebCodecs, to avoid only addressing such an issue when a use case comes up later which requires determining the track order of input to a container from WebCodecs. My 2¢.

pthatcherg commented 5 years ago

I added an issue to discuss containers and a potential container API: https://github.com/WICG/web-codecs/issues/24