w3c / mediacapture-record

MediaStream Recording
https://w3c.github.io/mediacapture-record/

Feature request: Concat filter #166

Open guest271314 opened 5 years ago

guest271314 commented 5 years ago

Related: https://github.com/w3c/mediacapture-record/issues/147; https://github.com/w3c/mediacapture-fromelement

Feature request: Include an option to MediaRecorder to concatenate (e.g., http://trac.ffmpeg.org/wiki/Concatenate) all input streams to a single webm file.

For example

Promise.all([Promise0, Promise1, Promise2])

when all PromiseN are fulfilled, Promise.all() is fulfilled, even if Promise2 resolves before Promise0.

Such code can be implemented in MediaRecorder as

let recorder = new MediaRecorder(
  new MediaStream([video0.captureStream(), video1.captureStream(), video2.captureStream()]),
  {
    concat: true,
    width: <"scaleToMinimumDetected|scaleToMaximumDetected|default:maximum">,
    height: <"scaleToMinimumDetected|scaleToMaximumDetected|default:maximum">
  }
);

recorder.ondataavailable = e => {
  // e.data: a single `.webm` file
};
Pehrsons commented 5 years ago

You could remux this in JS with fairly low overhead, no? Would that achieve the same thing?

guest271314 commented 5 years ago

@Pehrsons Do you mean re-record the recorded media fragments? There are several approaches that have been considered and tried.

The concept is briefly outlined at https://github.com/w3c/mediacapture-main/issues/575.

https://github.com/guest271314/ts-ebml helps with "remux"ing recorded webm files. https://github.com/guest271314/whammy and https://github.com/guest271314/webm-writer-js provide a means to write images to a webm file created "on the fly", though without support for adding/including audio.

If there were a means to write the Matroska or WebM file directly using a human-readable format (https://github.com/Matroska-Org/matroska-specification/issues/129, https://github.com/Matroska-Org/ebml-specification/issues/156), the image and audio chunks could be written "on the fly" to a single file.
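Purely as an illustration of what such a human-readable structure might look like (the element names Segment, Info, Tracks, TrackEntry, Cluster, SimpleBlock and the codec IDs come from the Matroska/WebM specifications; the shape of this JS object is invented here, not part of any proposal or library), a writer could then serialize a structure like this to binary EBML:

const segmentSketch = {
  Segment: {
    Info: { TimecodeScale: 1000000, MuxingApp: "sketch", WritingApp: "sketch" },
    Tracks: [
      { TrackEntry: { TrackNumber: 1, TrackType: 1 /* video */, CodecID: "V_VP8" } },
      { TrackEntry: { TrackNumber: 2, TrackType: 2 /* audio */, CodecID: "A_OPUS" } }
    ],
    Clusters: [
      // image and audio chunks appended "on the fly" as SimpleBlocks
      { Timecode: 0, SimpleBlocks: [] }
    ]
  }
};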

What this feature request basically asks for is a flag or option passed to MediaRecorder so that the recorder does NOT change state to inactive when a single <video> or <audio> element being captured changes src. Beyond that basic request, it asks for the ability of MediaRecorder to record multiple tracks in sequence, even if a track at, e.g., index 1 of the array passed to MediaStream ends before the track at index 0, and then write the data to the resulting webm file, instead of using several different APIs (AudioContext, canvas) to try to achieve the expected result of concatenating several audio and/or video fragments (or full videos) while simultaneously trying to synchronize audio and video.

The functionality is essentially possible using canvas and AudioContext (a rough sketch follows), though it could be far simpler if MediaRecorder were extended to handle multiple video and audio tracks.
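As a rough, hypothetical sketch of that canvas + AudioContext workaround (not the MediaFragmentRecorder code referenced later): a single MediaRecorder stays active while one <video> element plays several sources in sequence, with its frames drawn to a captured canvas and its audio routed through a MediaStreamAudioDestinationNode. Here "urls" is a placeholder for an array of media URLs, and the function should be called from a user gesture so playback and the AudioContext are allowed to start.

async function recordInSequence(urls) {
  const video = document.createElement("video");
  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");
  const ac = new AudioContext();
  const dest = ac.createMediaStreamDestination();
  // Audio path: <video> -> MediaElementAudioSourceNode -> MediaStreamAudioDestinationNode
  ac.createMediaElementSource(video).connect(dest);
  // Video path: draw frames onto the canvas and capture the canvas as a stream
  const stream = new MediaStream([
    ...canvas.captureStream().getVideoTracks(),
    ...dest.stream.getAudioTracks()
  ]);
  const recorder = new MediaRecorder(stream);
  const result = new Promise(r => (recorder.ondataavailable = e => r(e.data)));
  recorder.start();
  let raf;
  const draw = () => {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    raf = requestAnimationFrame(draw);
  };
  for (const url of urls) {
    await new Promise(resolve => {
      video.addEventListener("canplay", () => {
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        video.play().catch(console.error);
        draw();
      }, { once: true });
      video.addEventListener("ended", resolve, { once: true });
      video.src = url;
    });
    cancelAnimationFrame(raf);
  }
  recorder.stop();
  // A single webm Blob containing all of the sources in sequence
  return result;
}

Keeping the drawn frames synchronized with the audio, and handling sources with different dimensions, is exactly the part that stays fiddly without changes to MediaRecorder.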

In the absence of changing or extending the MediaRecorder specification, the feature request is for a webm writer which provides a means to concatenate video data, similar to what is already possible for audio using .decodeAudioData() and OfflineAudioContext() and/or concatenating AudioBuffers (see https://github.com/w3c/mediacapture-main/issues/575), using a human-readable structure, e.g., XML that is then written to EBML (https://github.com/vi/mkvparse). The API would be Promise-based (the media at index N might be read before the media at index 0), though the mechanism "awaits" completion of all input media before writing the output: a single webm file.

guest271314 commented 5 years ago

@Pehrsons Ideally the reading and writing could be performed without playback of the media; a precursor, related concept is https://github.com/guest271314/OfflineMediaContext. OffscreenCanvas does not necessarily provide the functionality of OfflineAudioContext(), which

doesn't render the audio to the device hardware; instead, it generates it, as fast as it can

that is, throughout the process of recording video/images, a <video> element essentially MUST be used and the video MUST be played back to get the video data. There is no .decodeVideoData() to get the underlying images WITHOUT playing back the video.

Pehrsons commented 5 years ago

Based on your code new MediaRecorder(new MediaStream([video0.captureStream(), video1.captureStream(), video2.captureStream()])), it seems to me that you want to encode the three streams separately and concat them into one webm file.

Now that can be done by recording them separately, and remuxing in js, no? This assumes of course, that all the recordings have the same number and type of tracks, and codecs.

Like so (but in a loop or something nicer looking)

let r0 = new MediaRecorder(video0.captureStream());
let r1 = new MediaRecorder(video1.captureStream());
let r2 = new MediaRecorder(video2.captureStream());

let p0 = new Promise(r => r0.ondataavailable = e => r(e.data));
let p1 = new Promise(r => r1.ondataavailable = e => r(e.data));
let p2 = new Promise(r => r2.ondataavailable = e => r(e.data));

// Start each recorder, and stop it when its source ends so that
// ondataavailable fires with the complete recording.
r0.start(); r1.start(); r2.start();
video0.onended = () => r0.stop();
video1.onended = () => r1.stop();
video2.onended = () => r2.stop();

let b0 = await p0;
let b1 = await p1;
let b2 = await p2;

// Something home-written that basically uses ts-ebml to set up a pipe of
// EBMLDecoder -> EBMLReader -> EBMLEncoder, and processes the blobs in order.
js_webm_remux_concat(b0, b1, b2);

Looking at your proposal as a way to support multiple tracks, I'm not sure it's the right fix. For one, it doesn't handle tracks that start or end in parallel to other tracks.

Since there's so little consensus on supporting multiple tracks (other than if they're there at the start), I think this kind of fairly specific use-case fix will have an even harder time to fly.

guest271314 commented 5 years ago

@Pehrsons

it seems to me that you want to encode the three streams separately and concat them into one webm file.

Yes. That is the concept. The reason that MediaRecorder is used at all is due to the lack of an API for video similar to AudioContext.decodeAudioData(), which returns an AudioBuffer that can be concatenated with other AudioBuffers. Using OfflineAudioContext(), the audio media does not need to be played back (audibly) to get and concatenate the AudioBuffers.
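For the audio-only half of that, a minimal sketch (assuming all inputs share a sample rate; "urls" is a placeholder for an array of audio file URLs) which decodes each file and renders the AudioBuffers back to back in an OfflineAudioContext, without audible playback:

async function concatAudio(urls) {
  const ac = new AudioContext();
  const buffers = await Promise.all(urls.map(async url =>
    ac.decodeAudioData(await (await fetch(url)).arrayBuffer())
  ));
  const length = buffers.reduce((sum, b) => sum + b.length, 0);
  const channels = Math.max(...buffers.map(b => b.numberOfChannels));
  const oac = new OfflineAudioContext(channels, length, buffers[0].sampleRate);
  let offset = 0;
  for (const buffer of buffers) {
    const source = oac.createBufferSource();
    source.buffer = buffer;
    source.connect(oac.destination);
    // Schedule each buffer to start where the previous one ends
    source.start(offset / oac.sampleRate);
    offset += buffer.length;
  }
  // Rendered "as fast as it can"; resolves with one concatenated AudioBuffer
  return oac.startRendering();
}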

Now that can be done by recording them separately, and remuxing in js, no? This assumes of course, that all the recordings have the same number and type of tracks, and codecs.

That is another reason for using MediaRecorder: to create uniform .webm files. Notice that the media files in the urls variable at https://github.com/guest271314/MediaFragmentRecorder/blob/master/MediaFragmentRecorder.html have different extensions, which is intentional.

The concept itself (concatenating media fragments) was inspired by A Shared Culture and Jesse Dylan. The use case: create such a video/audio collage from disparate media using only APIs shipped with modern, ostensibly FOSS, browsers.

Ideally, media playback should not be necessary at all, if there were a decodeVideoData() function which performed similarly to .decodeAudioData(), and an OfflineVideoContext() with startRendering() functionality similar to that of OfflineAudioContext() (potentially incorporating OffscreenCanvas()), which (currently) doesn't

render the [video] to the device hardware; instead, it generates it, as fast as it can

Relevant to

Looking at your proposal as a way to support multiple tracks, I'm not sure it's the right fix. For one, it doesn't handle tracks that start or end in parallel to other tracks.

the concept is to create the necessary file structure - in parallel - then, if necessary (Chromium does not include cues in recorded webm files; Firefox does), re-"scan" the file structure to insert the timestamps (cues); consider an array of integers or decimals that is "equalized".
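For the cue-insertion step, a sketch using ts-ebml, roughly following the pattern in that library's README (the exact API may differ between versions): decode the recorded webm, replay its elements through a Reader, rebuild a seekable metadata section (SeekHead/Cues), then splice it back onto the body.

import { Decoder, Reader, tools } from "ts-ebml";

async function makeSeekable(blob) {
  const buffer = await new Response(blob).arrayBuffer();
  const decoder = new Decoder();
  const reader = new Reader();
  reader.logging = false;
  decoder.decode(buffer).forEach(elm => reader.read(elm));
  reader.stop();
  // Rebuild the metadata so that it contains SeekHead, Info with Duration, and Cues
  const refinedMetadata = tools.makeMetadataSeekable(
    reader.metadatas, reader.duration, reader.cues
  );
  const body = buffer.slice(reader.metadataSize);
  return new Blob([refinedMetadata, body], { type: blob.type });
}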

The functionality of the feature request should "work" for both streams and static files, or combinations of the two, where the resulting webm file is in sequence, irrespective of when the discrete stream or file is decoded -> read -> encoded, similar to the functionality of Promise.all(), where the Promise at index N could be fulfilled or "settled" before the Promise at index 0.

The feature request is somewhat challenging to explain, as there is more than one use case and more than one way in which the API could be used. Essentially, it asks for a recorder that records streams and/or static files (e.g., acquired using fetch()) in parallel, without becoming inactive, while writing the data, then (and/or "on the fly") "equalizing" the data, resulting in a single output: a single webm file.

Perhaps the feature request/proposal could be divided into several proposals

  1. VideoContext() => decodeVideoData(ArrayBuffer|Blob|Blob URL|data URL|MediaStream) => VideoBuffer (capable of concatenation) => VideoContext.createBufferSource() => .buffer = VideoBuffer => VideoBuffer.start(N, N+time), forward or reverse, with variable playbackRate (at some point over the past year or two, code was written which used the Web Animation API, canvas, and Web Audio to play "media" (images, AudioBuffers) forwards and in reverse at variable playback rates, though synchronization was challenging; the original code and tests are lost, though the concept and steps taken are recollected), all without using HTMLMediaElement (for) playback
  2. OfflineVideoContext(ArrayBuffer|Blob|Blob URL|data URL|MediaStream) with startRendering() (capable of concatenation) for video, or that functionality incorporated into OffscreenCanvas, which generates a VideoBuffer "as fast as it can" without using HTMLMediaElement (for) playback; e.g., new WritableStream(/* new OfflineMediaContext([fetch("media.webm"), fetch("media.mp4"), new MediaStream(stream), VideoBuffer, AudioBuffer, new TextTrack(), fetch("media.wav")]) */).pipeTo(new ReadableStream(/* new OfflineMediaRecorder() */)) => webm or .mkv; essentially similar to MediaSource functionality, with the SourceBuffer exposed
  3. OfflineMediaRecorder() => combining the functionality of the above (exposed in a Worker)
  4. Extending MediaRecorder(), MediaStream(), OffscreenCanvas(), etc. => to perform the functionality above; e.g., since MediaRecorder already writes the webm file, expose the entirety of that functionality to the developer; the developer could then turn the recorder's writing of a given media track on or off as that track becomes active or inactive, write data to the container without the input necessarily being exclusively a live or active MediaStream, and, when they decide, output a single mkv or webm file

though since all of those features concern the same subject matter, a single API could be created which incorporates all of that functionality.

Since there's so little consensus on supporting multiple tracks (other than if they're there at the start), I think this kind of fairly specific use-case fix will have an even harder time to fly.

Yes, gather that there is no "consensus" (which does not prevent "branches" or a "pull request") as to supporting multiple tracks (https://searchfox.org/mozilla-central/source/dom/media/MediaRecorder.cpp#765; https://searchfox.org/mozilla-central/source/dom/media/MediaRecorder.cpp#794; https://bugzilla.mozilla.org/show_bug.cgi?id=1276928). The presentation "AT THE FRONTEND 2016 - Real time front-end alchemy, or: capturing, playing, altering and encoding video and audio streams, without servers or plugins!" by Soledad Penadés makes several important points

... so after a whole year of filing many bugs and crushing [sic] crashing the browser many many times, so like, all that, we are finally ready to talk about this kind of alchemy, the whole title is very long ... so streams ... it's not like these streams you can't join them ... you can actually join them ...

... you might be like, "how many streams can you have?" ... MediaStreams are track containers and probably that doesn't tell you anything, but Tracks can be audio or video, so if you imagine [sic] a DVD, a DVD can have several tracks of audio, and in theory, several tracks of video, you can change between the scenes, which never happens because producers just gave up on the idea [sic] ... But that's the whole idea of DVD's, and like, in general, streams of ... media that you can have various tracks of things. So, you can have any number of tracks, so you can have any number of tracks streams. That was the premise that we got sold DVD's based on.

so we need people to have weird new ideas ... we need more ideas to break it and make it better

Use it Break it File bugs Request features

What this proposal is attempting to posit is that improvements can be made as to concatenating media streams and static files having differing codecs. If that means a new Web API proposal for a webm writer that can decode => read => encode any input stream or static file, that is what this feature request proposes.

guest271314 commented 5 years ago

@Pehrsons Was unaware that some of these concepts have already been raised (in some instances even the same names of prospective objects) https://github.com/w3c/mediacapture-worker/issues/33.

After looking into the matter to a cursory degree: perhaps a VideoWorklet, and/or a MediaStreamWorklet/MediaStreamRecorderWorklet, which abstracts/extends HTMLVideoElement (removing unneeded object inheritances; essentially a standalone JavaScript video instance that does not project to a DOM element) and which fetches, reads ("as fast as it can", see OfflineAudioContext.startRendering()), decodes, encodes, and streams (Blob, ArrayBuffer, MediaStreamTrack, etc., et al.) to the main thread, would allow multiple media sources to be concatenated and streamed or "piped" from a Worker or Worklet thread?

guest271314 commented 5 years ago

@Pehrsons Before continuing down the rabbit hole with this proposal/feature request, will post links to the resources researched so far, so that these resources are not lost due to user error (that has happened before ("Problem Exists Between Chair And Keyboard :p"): "lost code" that used the Web Animation API to create a "video" and Native Messaging to bypass the Web Speech API and communicate directly with espeak-ng, at least until the code can be "retrieved" from the currently non-functional device), and so that future specification writers/implementers (both for browsers and in the wild) might find them useful.

Some code which attempts to emulate the MDN description ("as fast as it can") of the startRendering() method of OfflineAudioContext(): essentially "parallel" asynchronous procedures passed to Promise.all(). (The data URL representation of each image is used for expediency, without regard for "compression". When trying 1-second slices the result has unexpected consequences; there is some MediaRecorder source code in either Chromium or Firefox which referenced 2 seconds?) TODO: try farming this procedure out to an AudioWorklet and/or TaskWorklet (though since "WebWorkers can be expensive (e.g: ~5MB per thread in Chrome)", is the tasklets abstraction really accomplishing anything? Eventually crashed the tab at plnkr when trying taskWorklet multiple times in the same session, while trying to get a reference to a <video> within TaskWorkerGlobalScope).

<!DOCTYPE html>
<html>

<head>
</head>

<body>
  <div>click</div>
  <script>
    (async() => {
      const url = "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4";
      const blob = await (await fetch(url)).blob();
      const blobURL = URL.createObjectURL(blob);
      const meta = document.createElement("video");
      const canvas = document.createElement("canvas");
      document.body.appendChild(canvas);
      const ctx = canvas.getContext("2d");
      let duration = await new Promise(resolve => {
        meta.addEventListener("loadedmetadata", e => {
          canvas.width = meta.videoWidth;
          canvas.height = meta.videoHeight;
          // TODO: address media with no duration: media recorded using `MediaRecorder`
          // `ts-ebml` handles this case, though what if we do not want to use any libraries?
          resolve(meta.duration);
        });
        meta.src = blobURL;
      });

      console.log(duration);
      document.querySelector("div")
        .addEventListener("click", async e => {
          let z = 0;
          const chunks = [...Array(Math.floor(duration / 2) + 1).keys()].map(n => ({from:z, to:(z+=2) > duration ? duration : z}));
          console.log(chunks);

          const data = await Promise.all(chunks.map(({from, to}) => new Promise(resolve => {
            const video = document.createElement("video");
            const canvas = document.createElement("canvas");
            const ctx = canvas.getContext("2d");
            const images = [];
            let raf, n = 0;
            const draw = _ => {
              console.log(`drawing image ${n++}`);
              if (video.paused) {
                cancelAnimationFrame(raf);
                return;
              }
              ctx.drawImage(video, 0, 0, video.videoWidth, video.videoHeight);
              images.push(canvas.toDataURL());
              raf = requestAnimationFrame(draw);
            }
            const recorder = new MediaRecorder(video.captureStream());
            recorder.addEventListener("dataavailable", e => {
              cancelAnimationFrame(raf);
              resolve({images, blob:e.data});
            });
            video.addEventListener("playing", e => {
              if (recorder.state !== "recording") {
                recorder.start();
              }
              raf = requestAnimationFrame(draw);
            }, {once:true});
            video.addEventListener("canplay", e => {
              canvas.width = video.videoWidth;
              canvas.height = video.videoHeight;
              video.play().catch(console.error);
            }, {once:true});

            video.addEventListener("pause", e => {
              recorder.stop();
              cancelAnimationFrame(raf)
            });
            const src = `${blobURL}#t=${from},${to}`;
            console.log(src);
            video.src = src;

          })));
          console.log(data);
          /*
          data.forEach(({blob, images}) => {
            console.log(images);
            const video = document.createElement("video");
            video.controls = true;
            document.body.appendChild(video);
            video.src = URL.createObjectURL(blob);
          });
          */
          // TODO: draw the images to a <canvas>, though see https://codereview.chromium.org/2769823002/
        })
    })();
  </script>
</body>

</html>

Upon running the above code, it occurred that yet another (very simple) media "container" type could be created, again using only the browser: create N slices of separate audio and image "files" using MediaRecorder, e.g., something like

{
  audio: { data: /* audio as an array, capable of serialization */, from: 0, to: 2 },
  video: { data: /* video as an array of images (uncompressed, we'll address that "later") */, from: 0, to: 2 },
  title: /* title, as array or string */
}

where a "server" and/or MediaRecorder could select any "segments" of media, concatenate and encode as a .ext file and serve that "file"; or, simply serve the requested "segments" in the JSON form. The issue would then be how to stream the data using ReadableStream/WritableStream, though that can be overcome, to an appreciable degree by only serving the smallest "chunk" possible (2 seconds?). That is: use the browser itself to encode the files.