w3c / mediacapture-record

MediaStream Recording
https://w3c.github.io/mediacapture-record/

Feature request: Concat filter #166

Open guest271314 opened 5 years ago

guest271314 commented 5 years ago

Related: https://github.com/w3c/mediacapture-record/issues/147; https://github.com/w3c/mediacapture-fromelement

Feature request: Include an option to MediaRecorder to concatenate (e.g., http://trac.ffmpeg.org/wiki/Concatenate) all input streams to a single webm file.

For example

Promise.all([Promise0, Promise1, Promise2])

when all PromiseN are fulfilled, Promise.all() is fulfilled, even if Promise2 resolves before Promise0.

Such code can be implemented in MediaRecorder as

let recorder = new MediaRecorder(
  new MediaStream([video0.captureStream(), video1.captureStream(), video2.captureStream()]),
  {
    concat: true,
    width: <"scaleToMinimumDetected|scaleToMaximumDetected|default:maximum">,
    height: <"scaleToMinimumDetected|scaleToMaximumDetected|default:maximum">
  }
);

recorder.ondataavailable = e => {
  // e.data: a single `.webm` file
};
Pehrsons commented 5 years ago

You could remux this in JS with fairly low overhead, no? Would that achieve the same thing?

guest271314 commented 5 years ago

@Pehrsons Do you mean re-record the recorded media fragments? There are several approaches that have been considered and tried.

The concept is briefly outlined at https://github.com/w3c/mediacapture-main/issues/575.

https://github.com/guest271314/ts-ebml helps with "remux"ing recorded webm files. https://github.com/guest271314/whammy and https://github.com/guest271314/webm-writer-js provide a means to write images to a webm file created "on the fly", though without support for adding/including audio.

If there were a means to write the Matroska or WebM file directly using a human-readable format (https://github.com/Matroska-Org/matroska-specification/issues/129, https://github.com/Matroska-Org/ebml-specification/issues/156), the image and audio chunks could be written "on the fly" to a single file.
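Purely as an illustration of what such a human-readable structure might look like (the element names Segment, Info, Tracks, TrackEntry, Cluster, SimpleBlock and the codec IDs come from the Matroska/WebM specifications; the shape of this JS object is invented here, not part of any proposal or library), a writer could then serialize a structure like this to binary EBML:

const segmentSketch = {
  Segment: {
    Info: { TimecodeScale: 1000000, MuxingApp: "sketch", WritingApp: "sketch" },
    Tracks: [
      { TrackEntry: { TrackNumber: 1, TrackType: 1 /* video */, CodecID: "V_VP8" } },
      { TrackEntry: { TrackNumber: 2, TrackType: 2 /* audio */, CodecID: "A_OPUS" } }
    ],
    Clusters: [
      // image and audio chunks appended "on the fly" as SimpleBlocks
      { Timecode: 0, SimpleBlocks: [] }
    ]
  }
};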

What this feature request basically asks for is a flag or option passed to MediaRecorder so that the recorder does NOT change state to inactive when a single <video> or <audio> element being captured changes src. Beyond that basic request, it asks for the ability of MediaRecorder to record multiple tracks in sequence, even if a track at, e.g., index 1 of the array passed to MediaStream ends before the track at index 0, and then write the data to the resulting webm file, instead of using several different APIs (AudioContext, canvas) to try to achieve the expected result of concatenating several audio and/or video fragments (or full videos) while simultaneously trying to synchronize audio and video.

The functionality is essentially possible using canvas and AudioContext (a rough sketch follows), though it could be far simpler if MediaRecorder were extended to handle multiple video and audio tracks.
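As a rough, hypothetical sketch of that canvas + AudioContext workaround (not the MediaFragmentRecorder code referenced later): a single MediaRecorder stays active while one <video> element plays several sources in sequence, with its frames drawn to a captured canvas and its audio routed through a MediaStreamAudioDestinationNode. Here "urls" is a placeholder for an array of media URLs, and the function should be called from a user gesture so playback and the AudioContext are allowed to start.

async function recordInSequence(urls) {
  const video = document.createElement("video");
  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");
  const ac = new AudioContext();
  const dest = ac.createMediaStreamDestination();
  // Audio path: <video> -> MediaElementAudioSourceNode -> MediaStreamAudioDestinationNode
  ac.createMediaElementSource(video).connect(dest);
  // Video path: draw frames onto the canvas and capture the canvas as a stream
  const stream = new MediaStream([
    ...canvas.captureStream().getVideoTracks(),
    ...dest.stream.getAudioTracks()
  ]);
  const recorder = new MediaRecorder(stream);
  const result = new Promise(r => (recorder.ondataavailable = e => r(e.data)));
  recorder.start();
  let raf;
  const draw = () => {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    raf = requestAnimationFrame(draw);
  };
  for (const url of urls) {
    await new Promise(resolve => {
      video.addEventListener("canplay", () => {
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        video.play().catch(console.error);
        draw();
      }, { once: true });
      video.addEventListener("ended", resolve, { once: true });
      video.src = url;
    });
    cancelAnimationFrame(raf);
  }
  recorder.stop();
  // A single webm Blob containing all of the sources in sequence
  return result;
}

Keeping the drawn frames synchronized with the audio, and handling sources with different dimensions, is exactly the part that stays fiddly without changes to MediaRecorder.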

In the absence of changing or extending the MediaRecorder specification, the feature request is for a webm writer which provides a means to concatenate video data, similar to what is already possible for audio using .decodeAudioData() and OfflineAudioContext() and/or concatenating AudioBuffers (see https://github.com/w3c/mediacapture-main/issues/575), using a human-readable structure, e.g., XML that is then written to EBML (https://github.com/vi/mkvparse). The API would be Promise-based (the media at index N might be read before the media at index 0), though the mechanism "awaits" completion of all input media before writing the output: a single webm file.

guest271314 commented 5 years ago

@Pehrsons Ideally the reading and writing could be performed without playback of the media; a precursor, related concept is https://github.com/guest271314/OfflineMediaContext. OffscreenCanvas does not necessarily provide the functionality of OfflineAudioContext(), which

doesn't render the audio to the device hardware; instead, it generates it, as fast as it can

that is, throughout the process of recording video/images, a <video> element essentially MUST be used and the video MUST be played back to get the video data. There is no .decodeVideoData() to get the underlying images WITHOUT playing back the video.

Pehrsons commented 5 years ago

Based on your code new MediaRecorder(new MediaStream([video0.captureStream(), video1.captureStream(), video2.captureStream()])), it seems to me that you want to encode the three streams separately and concat them into one webm file.

Now that can be done by recording them separately, and remuxing in js, no? This assumes of course, that all the recordings have the same number and type of tracks, and codecs.

Like so (but in a loop or something nicer looking)

let r0 = new MediaRecorder(video0.captureStream());
let r1 = new MediaRecorder(video1.captureStream());
let r2 = new MediaRecorder(video2.captureStream());

let p0 = new Promise(r => r0.ondataavailable = e => r(e.data));
let p1 = new Promise(r => r1.ondataavailable = e => r(e.data));
let p2 = new Promise(r => r2.ondataavailable = e => r(e.data));

// Start each recorder, and stop it when its source ends so that
// ondataavailable fires with the complete recording.
r0.start(); r1.start(); r2.start();
video0.onended = () => r0.stop();
video1.onended = () => r1.stop();
video2.onended = () => r2.stop();

let b0 = await p0;
let b1 = await p1;
let b2 = await p2;

// Something home-written that basically uses ts-ebml to set up a pipe of
// EBMLDecoder -> EBMLReader -> EBMLEncoder, and processes the blobs in order.
js_webm_remux_concat(b0, b1, b2);

Looking at your proposal as a way to support multiple tracks, I'm not sure it's the right fix. For one, it doesn't handle tracks that start or end in parallel to other tracks.

Since there's so little consensus on supporting multiple tracks (other than if they're there at the start), I think this kind of fairly specific use-case fix will have an even harder time to fly.

guest271314 commented 5 years ago

@Pehrsons

it seems to me that you want to encode the three streams separately and concat them into one webm file.

Yes. That is the concept. The reason that MediaRecorder is used at all is due to the lack of an API for video similar to AudioContext.decodeAudioData(), which returns an AudioBuffer that can be concatenated with other AudioBuffers. Using OfflineAudioContext(), the audio media does not need to be played back (audibly) to get and concatenate the AudioBuffers.
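For the audio-only half of that, a minimal sketch (assuming all inputs share a sample rate; "urls" is a placeholder for an array of audio file URLs) which decodes each file and renders the AudioBuffers back to back in an OfflineAudioContext, without audible playback:

async function concatAudio(urls) {
  const ac = new AudioContext();
  const buffers = await Promise.all(urls.map(async url =>
    ac.decodeAudioData(await (await fetch(url)).arrayBuffer())
  ));
  const length = buffers.reduce((sum, b) => sum + b.length, 0);
  const channels = Math.max(...buffers.map(b => b.numberOfChannels));
  const oac = new OfflineAudioContext(channels, length, buffers[0].sampleRate);
  let offset = 0;
  for (const buffer of buffers) {
    const source = oac.createBufferSource();
    source.buffer = buffer;
    source.connect(oac.destination);
    // Schedule each buffer to start where the previous one ends
    source.start(offset / oac.sampleRate);
    offset += buffer.length;
  }
  // Rendered "as fast as it can"; resolves with one concatenated AudioBuffer
  return oac.startRendering();
}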

Now that can be done by recording them separately, and remuxing in js, no? This assumes of course, that all the recordings have the same number and type of tracks, and codecs.

That is another reason for using MediaRecorder: to create uniform .webm files. Notice that the media files in the urls variable at https://github.com/guest271314/MediaFragmentRecorder/blob/master/MediaFragmentRecorder.html have different extensions, which is intentional.

The concept itself (concatenating media fragments) was inspired by A Shared Culture and Jesse Dylan. The use case: create such a video/audio collage from disparate media using only APIs shipped with modern, ostensibly FOSS, browsers.

Ideally, media playback should not be necessary at all, if there were a decodeVideoData() function which performed similarly to .decodeAudioData(), and an OfflineVideoContext() with startRendering() functionality similar to that of OfflineAudioContext() (potentially incorporating OffscreenCanvas()), which (currently) doesn't

render the [video] to the device hardware; instead, it generates it, as fast as it can

Relevant to

Looking at your proposal as a way to support multiple tracks, I'm not sure it's the right fix. For one, it doesn't handle tracks that start or end in parallel to other tracks.

the concept is to create the necessary file structure - in parallel - then, if necessary (Chromium does not include cues in recorded webm files; Firefox does), re-"scan" the file structure to insert the timestamps (cues); consider an array of integers or decimals that is "equalized".
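For the cue-insertion step, a sketch using ts-ebml, roughly following the pattern in that library's README (the exact API may differ between versions): decode the recorded webm, replay its elements through a Reader, rebuild a seekable metadata section (SeekHead/Cues), then splice it back onto the body.

import { Decoder, Reader, tools } from "ts-ebml";

async function makeSeekable(blob) {
  const buffer = await new Response(blob).arrayBuffer();
  const decoder = new Decoder();
  const reader = new Reader();
  reader.logging = false;
  decoder.decode(buffer).forEach(elm => reader.read(elm));
  reader.stop();
  // Rebuild the metadata so that it contains SeekHead, Info with Duration, and Cues
  const refinedMetadata = tools.makeMetadataSeekable(
    reader.metadatas, reader.duration, reader.cues
  );
  const body = buffer.slice(reader.metadataSize);
  return new Blob([refinedMetadata, body], { type: blob.type });
}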

The functionality of the feature request should "work" for both streams and static files, or combinations of the two, where the resulting webm file is in sequence, irrespective of when the discrete stream or file is decoded -> read -> encoded, similar to the functionality of Promise.all(), where the Promise at index N could be fulfilled or "settled" before the Promise at index 0.

The feature request is somewhat challenging to explain, as there is more than one use case and more than one way in which the API could be used. Essentially, it asks for a recorder that records streams and/or static files (e.g., acquired using fetch()) in parallel, without becoming inactive, while writing the data, then (and/or "on the fly") "equalizing" the data, resulting in a single output: a single webm file.

Perhaps the feature request/proposal could be divided into several proposals

  1. VideoContext() => decodeVideoData(ArrayBuffer|Blob|Blob URL|data URL|MediaStream) => VideoBuffer (capable of concatenation) => VideoContext.createBufferSource() => .buffer = VideoBuffer => VideoBuffer.start(N, N+time), forward or reverse, with variable playbackRate (at some point over the past year or two, code was written which used the Web Animation API, canvas, and Web Audio to play "media" (images, AudioBuffers) forwards and in reverse at variable playback rates, though synchronization was challenging; the original code and tests are lost, though the concept and steps taken are recollected), all without using HTMLMediaElement (for) playback
  2. OfflineVideoContext(ArrayBuffer|Blob|Blob URL|data URL|MediaStream) with startRendering() (capable of concatenation) for video, or that functionality incorporated into OffscreenCanvas, which generates a VideoBuffer "as fast as it can" without using HTMLMediaElement (for) playback; e.g., new WritableStream(/* new OfflineMediaContext([fetch("media.webm"), fetch("media.mp4"), new MediaStream(stream), VideoBuffer, AudioBuffer, new TextTrack(), fetch("media.wav")]) */).pipeTo(new ReadableStream(/* new OfflineMediaRecorder() */)) => webm or .mkv; essentially similar to MediaSource functionality, with the SourceBuffer exposed
  3. OfflineMediaRecorder() => combining the functionality of the above (exposed in a Worker)
  4. Extending MediaRecorder(), MediaStream(), OffscreenCanvas(), etc. => to perform the functionality above; e.g., since MediaRecorder already writes the webm file, expose the entirety of that functionality to the developer; the developer could then turn the recorder's writing of a given media track on or off as that track becomes active or inactive, write data to the container without the input necessarily being exclusively a live or active MediaStream, and, when they decide, output a single mkv or webm file

though since all of those features concern the same subject matter, a single API could be created which incorporates all of that functionality.

Since there's so little consensus on supporting multiple tracks (other than if they're there at the start), I think this kind of fairly specific use-case fix will have an even harder time to fly.

Yes, gather that there is no "consensus" (which does not prevent "branches" or a "pull request") as to supporting multiple tracks (https://searchfox.org/mozilla-central/source/dom/media/MediaRecorder.cpp#765; https://searchfox.org/mozilla-central/source/dom/media/MediaRecorder.cpp#794; https://bugzilla.mozilla.org/show_bug.cgi?id=1276928). The presentation "AT THE FRONTEND 2016 - Real time front-end alchemy, or: capturing, playing, altering and encoding video and audio streams, without servers or plugins!" by Soledad Penadés makes several important points

... so after a whole year of filing many bugs and crushing [sic] crashing the browser many many times, so like, all that, we are finally ready to talk about this kind of alchemy, the whole title is very long ... so streams ... it's not like these streams you can't join them ... you can actually join them ...

... you might be like, "how many streams can you have?" ... MediaStreams are track containers and probably that doesn't tell you anything, but Tracks can be audio or video, so if you imagine [sic] a DVD, a DVD can have several tracks of audio, and in theory, several tracks of video, you can change between the scenes, which never happens because producers just gave up on the idea [sic] ... But that's the whole idea of DVD's, and like, in general, streams of ... media that you can have various tracks of things. So, you can have any number of tracks, so you can have any number of tracks streams. That was the premise that we got sold DVD's based on.

so we need people to have weird new ideas ... we need more ideas to break it and make it better

Use it Break it File bugs Request features

What this proposal is attempting to posit is that improvements can be made as to concatenating media streams and static files having differing codecs. If that means a new Web API proposal for a webm writer that can decode => read => encode any input stream or static file, that is what this feature request proposes.

guest271314 commented 5 years ago

@Pehrsons Was unaware that some of these concepts have already been raised (in some instances even the same names of prospective objects) https://github.com/w3c/mediacapture-worker/issues/33.

After looking into the matter to a cursory degree: perhaps a VideoWorklet, and/or a MediaStreamWorklet/MediaStreamRecorderWorklet, which abstracts/extends HTMLVideoElement (removing unneeded object inheritances; essentially a standalone JavaScript video instance that does not project to a DOM element) and which fetches, reads ("as fast as it can", see OfflineAudioContext.startRendering()), decodes, encodes, and streams (Blob, ArrayBuffer, MediaStreamTrack, etc., et al.) to the main thread, would allow multiple media sources to be concatenated and streamed or "piped" from a Worker or Worklet thread?

guest271314 commented 5 years ago

@Pehrsons Before continuing down the rabbit hole with this proposal/feature request, will post links to the resources researched so far, so that these resources are not lost due to user error (that has happened before ("Problem Exists Between Chair And Keyboard :p"): "lost code" that used the Web Animation API to create a "video" and Native Messaging to bypass the Web Speech API and communicate directly with espeak-ng, at least until the code can be "retrieved" from the currently non-functional device), and so that future specification writers/implementers (both for browsers and in the wild) might find them useful.

Some code which attempts to emulate the MDN description ("as fast as it can") of the startRendering() method of OfflineAudioContext(): essentially "parallel" asynchronous procedures passed to Promise.all(). (The data URL representation of each image is used for expediency, without regard for "compression". When trying 1-second slices the result has unexpected consequences; there is some MediaRecorder source code in either Chromium or Firefox which referenced 2 seconds?) TODO: try farming this procedure out to an AudioWorklet and/or TaskWorklet (though since "WebWorkers can be expensive (e.g: ~5MB per thread in Chrome)", is the tasklets abstraction really accomplishing anything? Eventually crashed the tab at plnkr when trying taskWorklet multiple times in the same session, while trying to get a reference to a <video> within TaskWorkerGlobalScope).

<!DOCTYPE html>
<html>

<head>
</head>

<body>
  <div>click</div>
  <script>
    (async() => {
      const url = "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4";
      const blob = await (await fetch(url)).blob();
      const blobURL = URL.createObjectURL(blob);
      const meta = document.createElement("video");
      const canvas = document.createElement("canvas");
      document.body.appendChild(canvas);
      const ctx = canvas.getContext("2d");
      let duration = await new Promise(resolve => {
        meta.addEventListener("loadedmetadata", e => {
          canvas.width = meta.videoWidth;
          canvas.height = meta.videoHeight;
          // TODO: address media with no duration: media recorded using `MediaRecorder`
          // `ts-ebml` handles this case, though what if we do not want to use any libraries?
          resolve(meta.duration);
        });
        meta.src = blobURL;
      });

      console.log(duration);
      document.querySelector("div")
        .addEventListener("click", async e => {
          let z = 0;
          const chunks = [...Array(Math.floor(duration / 2) + 1).keys()].map(n => ({from:z, to:(z+=2) > duration ? duration : z}));
          console.log(chunks);

          const data = await Promise.all(chunks.map(({from, to}) => new Promise(resolve => {
            const video = document.createElement("video");
            const canvas = document.createElement("canvas");
            const ctx = canvas.getContext("2d");
            const images = [];
            let raf, n = 0;
            const draw = _ => {
              console.log(`drawing image ${n++}`);
              if (video.paused) {
                cancelAnimationFrame(raf);
                return;
              }
              ctx.drawImage(video, 0, 0, video.videoWidth, video.videoHeight);
              images.push(canvas.toDataURL());
              raf = requestAnimationFrame(draw);
            }
            const recorder = new MediaRecorder(video.captureStream());
            recorder.addEventListener("dataavailable", e => {
              cancelAnimationFrame(raf);
              resolve({images, blob:e.data});
            });
            video.addEventListener("playing", e => {
              if (recorder.state !== "recording") {
                recorder.start();
              }
              raf = requestAnimationFrame(draw);
            }, {once:true});
            video.addEventListener("canplay", e => {
              canvas.width = video.videoWidth;
              canvas.height = video.videoHeight;
              video.play().catch(console.error);
            }, {once:true});

            video.addEventListener("pause", e => {
              recorder.stop();
              cancelAnimationFrame(raf)
            });
            const src = `${blobURL}#t=${from},${to}`;
            console.log(src);
            video.src = src;

          })));
          console.log(data);
          /*
          data.forEach(({blob, images}) => {
            console.log(images);
            const video = document.createElement("video");
            video.controls = true;
            document.body.appendChild(video);
            video.src = URL.createObjectURL(blob);
          });
          */
          // TODO: draw the images to a <canvas>, though see https://codereview.chromium.org/2769823002/
        })
    })();
  </script>
</body>

</html>

Upon running the above code, it occurred that yet another (very simple) media "container" type could be created, again using only the browser: create N slices of separate audio and image "files" using MediaRecorder, e.g., something like

{
  audio: { data: /* audio as an array, capable of serialization */, from: 0, to: 2 },
  video: { data: /* video as an array of images (uncompressed, we'll address that "later") */, from: 0, to: 2 },
  title: /* title, as array or string */
}

where a "server" and/or MediaRecorder could select any "segments" of media, concatenate and encode as a .ext file and serve that "file"; or, simply serve the requested "segments" in the JSON form. The issue would then be how to stream the data using ReadableStream/WritableStream, though that can be overcome, to an appreciable degree by only serving the smallest "chunk" possible (2 seconds?). That is: use the browser itself to encode the files.