guest271314 opened 5 years ago
You could remux this in JS with fairly low overhead, no? Would that achieve the same thing?
@Pehrsons Do you mean re-record the recorded media fragments? There are several approaches that have been considered and tried.
The concept is briefly outlined at https://github.com/w3c/mediacapture-main/issues/575.
https://github.com/guest271314/ts-ebml provides assistance with remuxing recorded webm files. https://github.com/guest271314/whammy and https://github.com/guest271314/webm-writer-js provide a means to write images to a webm file created "on the fly", though without support for adding/including audio.
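For reference, a sketch of the commonly used ts-ebml recipe for re-writing the metadata of a `MediaRecorder` blob so that it becomes seekable (one form of remuxing); the API names follow the example in the ts-ebml README, and `recordedBlob` is assumed to be a webm blob produced by `MediaRecorder`:

```js
import { Decoder, Reader, tools } from "ts-ebml";

// Assumption: recordedBlob is a webm Blob produced by MediaRecorder.
async function makeSeekable(recordedBlob) {
  const decoder = new Decoder();
  const reader = new Reader();

  // Decode the EBML elements and feed them to the reader to collect
  // metadata, duration and cues.
  const elements = decoder.decode(await recordedBlob.arrayBuffer());
  elements.forEach(element => reader.read(element));
  reader.stop();

  // Rebuild the metadata with SeekHead/Cues, then splice it onto the body.
  const seekableMetadata = tools.makeMetadataSeekable(
    reader.metadatas, reader.duration, reader.cues);
  const body = recordedBlob.slice(reader.metadataSize);
  return new Blob([seekableMetadata, body], { type: recordedBlob.type });
}
```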
If there were a means to write the Matroska or WebM file directly using a human-readable format (https://github.com/Matroska-Org/matroska-specification/issues/129, https://github.com/Matroska-Org/ebml-specification/issues/156), the image and audio chunks could be written "on the fly" to a single file.
What this feature request basically asks for is some form of flag, or property in the options passed to `MediaRecorder`, to NOT change state to `inactive` when a single `<video>` or `<audio>` element being captured changes `src`. Beyond the basic feature request, it asks for the ability of `MediaRecorder` to record multiple tracks in sequence, even if a track that is, e.g., at index 1 of an array passed to `MediaStream` ends before the track at index 0, and then write the data to the resulting webm file, instead of using several different APIs (`AudioContext`, `canvas`) to try to achieve the expected result of concatenating several audio and/or video fragments (or full videos) while simultaneously trying to synchronize audio and video.
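Purely as an illustration of that request, a hypothetical usage sketch; the option name `keepActiveOnSrcChange` is invented here and is not part of any specification or implementation, and `video` is assumed to be an existing `<video>` element:

```js
// Hypothetical: keepActiveOnSrcChange is NOT a real MediaRecorder option;
// it only illustrates the flag being requested above.
const stream = video.captureStream();
const recorder = new MediaRecorder(stream, {
  mimeType: "video/webm;codecs=vp8,opus",
  keepActiveOnSrcChange: true // requested behavior: do not transition to "inactive"
});
recorder.start();
// video.src can now be changed several times; with the requested flag the
// recorder would keep recording the new tracks instead of becoming inactive.
```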
The functionality is essentially possible using `canvas` and `AudioContext`, though it could be far simpler if `MediaRecorder` were extended to handle multiple video and audio tracks. In the absence of changing or extending the `MediaRecorder` specification, the feature request is for a webm writer which provides a means to concatenate video data, similar to what is possible using `.decodeAudioData()` and `OfflineAudioContext()` and/or concatenating `AudioBuffer`s (see https://github.com/w3c/mediacapture-main/issues/575), using a human-readable structure, e.g., XML, that is then written to EBML (https://github.com/vi/mkvparse), where the API is `Promise`-based (the media at index N might be read before the media at index 0) though the mechanism "awaits" completion of all input media before writing the output: a single webm file.
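For comparison, a minimal sketch of the audio-only version of that pattern which is already possible today, concatenating decoded `AudioBuffer`s with `OfflineAudioContext()`; the URLs are placeholders and all inputs are assumed to share the channel count and sample rate of the first file:

```js
// Minimal sketch: decode N audio files and render them back-to-back into a
// single AudioBuffer, without audible playback.
async function concatAudio(urls) {
  const decodeCtx = new AudioContext();
  const buffers = await Promise.all(urls.map(async url =>
    decodeCtx.decodeAudioData(await (await fetch(url)).arrayBuffer())));

  const totalLength = buffers.reduce((sum, b) => sum + b.length, 0);
  const offline = new OfflineAudioContext(
    buffers[0].numberOfChannels, totalLength, buffers[0].sampleRate);

  let offset = 0; // in sample frames
  for (const buffer of buffers) {
    const source = offline.createBufferSource();
    source.buffer = buffer;
    source.connect(offline.destination);
    source.start(offset / offline.sampleRate); // schedule at the running offset
    offset += buffer.length;
  }
  // "as fast as it can": resolves with one concatenated AudioBuffer
  return offline.startRendering();
}
```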
@Pehrsons Ideally the reading and writing could be performed without playback of the media; a precursor, related concept: https://github.com/guest271314/OfflineMediaContext. `OffscreenCanvas` does not necessarily provide the functionality of `OfflineAudioContext()`, which

> doesn't render the audio to the device hardware; instead, it generates it, as fast as it can

That is, throughout the process of recording video/images, a `<video>` element essentially MUST be used and the video MUST be played back to get the video data. There is no `.decodeVideoData()` to get the underlying images WITHOUT playing back the video.
Based on your code `new MediaRecorder(new MediaStream([video0.captureStream(), video1.captureStream(), video2.captureStream()]))`, it seems to me that you want to encode the three streams separately and concat them into one webm file.

Now that can be done by recording them separately, and remuxing in JS, no? This assumes, of course, that all the recordings have the same number and type of tracks, and codecs.
Like so (but in a loop or something nicer looking):

```js
let r0 = new MediaRecorder(video0.captureStream());
let r1 = new MediaRecorder(video1.captureStream());
let r2 = new MediaRecorder(video2.captureStream());

let p0 = new Promise(r => r0.ondataavailable = e => r(e.data));
let p1 = new Promise(r => r1.ondataavailable = e => r(e.data));
let p2 = new Promise(r => r2.ondataavailable = e => r(e.data));

// Each recorder still has to be started, and stopped when its video ends,
// for dataavailable to fire with the full blob.
[r0, r1, r2].forEach(r => r.start());
[video0, video1, video2].forEach((v, i) => v.onended = () => [r0, r1, r2][i].stop());

let b0 = await p0;
let b1 = await p1;
let b2 = await p2;

// Something home-written that basically uses ts-ebml to set up a pipe of
// EBMLDecoder -> EBMLReader -> EBMLEncoder, and processes the blobs in order.
js_webm_remux_concat(b0, b1, b2);
```
Looking at your proposal as a way to support multiple tracks, I'm not sure it's the right fix. For one, it doesn't handle tracks that start or end in parallel to other tracks.
Since there's so little consensus on supporting multiple tracks (other than if they're there at the start), I think this kind of fairly specific use-case fix will have an even harder time to fly.
@Pehrsons
> it seems to me that you want to encode the three streams separately and concat them into one webm file.
Yes. That is the concept. The reason that `MediaRecorder` is used at all is due to the lack of an API for video similar to `AudioContext.decodeAudioData()`, which returns an `AudioBuffer` that can be concatenated with other `AudioBuffer`s. Using `OfflineAudioContext()` the audio media does not need to be played back (audibly) to get and concatenate the `AudioBuffer`s.
> Now that can be done by recording them separately, and remuxing in JS, no? This assumes, of course, that all the recordings have the same number and type of tracks, and codecs.
That is another reason for using `MediaRecorder`: to create uniform `.webm` files. Notice that the media files in the `urls` variable at https://github.com/guest271314/MediaFragmentRecorder/blob/master/MediaFragmentRecorder.html have different extensions, which is intentional.
The concept itself (concatenating media fragments) was inspired by A Shared Culture and Jesse Dylan. The use case: create such a video/audio collage from disparate media using only APIs shipped with modern, ostensibly FOSS, browsers.
Ideally, media playback should not be necessary at all, if there were a `decodeVideoData()` function which performed similarly to `.decodeAudioData()`, and an `OfflineVideoContext()` similar to the `startRendering()` functionality of `OfflineAudioContext()` (potentially incorporating `OffscreenCanvas()`), which (currently)

> doesn't render the [video] to the device hardware; instead, it generates it, as fast as it can
Relevant to

> Looking at your proposal as a way to support multiple tracks, I'm not sure it's the right fix. For one, it doesn't handle tracks that start or end in parallel to other tracks.
the concept is to create the necessary file structure - in parallel - then, if necessary (Chromium does not include cues in recorded `webm`, Firefox does), re-"scan" the file structure to insert the timestamps (cues); consider an array of integers or decimals that is "equalized".
The functionality of the feature request should "work" for both streams and static files, or combinations of the two, where the resulting `webm` file is in sequence, irrespective of when the discrete stream or file is decoded -> read -> encoded, similar to the functionality of `Promise.all()`, where the `Promise` at index N could be fulfilled or "settled" before the `Promise` at index 0.
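A minimal illustration of that ordering analogy, with placeholder values and delays:

```js
// Promise.all() resolves with results in input order, regardless of which
// promise settles first - the "equalizing" behavior the request refers to.
const delayed = (value, ms) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms));

Promise.all([delayed("index 0", 300), delayed("index 1", 100), delayed("index 2", 200)])
  .then(results => console.log(results)); // ["index 0", "index 1", "index 2"]
```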
The feature request is somewhat challenging to explain, as there is more than one use case and more than one way in which the API could be used. Essentially, it is a recorder that records (streams and/or static files, e.g., acquired using `fetch()`) in parallel, without becoming inactive, while writing the data, then (and/or "on the fly") "equalizing" the data, resulting in a single output: a single `webm` file.
Perhaps the feature request/proposal could be divided into several proposals (a hypothetical usage sketch follows the list):

- `VideoContext()` => `decodeVideoData(ArrayBuffer|Blob|Blob URL|data URL|MediaStream)` => `VideoBuffer` (capable of concatenation) => `VideoContext.createBufferSource()` => `.buffer = VideoBuffer` => `VideoBuffer.start(N, N+time(forward|reverse[playbackRate]))` (at some point over the past year or two created code which used Web Animation API, canvas, Web Audio to play "media" (images, `AudioBuffer`s) forwards and in reverse, at variable playback rate, etc., though synchronization was challenging; the original code and tests are lost, though the concept and steps taken are recollected), without using `HTMLMediaElement` (for) playback
- `OfflineVideoContext(ArrayBuffer|Blob|Blob URL|data URL|MediaStream)` and `startRendering()` (capable of concatenation) for video (or incorporate the functionality into `OffscreenCanvas`), without using `HTMLMediaElement` (for) playback, which generates a `VideoBuffer` "as fast as it can"; e.g., `new ReadableStream(/* new OfflineMediaContext([fetch("media.webm"), fetch("media.mp4"), new MediaStream(stream), VideoBuffer, AudioBuffer, new TextTrack(), fetch("media.wav")]) */).pipeTo(new WritableStream(/* new OfflineMediaRecorder() */))` => `webm` or `.mkv`; essentially similar to `MediaSource` functionality, with the `SourceBuffer` exposed
- `OfflineMediaRecorder()` => combining the functionality of the above (exposed at `Worker`)
- `MediaRecorder()` and `MediaStream()` and `OffscreenCanvas()`, etc. => to perform the functionality above; e.g., since `MediaRecorder` already writes the `webm` file, expose the entirety of the functionality to the developer; the developer could then turn the recorder on or off from writing a media track that is `active` or `inactive`, write data to the container without the input necessarily being exclusively a live or active `MediaStream`, and, when they decide, output a single `mkv` or `webm` file

though since all of those features surround the same subject matter, a single API could be created which incorporates all of that functionality.
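None of the names above exist in any specification or browser; purely as an illustration of the proposed shapes, a hypothetical sketch in which every constructor, method and option name is invented to match the proposal text:

```js
// Hypothetical API sketch only: OfflineVideoContext, decodeVideoData,
// VideoBuffer and OfflineMediaRecorder do not exist anywhere.
const videoCtx = new OfflineVideoContext();

// Decode disparate inputs to concatenable VideoBuffers, "as fast as it can",
// without an HTMLMediaElement or playback.
const buffers = await Promise.all(
  ["media.webm", "media.mp4"].map(async url =>
    videoCtx.decodeVideoData(await (await fetch(url)).arrayBuffer())));

// Write the buffers, in sequence, to a single container and render it offline.
const recorder = new OfflineMediaRecorder(buffers, { mimeType: "video/webm" });
const webmBlob = await recorder.startRendering(); // single webm file as output
```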
> Since there's so little consensus on supporting multiple tracks (other than if they're there at the start), I think this kind of fairly specific use-case fix will have an even harder time to fly.
Yes, gather that there is no "consensus" (which does not prevent "branches", or a "pull request") as to supporting multiple tracks (https://searchfox.org/mozilla-central/source/dom/media/MediaRecorder.cpp#765; https://searchfox.org/mozilla-central/source/dom/media/MediaRecorder.cpp#794; https://bugzilla.mozilla.org/show_bug.cgi?id=1276928). The presentation AT THE FRONTEND 2016 - Real time front-end alchemy, or: capturing, playing, altering and encoding video and audio streams, without servers or plugins! by Soledad Penadés makes several important points:
> ... so after a whole year of filing many bugs and crushing [sic] crashing the browser many many times, so like, all that, we are finally ready to talk about this kind of alchemy, the whole title is very long ... so streams ... it's not like these streams you can't join them ... you can actually join them ...
>
> ... you might be like, "how many streams can you have?" ... MediaStreams are track containers and probably that doesn't tell you anything, but Tracks can be audio or video, so if you imagine [sic] a DVD, a DVD can have several tracks of audio, and in theory, several tracks of video, you can change between the scenes, which never happens because producers just gave up on the idea [sic] ... But that's the whole idea of DVD's, and like, in general, streams of ... media that you can have various tracks of things. So, you can have any number of tracks, so you can have any number of tracks streams. That was the premise that we got sold DVD's based on.
>
> so we need people to have weird new ideas ... we need more ideas to break it and make it better
>
> Use it. Break it. File bugs. Request features.
What this proposal is attempting to posit is that improvements can be made as to concatenating media streams and static files having differing codecs. If that means a new Web API proposal for a `webm` writer that can decode => read => encode any input stream or static file, that is what this feature request proposes.
@Pehrsons Was unaware that some of these concepts have already been raised (in some instances even the same names of prospective objects) https://github.com/w3c/mediacapture-worker/issues/33.
After looking into the matter to a cursory degree: perhaps a `VideoWorklet` and/or a `MediaStreamWorklet`/`MediaStreamRecorderWorklet`(s), which abstracts/extends `HTMLVideoElement` (removing unneeded object inheritances; essentially a standalone JavaScript video instance that does not project to a `DOM` element) and which fetches, reads ("as fast as it can", see `OfflineAudioContext.startRendering()`), decodes, encodes, and streams (`Blob`, `ArrayBuffer`, `MediaStreamTrack`, etc., et al.) to the main thread, would allow for multiple media sources to be concatenated and streamed or "piped" from a `Worker` or `Worklet` thread?
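For context, a small sketch of the part that is already possible today without such a worklet: a plain `Worker` can fetch media bytes off the main thread and transfer them back, but decoding frames still requires a `<video>` element (or `MediaSource`) on the main thread. The file names `media-worker.js` and `media.webm` are placeholders:

```js
// media-worker.js (placeholder name): fetch the media off the main thread and
// transfer the raw bytes back; no decoding happens here.
self.onmessage = async ({ data: url }) => {
  const buffer = await (await fetch(url)).arrayBuffer();
  self.postMessage(buffer, [buffer]); // transfer, not copy
};
```

```js
// main thread: decoding/playback still requires an HTMLMediaElement
const worker = new Worker("media-worker.js");
worker.onmessage = ({ data: buffer }) => {
  const video = document.createElement("video");
  video.controls = true;
  video.src = URL.createObjectURL(new Blob([buffer], { type: "video/webm" }));
  document.body.appendChild(video);
};
worker.postMessage("media.webm"); // placeholder URL
```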
@Pehrsons Before continuing down the rabbit hole with this proposal/feature request, will post links to the resources researched so far, so that these resources are not lost due to user error (have done that before ("Problem Exists Between Chair And Keyboard :p"): "lost" code that used Web Animation API to create a "video" and Native Messaging to bypass Web Speech API to communicate directly with `espeak-ng`; until the code is "retrieved" from the currently non-functional device) and so that future specification writers/implementers (both for browsers and in the wild) might find them useful:

- `MediaRecorder`, unless composing the codec incorrectly, reports `MediaRecorder.isTypeSupported('video/webm;codecs=av01.0.05M.08') // false`
- What happens when `MediaRecorder` is called multiple times in "parallel" (e.g., `.map()` without `async/await`) and the recording is less than 2 seconds? Consistent or unexpected results?
- What happens when `CanvasRenderingContext2D.drawImage()` and `requestAnimationFrame()` are used? Chromium has implemented `queueMicrotask`, which is executed before `requestAnimationFrame()`, though does that solve the issue of images actually loading? (Add decode() functionality to image elements; Generating Images in JavaScript Without Using the Canvas API And putting them into a web notification)
- `SourceBuffer.changeType()`; this could be useful at Chromium; the Firefox implementation of "segments" mode currently achieves the expected result (https://github.com/guest271314/MediaFragmentRecorder/commit/09a731789d3aa6b5e4bd8b11d2f2387d8e08e5b9; https://github.com/w3c/media-source/issues/190; How to use "segments" mode at SourceBuffer of MediaSource to render same result at Chromium, Chrome and Firefox?)
- `<video>` can decode various videos; is it possible to somehow extract the underlying code which decodes that `media_file.ext`? Do we have to use the `<video>` element, where the requirement is really to not necessarily play the media to get the underlying images (and audio) from point a to point b (if "cues" are present) and concatenate those into a single "container" (e.g., `.mkv` or `.webm`, etc.; i.e., webm-wasm lets you create webm videos in JavaScript via WebAssembly)? Enter Rust (!) ("Rust is a multi-paradigm systems programming language focused on safety, especially safe concurrency. Rust is syntactically similar to C++, but is designed to provide better memory safety while maintaining high performance. Rust was originally designed by Graydon Hoare at Mozilla Research, with contributions from Dave Herman, Brendan Eich, and others. The designers refined the language while writing the Servo layout engine and the Rust compiler. The compiler is free and open-source software dual-licensed under the MIT License and Apache License 2.0."), where there is evidently an assortment of thriving projects. Could `HTMLMediaElement`, `HTMLVideoElement` and `MediaStreamTrack` be implemented without using the HTML `<video>` element, possibly in a `Worker` and/or `Worklet` thread? TODO: Dive into Rust.
- Some code which attempts to emulate the MDN description "as fast as it can" of the `startRendering()` method of `OfflineAudioContext()`: essentially "parallel" asynchronous procedures passed to `Promise.all()` (the `data URL` representation of each image is for expediency, without regard for "compression"; when trying 1 second slices the result has unexpected consequences; there is some source code for `MediaRecorder` at either Chromium or Firefox which referenced 2 seconds?). TODO: try farming this procedure out to `AudioWorklet` and/or `TaskWorklet` (though since "WebWorkers can be expensive (e.g: ~5MB per thread in Chrome)", is that tasklets abstraction really accomplishing anything? Eventually crashed the tab at plnkr when trying `taskWorklet` multiple times in the same session (trying to get a reference to a `<video>` within `TaskWorkerGlobalScope`)):

```html
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div>click</div>
<script>
(async() => {
const url = "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4";
const blob = await (await fetch(url)).blob();
const blobURL = URL.createObjectURL(blob);
const meta = document.createElement("video");
const canvas = document.createElement("canvas");
document.body.appendChild(canvas);
const ctx = canvas.getContext("2d");
let duration = await new Promise(resolve => {
meta.addEventListener("loadedmetadata", e => {
canvas.width = meta.videoWidth;
canvas.height = meta.videoHeight;
// TODO: address media with no duration: media recorded using `MediaRecorder`
// `ts-ebml` handles this case, though what if we do not want to use any libraries?
resolve(meta.duration);
});
meta.src = blobURL;
});
console.log(duration);
document.querySelector("div")
.addEventListener("click", async e => {
let z = 0;
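// split the total duration into consecutive 2 second ranges,
// e.g. {from: 0, to: 2}, {from: 2, to: 4}, ..., clamping the final range to the media duration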
const chunks = [...Array(Math.floor(duration / 2) + 1).keys()].map(n => ({from:z, to:(z+=2) > duration ? duration : z}));
console.log(chunks);
const data = await Promise.all(chunks.map(({from, to}) => new Promise(resolve => {
const video = document.createElement("video");
const canvas = document.createElement("canvas");
const ctx = canvas.getContext("2d");
const images = [];
let raf, n = 0;
const draw = _ => {
console.log(`drawing image ${n++}`);
if (video.paused) {
cancelAnimationFrame(raf);
return;
}
ctx.drawImage(video, 0, 0, video.videoWidth, video.videoHeight);
images.push(canvas.toDataURL());
raf = requestAnimationFrame(draw);
}
const recorder = new MediaRecorder(video.captureStream());
recorder.addEventListener("dataavailable", e => {
cancelAnimationFrame(raf);
resolve({images, blob:e.data});
});
video.addEventListener("playing", e => {
if (recorder.state !== "recording") {
recorder.start();
}
raf = requestAnimationFrame(draw);
}, {once:true});
video.addEventListener("canplay", e => {
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
video.play().catch(console.error);
}, {once:true});
video.addEventListener("pause", e => {
recorder.stop();
cancelAnimationFrame(raf)
});
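// Media Fragments URI: constrain playback to the #t=from,to range; the video
// fires "pause" when it reaches `to`, which stops the recorder above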
const src = `${blobURL}#t=${from},${to}`;
console.log(src);
video.src = src;
})));
console.log(data);
/*
data.forEach(({blob, images}) => {
console.log(images);
const video = document.createElement("video");
video.controls = true;
document.body.appendChild(video);
video.src = URL.createObjectURL(blob);
});
*/
// TODO: draw the images to a <canvas>, though see https://codereview.chromium.org/2769823002/
})
})();
</script>
</body>
</html>
```
Upon running the above code, it occurred that we could create yet another (very simple) media "container" type, again using only the browser: create N slices of separate audio and image "files" using `MediaRecorder`; e.g., something like
```js
{
  audio: {data: /* audio as an array, capable of serialization */, from: 0, to: 2},
  video: {data: /* video as an array of images (uncompressed, we'll address that "later") */, from: 0, to: 2},
  title: /* title, as array or string */
}
```
where a "server" and/or MediaRecorder
could select any "segments" of media, concatenate and encode as a .ext
file and serve that "file"; or, simply serve the requested "segments" in the JSON
form. The issue would then be how to stream the data using ReadableStream
/WritableStream
, though that can be overcome, to an appreciable degree by only serving the smallest "chunk" possible (2 seconds?). That is: use the browser itself to encode the files.
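A rough sketch of how a client might request and "equalize" such JSON "segments"; the query parameters and field names here are assumptions matching the shape above, not an existing service:

```js
// Hypothetical segment endpoint returning the JSON "container" described above,
// one object per 2 second slice: { audio: {data, from, to}, video: {data, from, to}, title }
async function fetchSegments(endpoint, ranges) {
  const segments = await Promise.all(ranges.map(({ from, to }) =>
    fetch(`${endpoint}?from=${from}&to=${to}`).then(response => response.json())));

  // "Equalize": order by declared start time regardless of arrival order,
  // then concatenate the per-segment arrays into one logical track each.
  segments.sort((a, b) => a.video.from - b.video.from);
  return {
    video: segments.flatMap(segment => segment.video.data), // array of images
    audio: segments.flatMap(segment => segment.audio.data)  // array of audio chunks
  };
}
```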
Related: https://github.com/w3c/mediacapture-record/issues/147; https://github.com/w3c/mediacapture-fromelement
Feature request: Include an option to `MediaRecorder` to concatenate (e.g., http://trac.ffmpeg.org/wiki/Concatenate) all input streams to a single `webm` file.

For example, `Promise.all([Promise0, Promise1, Promise2])`: when all `PromiseN` are fulfilled, `Promise.all()` is fulfilled, even if `Promise2` is resolved before `Promise0`.

Such code can be implemented in `MediaRecorder` as