w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

API for containers? #24

Closed pthatcherg closed 4 months ago

pthatcherg commented 4 years ago

It comes up as a common question: can we have an API for media containers? It's something that can be done in JS/wasm and is arguably orthogonal to WebCodecs. But for some formats that you might consider video (GIF, (M)JPEG), the line between container and codec is blurry.

This is a tracking issue for a conversation around this topic. My current opinion is to leave it out of WebCodecs until it's more mature and then perhaps readdress it later.

steveanton commented 4 years ago

The concern I have with containers is that they expose a lot of API and specification surface but don't unlock anything new that can't already be done (pretty efficiently, even) today.

Following the principles of the Extensible Web Manifesto (https://extensiblewebmanifesto.org/), we should focus on delivering low-level codecs first, and only after that's done (or in parallel, by a different group) consider tackling containers.

pthatcherg commented 4 years ago

I agree completely. That's a great way to put it.

guest271314 commented 4 years ago

Key use-cases

  • Non-realtime encoding/decoding/transcoding, such as for local file editing
  • Decoding and encoding images
  • Re-encoding multiple input media streams in order to merge them into one encoded media stream.

and potentially

  • Live stream uploading

each could be considered an API that, at least in part, performs some form of file editing. A file could be considered a "container" when technically compared to

  • Extremely low latency live streaming (<3s delay)
  • Cloud gaming
  • Advanced Real-time Communications:
      • e2e encryption
      • control over buffer behavior
      • spatial and temporal scalability

and potentially

  • Live stream uploading

where any and all of the above are encompassed within the use case of recording media into a container, whether that "container" be an array of images with an index element for "metadata" (width, height, frame duration, or other adjustments made "mid-stream" or in post-production), or a .json (or, if preferred, Matroska or WebM) "container", so that both the entire output of WebCodecs and specific time slices can be downloaded into a single "container" (file structure).

Since the topic is at hand and the subject matter of this repository spans a wide range of topics, it might be helpful to create a glossary that states exactly what you (this repository) mean by each term you use.

Non-goal

  • Direct APIs for media containers (muxers/demuxers)

indicates the technical proximity of "codecs" to "containers".

Whether or not WebCodecs includes reading, writing, and editing of both "codecs" and "containers", that internal decision will not prevent or preclude the actual use case: a single API capable of creating, extending, and modifying both codecs and containers, which avoids having to stitch together what is available across different specifications that were never conceived as interoperable with other APIs, existing or proposed.

One recent argument against omitting "containers" from the same treatment as "codecs" is that the two are symbiotic. A single API designed and maintained with that consideration at the forefront has the potential to solve more than one existing issue: separate pieces of code in the same domain (media, for example) can produce very different output, reflecting different authors' intent at the time the code was written and merged. Time passes, and new technologies that once resolved an issue become an issue themselves, because implementers may or may not be in accord across the various branches of "media". A single API covering media streams and files from creation through editing to production is the explicit use case.

guest271314 commented 4 years ago

Example use case: write audio to a WebM file (https://plnkr.co/edit/Inb676?p=preview). Ideally, from a front-end perspective, this single API should be able to encode VP8 and Opus to a file, or, if Opus is missing from an existing file, write the audio to the file.

guest271314 commented 4 years ago

@steveanton

(or in parallel by a different group)

What is necessary to start such a group? Post the proposal at https://discourse.wicg.io? (Note, I am not a member of the W3C.)

guest271314 commented 4 years ago

@pthatcherg

It's something that can be done JS/wasm

While there does exist code which can take input images and write a WebM file (https://github.com/GoogleChromeLabs/webm-wasm; https://github.com/thenickdude/webm-writer-js), neither repository author is interested in implementing writing audio to the same output file:

https://github.com/thenickdude/webm-writer-js/pull/8#issuecomment-533503151

I don't have any plans to work on adding audio, sorry, and I'm not sure where best to begin either (it probably depends on what format you can capture the audio in and what environment you expect to run in, Chrome, arbitrary browser, Electron, etc).

https://github.com/GoogleChromeLabs/webm-wasm/issues/12

It seems this project doesn't support encoding audio+video yet, just video-only? It this feasible, or would it be better to just use the more heavyweight ffmpeg.js project for this?

https://github.com/GoogleChromeLabs/webm-wasm/issues/12#issuecomment-533126745

Yeah, there’s no support for audio and I don’t have any plans to add it. This project was born out of the lackluster capabilities of MediaStreamRecorder.

ffmpeg.js is definitely one choice. But if you already have an encoded audio and video stream, an mkv muxer might do. That would be a lot faster and smaller. Hope this helps!

There is an implementation which is capable of writing audio as Opus to a WebM container https://github.com/kbumsik/opus-media-recorder/.

Meaning, this repository can reach maturity without a corresponding container writer of comparable maturity being available to write the output of WebCodecs to a file.

Thus, simply because "JS/wasm" exists does not mean that implementations exist to meet the requirements described at Key use-cases, particularly

  • Non-realtime encoding/decoding/transcoding, such as for local file editing.

guest271314 commented 4 years ago

FWIW https://discourse.wicg.io/t/webmediacontainers-proposal/3928

padenot commented 4 years ago

To write down what was said during TPAC, this might be very important to avoid the proliferation of badly muxed files. Muxing is rather hard to get right.

A possible solution might be a vouched library, as noted above, but there is always the problem of updating it for bug fixes.

chrisn commented 3 years ago

I've been following the general discussion around WebCodecs, and the need for a media container API seems to be recognised, but I thought I'd add my own use case as an example.

I maintain a library waveform-data.js that produces data for waveform visualisation from audio.

This uses Web Audio's decodeAudioData(), but that has the well-known problems: it runs on the main thread, so UI updates stall during decoding; it requires the entire encoded audio to be held in memory; there's no indication of progress, so I can't tell how long it will take to complete; and there's no way to cancel the decode.
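
For illustration, the pattern in question boils down to something like the sketch below (the fetch URL is a placeholder); each limitation above follows from the shape of this API:

// Decode an entire audio file with Web Audio's decodeAudioData().
// The API is only exposed on the main thread, requires the whole
// encoded file up front, and offers no progress or cancellation.
const response = await fetch("audio.mp3");    // placeholder URL
const encoded = await response.arrayBuffer(); // entire file in memory
const audioCtx = new AudioContext();
const audioBuffer = await audioCtx.decodeAudioData(encoded);
// audioBuffer.getChannelData(0) now holds decoded PCM for channel 0.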

For this use case, the simplest solution would be to allow decodeAudioData() to run from a worker context, with an extended API to allow progress notifications and cancellation.
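
To make that concrete, here is a purely hypothetical sketch of such an extension (the options object, onprogress callback, reportProgress helper, and worker availability are all invented; nothing like this is specified today):

// Hypothetical: decodeAudioData() callable from a worker, with
// progress notifications and cancellation via AbortSignal.
const controller = new AbortController();
const audioBuffer = await audioCtx.decodeAudioData(encoded, {
    signal: controller.signal,                                  // invented option
    onprogress: (decoded, total) => reportProgress(decoded / total), // invented option
});
// controller.abort() would cancel a decode that is no longer needed.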

WebCodecs also solves these issues, but introduces a new one. Because the library is generic, it will accept any audio format that decodeAudioData supports. So, in order to use WebCodecs, the library would have to include code to parse all the container formats, or define an API that moves container parsing to users of the library. Both options increase the amount of JavaScript that needs to be delivered, unnecessarily so, because parsing the container is a capability the browser already has. Also, leaving container parsing to library users would make the library much harder to use.

chcunningham commented 3 years ago

Triage note: marking 'extension', as this would clearly be a new API.

chcunningham commented 3 years ago

@chrisn thanks for the use case.

I'm a little torn. I find the argument about existing demuxers persuasive, but less so on the muxing side. Browsers have long compiled in full-featured demuxers for <video> and MSE, but for muxing I think the only example is MediaRecorder, and the files it produces are pretty basic. For example, I don't think we currently ship a muxer that could produce a fragmented MP4.

We've found JS demuxing performance to be quite good. Performance being equal, there are some advantages to JS, like rapid extensibility and perfect interoperability. My hope is that the download hit is largely amortized away by caching. WDYT?

But the JS answer rings a little hollow because the available libraries for this are pretty limited right now. If folks like the idea, we could organize a community / WG effort to build / centralize.

dalecurtis commented 3 years ago

I think containers are an entirely separate API from WebCodecs. The interfaces and processing model are likely entirely different from WebCodecs. E.g., it's likely a streams based API would work very well for containers. It will also need its own containers registry which describes per-container behavior.
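
As a purely hypothetical illustration of that streams-based shape (MediaContainerDemuxer, its options, and the track fields below are all invented for this sketch; no such interfaces exist):

// Hypothetical: demux a fetched MP4 into encoded chunks via streams.
const response = await fetch("movie.mp4");
const demuxer = new MediaContainerDemuxer({ type: "video/mp4" }); // invented API
const { videoTracks } = await demuxer.open(response.body);        // invented API

const decoder = new VideoDecoder({
    output: (frame) => { /* paint or analyze the frame */ frame.close(); },
    error: console.error,
});
decoder.configure(videoTracks[0].decoderConfig); // invented field

// Each track would expose a ReadableStream of encoded chunks.
for await (const chunk of videoTracks[0].readable) {
    decoder.decode(chunk);
}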

IMHO, the options for solving this use case are:

padenot commented 3 years ago

I agree with both @dalecurtis and @chcunningham. Gecko is also running in-content-process WASM demuxers for security reasons (essentially, libogg compiled to WASM running in process), and confirms the findings of the link above. This has been shipped in release for a few versions without a single problem reported.

I prefer option 1 and 2 in @dalecurtis's comment, and this can be a gradual solution (1 then 2 if really needed).

Option 3 I like less; those objects are not at the same abstraction level, and MediaRecorder only supports real-time media (not offline processing), although Gecko implements a proprietary extension that allows encoding faster than real time, which we only used for testing (not exposed to the web, of course).

davedoesdev commented 3 years ago

Is there a list of container projects? I've written a WebM muxer but this doesn't seem like the right place to keep track of them.

dalecurtis commented 1 year ago

I ended up writing a quick explainer for the third bullet in https://github.com/w3c/webcodecs/issues/24#issuecomment-842574380 (Extending MediaRecorder for muxing):

https://github.com/dalecurtis/mediarecorder-muxer/blob/main/explainer.md

Have your thoughts in https://github.com/w3c/webcodecs/issues/24#issuecomment-844148407 changed at all now that WebCodecs is more fleshed out @padenot?

At least internally folks don't seem to hate it. It's only targeted towards simple use cases as a hedge against a more complete containers API (which we (Chromium) are unlikely to undertake anytime soon). It looks like a fairly small implementation delta. Is this interesting at all?

cc: @youennf

guillaumebrunerie commented 1 year ago

Did anyone find a way to generate mp4 files client-side in Chrome? I tried all possible ways, but couldn't find one that works:

I'm working on a 2D animation app in the browser. I can currently export animations easily as a sequence of frames, but not being able to export them as an mp4 file is pretty limiting. The format needs to be mp4, as the videos are meant to be imported into other programs that unfortunately only support mp4.

The other options I have left are

I'm not sure if it should be in this or another specification, but it seems like a pretty important missing use case. If the reason for not including it is that it can already be done in JavaScript, please link to a library that can actually do it.

dalecurtis commented 1 year ago

FWIW, MP4 support for MediaRecorder is being worked on in Chrome. You can follow along here: https://bugs.chromium.org/p/chromium/issues/detail?id=1072056

What went wrong with mp4box.js exactly? https://github.com/gpac/mp4box.js/blob/master/test/qunit-iso-creation.js shows how to handle creation. I think the only thing you might need to tweak is the segment size.

guillaumebrunerie commented 1 year ago

FWIW, MP4 support for MediaRecorder is being worked on in Chrome. You can follow along here: https://bugs.chromium.org/p/chromium/issues/detail?id=1072056

Great to hear, thanks for the link!

What went wrong with mp4box.js exactly? https://github.com/gpac/mp4box.js/blob/master/test/qunit-iso-creation.js shows how to handle creation. I think the only thing you might need to tweak is the segment size.

Actually, I think it was mux.js that I tried, not mp4box.js. It has a test file https://github.com/videojs/mux.js/blob/main/test/mp4-generator.test.js with a very promising name, but the code there goes pretty deep into box types and the like, so I could not manage to create a working mp4 file from it. I did not know about this mp4box example, but I'll definitely give it another try, thank you!

guillaumebrunerie commented 1 year ago

I managed to make MP4Box work with WebCodecs! See code below and a working example at https://codepen.io/Latcarf/pen/NWBmJVw.

The main thing I am still very confused about is the codec string. I couldn't find a single example of a valid H.264 codec string on MDN, and after some trial and error I settled on avc1.64003d (found somewhere online), which seems to mostly work, but I have very little understanding of what it means (even after trying to read everything I can find about profiles and levels). It also doesn't always work; for instance, if you change the size of the video to 200×200, it fails with a rather cryptic DOMException: Encoding error. (without any further explanation).

It would be great if there were, for instance, a catch-all codec string h264 (or avc1, or mp4) meaning "choose whatever avc1.xxxxxx codec string you believe is most appropriate", or at the very least some examples on MDN, like "if you want H.264 HD video choose this, if you want a small H.264 video choose that". I guess the MediaRecorder API already chooses an appropriate codec string on behalf of the user based on the size of the canvas, so it would be great if WebCodecs could do the same.
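
One workable approach today, as a sketch: probe candidate strings with VideoEncoder.isConfigSupported() (a real WebCodecs method) and use the first one the browser accepts. The candidate list below is only an illustrative guess at common profile/level combinations:

// avc1.PPCCLL: PP = profile (42 = Baseline, 4d = Main, 64 = High),
// CC = constraint flags, LL = level × 10 in hex (1f = 3.1, 28 = 4.0).
const candidates = ["avc1.640028", "avc1.4d0028", "avc1.42001f"];

async function pickH264Codec(width, height) {
    for (const codec of candidates) {
        const { supported } = await VideoEncoder.isConfigSupported({
            codec,
            width,
            height,
        });
        if (supported) return codec;
    }
    throw new Error("No supported H.264 encoder configuration found");
}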

It also seems we cannot create the track up front, because it needs metadata.decoderConfig.description (and I have no idea what that contains). That's not a big issue, but it is a bit hard to guess.

Here is my function doing the encoding:

const encodeFramesToMP4 = async ({width, height, fps, frames, renderFrame}) => {
    const f = MP4Box.createFile();
    let track = null;
    // Use a microsecond timescale so frame timestamps are integers.
    const frameDuration = 1_000_000 / fps;

    const encoder = new VideoEncoder({
        output: (chunk, metadata) => {
            // The track can only be created once the encoder has produced
            // the decoder configuration record (decoderConfig.description).
            if (track === null) {
                track = f.addTrack({
                    timescale: 1_000_000,
                    width,
                    height,
                    avcDecoderConfigRecord: metadata.decoderConfig?.description,
                });
            }

            const buffer = new ArrayBuffer(chunk.byteLength);
            chunk.copyTo(buffer);
            f.addSample(track, buffer, {
                duration: frameDuration,
                is_sync: chunk.type === "key", // mark keyframes so seeking works
            });
        },
        error: (error) => {
            throw error;
        }
    });
    encoder.configure({
        codec: "avc1.64003d",
        width,
        height,
    });

    for (let i = 0; i < frames; i++) {
        const frame = new VideoFrame(
            renderFrame(i),
            {timestamp: i * frameDuration},
        );
        encoder.encode(frame);
        frame.close(); // release the frame's resources as soon as it's queued
    }
    await encoder.flush();
    encoder.close();
    return f;
}

And here is how it is used. We simply create an OffscreenCanvas and provide a function that can draw a given frame.

const renderExampleVideo = async () => {
    const width = 600;
    const height = 600;
    const canvas = new OffscreenCanvas(width, height);

    const file = await encodeFramesToMP4({
        width,
        height,
        fps: 30,
        frames: 30, // duration in frames
        renderFrame: i => {
            // ...
            // draw frame #i on the canvas
            // ...
            return canvas;
        }
    });
    file.save("Example.mp4");
}

Feel free to let me know if there is any issue in my code or anything that could be improved.

dalecurtis commented 1 year ago

Thanks for sharing! Codec strings can be pretty annoying. Here are some good references if you haven't seen them:

https://developer.mozilla.org/en-US/docs/Web/Media/Formats/codecs_parameter
https://cconcolato.github.io/media-mime-support/

KevinBoeing commented 1 year ago

I am working on a video editor that runs entirely in the Chrome browser, and I am currently struggling with the demuxing and muxing process. Mp4Box.js works great, but unfortunately it only allows .mp4 containers to be demuxed. Is there already a demuxer that can demux any container type? I was thinking of ffmpeg.wasm (I saw Clipchamp uses it too), but since it's a CLI tool, I have no idea how to use it as an all-in-one demuxer in JavaScript. Is it even possible? The end result of the demuxing process should be to have all the EncodedVideoChunks in one array.

dalecurtis commented 1 year ago

For all containers you'd definitely need something like ffmpeg.wasm; https://github.com/w3c/webcodecs/pull/549 shows how this might work. Even if browsers had a containers API, it'd likely only support the formats they already parse (mp4, webm, ogg, etc.).
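
As a sketch of how ffmpeg.wasm can be driven from JavaScript despite its CLI heritage (this assumes the 0.11.x createFFmpeg API; file names and arguments are illustrative), one option is to remux arbitrary input to MP4 without re-encoding and then demux that with mp4box.js:

import { createFFmpeg, fetchFile } from "@ffmpeg/ffmpeg";

// Remux an arbitrary container to MP4 without re-encoding, so the
// result can be demuxed with mp4box.js and fed to WebCodecs.
async function remuxToMp4(inputFile) {
    const ffmpeg = createFFmpeg({ log: true });
    await ffmpeg.load(); // fetch and instantiate the wasm core
    ffmpeg.FS("writeFile", "input", await fetchFile(inputFile));
    await ffmpeg.run("-i", "input", "-c", "copy", "output.mp4");
    return ffmpeg.FS("readFile", "output.mp4"); // Uint8Array of the MP4
}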

KevinBoeing commented 1 year ago

That's exactly what I needed. Thanks!

bartadaniel commented 1 year ago

I also work on a fully in-browser video editing experience. While I made it work with ffmpeg+webcodecs, I can see massive value in adding something like the WebContainers API. Maybe my angle is a little different here; I had edge cases where I had to do workarounds and fixups on the demuxed streams before I could feed them to the VideoDecoder. Some examples:

Obviously, these things are out of the scope of WebCodecs. But I assume all these touches are already written somewhere in the major browsers, because those problematic videos play just fine in Chrome. If I could have access to the video and audio streams in a way that the browser thinks is appropriate for decoding, that would be a game-changer.

dalecurtis commented 1 year ago

Maybe unsurprisingly, edge cases are one of the reasons we wouldn't want to do this. The argument is that containers are all edge cases, and an external library meets those needs best. If we did undertake this, the API would likely be limited to very common muxing and demuxing scenarios, e.g., playback with seeking and basic recording.

IIRC, decoded timestamps should just pass through per spec: https://w3c.github.io/webcodecs/#output-videoframes -- If you're not seeing that, please file an issue with the respective UA. https://crbug.com/new for Chromium.

aboba commented 4 months ago

Can we close this issue?

padenot commented 4 months ago

I think so.

chrisn commented 4 months ago

I think so too, but it is worth keeping track of developer interest, so I have created https://github.com/w3c/media-and-entertainment/issues/108 - anyone still interested is welcome to comment there.

ForeverSc commented 3 weeks ago

Recently I designed a WASM demuxer package specifically for WebCodecs. Compared to ffmpeg.wasm it is much smaller, and it supports more formats than mp4box.js, such as mkv, webm, and flv. I hope it can help people who need it! https://github.com/ForeverSc/web-demuxer