w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

After how many decode should the codec process the frames? #753

Closed tobiasBora closed 7 months ago

tobiasBora commented 7 months ago

I am having some issues understanding the proper way to decode frames to play a video at normal speed without caching all decoded frames (memory explosion):

  1. If I just use decode, then the frames are sometimes processed later. For instance, in my experience in Chromium, the first frame is never processed and I need to send at least 2 frames to start the processing.
  2. On the other hand, if I use flush, then not only do I need to restart on a key frame (which seems weird to me), meaning that I must decode around 250 frames to reach the next key frame and store them in memory, but, more importantly, it creates a really significant slowdown. In my experiments, just removing .flush made a really choppy output waaaay more fluid.

So it seems like the only way to get proper, efficient decoding is to use only .decode and to send "enough data" to be sure that the frames are processed… but this "enough" is not specified in the spec (my understanding of the spec is that enough = 1, but in my experience, at least Chromium does not follow this, as already mentioned).

padenot commented 7 months ago

This repo is the home of the Web Codecs specification, and it is usually preferred that questions go on Stack Overflow or another forum dedicated to questions and more easily searchable by others. That said, this is also searchable by others, so I'll answer here anyway.

This can generally depend on three things (maybe more?):

All that to say: there is no general answer.

Flushing needing a key frame to restart is fairly normal, I'm not entirely sure why you say it's weird. flush() is to be called at the end of the video, or when seeking, not during general playback.
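
To illustrate the point about restarting on a key frame when seeking, here is a minimal sketch. The helper name `findSeekStartIndex` is hypothetical, and the chunks are assumed to be objects shaped like EncodedVideoChunk (a `type` of "key" or "delta" and a `timestamp` in microseconds), stored in decode order with key frame timestamps increasing.

```javascript
// Hypothetical helper (not part of WebCodecs): returns the index of the
// last key frame whose timestamp is at or before the seek target, or -1
// if there is none. Decoding after a seek would start from that index.
function findSeekStartIndex(chunks, targetTimestamp) {
  let start = -1;
  chunks.forEach((chunk, i) => {
    if (chunk.type === "key" && chunk.timestamp <= targetTimestamp) {
      start = i;
    }
  });
  return start;
}

// Example data (made up): two groups of pictures, timestamps in microseconds.
const exampleChunks = [
  { type: "key", timestamp: 0 },
  { type: "delta", timestamp: 33333 },
  { type: "delta", timestamp: 66666 },
  { type: "key", timestamp: 100000 },
  { type: "delta", timestamp: 133333 },
];
console.log(findSeekStartIndex(exampleChunks, 70000)); // 0
console.log(findSeekStartIndex(exampleChunks, 133333)); // 3
```

Every chunk from the returned index up to the target still has to be decoded; frames before the target can simply be closed without being displayed.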

To have proper efficient decoding, you send as much input as you can, then wait to receive the first output, at which point you queue another input packet. This generally allows saturating the underlying decoder implementation.

If you can't send input anymore, decodeQueueSize starts growing. You can then wait for a "dequeue" event to be fired: this is the internal codec implementation telling you that it has more slots in its queue to produce more frames.
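
As a rough illustration of the feeding strategy described above, here is a minimal sketch. The function name `feedWithBackpressure` and the `maxQueued` threshold are assumptions made for the example; the only decoder members used are `decode()`, `decodeQueueSize` and the "dequeue" event, which VideoDecoder exposes as an EventTarget.

```javascript
// Hypothetical helper: feeds chunks to a decoder-like object while keeping
// at most `maxQueued` inputs pending. `maxQueued` is an arbitrary value
// chosen for illustration, not something the spec prescribes.
async function feedWithBackpressure(decoder, chunks, maxQueued = 3) {
  for (const chunk of chunks) {
    // Too many pending inputs: wait until the decoder signals that it
    // has consumed some of them, then check again.
    while (decoder.decodeQueueSize >= maxQueued) {
      await new Promise((resolve) =>
        decoder.addEventListener("dequeue", resolve, { once: true })
      );
    }
    decoder.decode(chunk);
  }
}

// Usage in a browser (assuming `videoDecoder` is a configured VideoDecoder
// and `encodedChunks` is an array of EncodedVideoChunk):
//   await feedWithBackpressure(videoDecoder, encodedChunks);
```

This only limits pending inputs. A real player would also cap the number of decoded frames it holds, as the sample player's loop does.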

We (the specification editors, helped by other contributors) have written various sample apps using Web Codecs, covering various codecs and scenarios: https://github.com/w3c/webcodecs/tree/main/samples is the source, hopefully clear and commented enough (let us know if not!), deployed at https://webcodecs-samples.netlify.app/ (not on gh pages because we need a couple of headers to be set for SharedArrayBuffer). As far as I'm aware, the samples work on all browsers implementing Web Codecs on all platforms, as much as possible (codec implementation / feature implementation is sometimes incomplete and will be more complete in the next few months -- they certainly work in Chromium and Safari and most of them work in Firefox with our work in progress patches).

Also, most of the above applies to video decoding generally and not only to Web Codecs; it would be the same with e.g. ffmpeg/VideoToolbox/MediaCodec/wmf/pick your media framework.

tobiasBora commented 7 months ago

Thanks a lot for your detailed answer. I was not aware that a frame might need frames from the future; this might indeed explain my issue. But knowing that Windows has an even worse delay is a bit scary; I was thinking that the interface would be more uniform, abstracted by the browser. Is there a safe number of frames to decode in advance? I was queuing 250 frames (between 2 key frames), but I guess that is too much? (Actually, I want to be able to play backward, that's why I need this.) But when I read the examples you gave, it seems like they use 3 ^^

flush() is to be called at the end of the video, or when seeking, not during general playback.

Oh, that's good to know, it was not obvious from the docs I read (mostly Mozilla). I was seeing it as a simple "wait until the frame is received", good to learn it is not. I was thinking it is weird to require a sync frame afterwards (if you can decode, why can't you restart from the last frame?), but I guess it is to be sure that people do not flush frames that need a frame in the future to be decoded.

We […] have written various sample apps using Web Codecs

Oh, last time I checked I could find this one but it only plays as fast as possible, not in real time… but https://webcodecs-samples.netlify.app/audio-video-player/audio_video_player.html is exactly what I needed, it will be really useful. Thanks a lot, I have some stuff to study now!

tobiasBora commented 7 months ago

Actually, I have a question about:

    while (this.frameBuffer.length < FRAME_BUFFER_TARGET_SIZE &&
            this.decoder.decodeQueueSize < FRAME_BUFFER_TARGET_SIZE) {
      let chunk = await this.demuxer.getNextChunk();
      this.decoder.decode(chunk);
    }

My understanding is that this tries to saturate the decoder by sending decode messages until FRAME_BUFFER_TARGET_SIZE frames are decoded AND the queue has size FRAME_BUFFER_TARGET_SIZE. But what happens if the decoder is like really fast and does not saturate? My understanding is that it will create a huge list this.frameBuffer, but decodeQueueSize will stay around 0 or 1… which might end up in a memory crash. Is it just that the decoder is never that fast so it is not a problem in practice?

sandersdan commented 7 months ago

The loop will exit if either FRAME_BUFFER_TARGET_SIZE outputs are ready or FRAME_BUFFER_TARGET_SIZE inputs are pending.
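
To make the exit condition concrete, here is a small sketch of the same test written as a standalone predicate. The function name `shouldKeepFeeding` is hypothetical; the constant value 3 is the one used in the sample player.

```javascript
const FRAME_BUFFER_TARGET_SIZE = 3;

// Hypothetical predicate mirroring the while condition of the sample:
// feeding continues only while BOTH counts are below the target, so the
// loop stops as soon as EITHER one reaches it.
function shouldKeepFeeding(bufferedFrames, decodeQueueSize) {
  return (
    bufferedFrames < FRAME_BUFFER_TARGET_SIZE &&
    decodeQueueSize < FRAME_BUFFER_TARGET_SIZE
  );
}

console.log(shouldKeepFeeding(2, 2)); // true: neither limit reached
console.log(shouldKeepFeeding(3, 0)); // false: enough decoded frames buffered
console.log(shouldKeepFeeding(0, 3)); // false: enough inputs pending
```

So a very fast decoder fills the frame buffer and stops the loop through the first condition, even if decodeQueueSize stays near 0.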

tobiasBora commented 7 months ago

Arg, stupid me, thanks, time to sleep.

tobiasBora commented 7 months ago

Oh, but now I don't understand anymore why it is supposed to work if Windows needs at least 10 decode messages to start, as this value is hardcoded to FRAME_BUFFER_TARGET_SIZE = 3;. So if the decoder saturates directly, it will have 3 messages, which is not enough to output a first frame, no?

sandersdan commented 7 months ago

The decoder will consume inputs, decreasing the decodeQueueSize, even while it does not produce output.

The exception is when the number of outstanding (non-closed) outputs exceeds the decoder's limit, in which case decoding will stall. This limit varies and isn't known in general.
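
Since outstanding outputs count against that limit, frames that will not be displayed should be closed promptly. Here is a minimal sketch of one way to do it; the `FrameBuffer` class is hypothetical, frames are assumed to be VideoFrame-like objects exposing `timestamp` and `close()`, and they are assumed to arrive in presentation order.

```javascript
// Hypothetical buffer for decoded frames (not part of WebCodecs).
class FrameBuffer {
  constructor() {
    this.frames = [];
  }

  // Intended to be called from the decoder's output callback.
  push(frame) {
    this.frames.push(frame);
  }

  // Returns the newest frame whose timestamp is <= nowTimestamp and closes
  // every older frame that was skipped. Returns null if no frame is due.
  // The caller is responsible for closing the returned frame after use.
  take(nowTimestamp) {
    let chosen = null;
    while (this.frames.length > 0 && this.frames[0].timestamp <= nowTimestamp) {
      if (chosen !== null) chosen.close(); // release the stale frame
      chosen = this.frames.shift();
    }
    return chosen;
  }
}
```

In a render loop, the frame returned by `take()` would be drawn and then closed, so that the decoder can reuse its resources.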

tobiasBora commented 7 months ago

Oh, so the sum of inputs and decodeQueueSize is not invariant… interesting. So this is made possible thanks to setTimeout(this.fillFrameBuffer.bind(this), 0); that will basically loop, and add stuff to the queue if things have been consumed without producing outputs. Interesting, thanks!

sandersdan commented 7 months ago

See also the dequeue event.

tobiasBora commented 6 months ago

Thanks! As a note, sometimes the decoder just stays blocked forever: in that case the solution is to add:

    const wait = (n) => new Promise((resolve) => setTimeout(resolve, n));
    …
    await wait(0);

right before the call to the decoder (the sample above uses setTimeout(f, 0) instead).
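
A plausible explanation, offered as an assumption rather than a diagnosis: decoder callbacks are delivered as tasks on the event loop, so a feeding loop that never yields to the task queue cannot observe them. The sketch below shows the yielding pattern in isolation; `pumpWithYield`, `hasWork` and `step` are hypothetical names introduced for the example.

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical feed loop: `hasWork` says whether more work remains and
// `step` performs one unit of it (for example one decode() call).
async function pumpWithYield(hasWork, step) {
  while (hasWork()) {
    step();
    // Yield to the event loop so that queued tasks, such as callbacks
    // scheduled by step(), get a chance to run before the next check.
    await wait(0);
  }
}
```

Without the `await wait(0)`, a loop whose exit condition depends on a callback delivered as a task would spin forever, because that callback could never run.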