w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Mid stream decode using vp09.00.10.08 #563

Closed sinclairzx81 closed 1 year ago

sinclairzx81 commented 1 year ago

Hi, I'm writing to get some advice on supporting mid stream decoding using the vp09.00.10.08 codec. This is partially related to issue https://github.com/w3c/webcodecs/issues/220, where passing a delta EncodedVideoChunk to the decoder results in the error "Failed to execute 'decode' on 'VideoDecoder': A key frame is required after configure() or flush()." This error makes sense, but I'm curious whether there is a way to support mid stream decoding with this codec.

I've tried a couple of things, including caching the initial key frame. This avoids the error above and was implemented with the expectation that the deltas might eventually resolve to a correct video frame after enough deltas were received, but this doesn't occur (the output stays garbled).

Is it possible to support mid stream decode using this codec? I'm currently investigating ways to implement low latency HD video streaming using the WebCodecs API and am fairly new to the VP9 codec in general. Any help, guidance or general directions to good documentation would be hugely appreciated!

Many Thanks! This API is fantastic.

dalecurtis commented 1 year ago

Glad you find the API useful!

You'll need to know where your key frames are. Chrome has opted to be strict here since hardware decoders are unpredictable in these situations. It's better for developers to find the error upfront versus hunting down random errors in the field.

Low latency client/server implementations generally have the ability to dynamically generate key-frames. If you are connecting a client midstream, you will need some similar capability.

I'm not aware of any VP9 features that would allow it to catch up like you're thinking. @jzern in case there's something like that we should be supporting in WebCodecs.
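On the receiving side, the key-frame requirement described above can be respected by dropping delta chunks until a key frame arrives. The following is a minimal sketch under that assumption; createKeyFrameGate is a hypothetical helper, not part of WebCodecs, and it relies only on the type field ('key' or 'delta') of an EncodedVideoChunk.

```javascript
// Hypothetical helper (not part of WebCodecs): drops delta chunks until a
// key frame has been seen, so decode() is never called with a delta first.
function createKeyFrameGate() {
  let seenKeyFrame = false
  return {
    // Returns true if the chunk can be passed to decoder.decode().
    accept(chunk) {
      if (chunk.type === 'key') seenKeyFrame = true
      return seenKeyFrame
    },
    // Call after decoder.configure() or decoder.flush(), since the decoder
    // requires a key frame again at those points.
    reset() {
      seenKeyFrame = false
    },
  }
}
```

A receiver might then call gate.accept(chunk) before each decoder.decode(chunk) and skip the chunk when it returns false. This only avoids the error; the client still sees nothing until the sender produces a key frame.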

aboba commented 1 year ago

The support for Scalable Video Coding can help a bit. For example, if you use scalability mode L1T3, then the client only needs the keyframe and subsequent base layer frames in order to begin decoding the stream. So if you can get these frames to the client, it can push them through the decoder to catch up and then begin decoding mid-stream.
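One way to get "the keyframe and subsequent base layer frames" to a late-joining client is to cache them on the sending side. The sketch below assumes the encoder output callback receives metadata with svc.temporalLayerId when a scalabilityMode is configured; createCatchUpBuffer is a hypothetical name, and treating missing metadata as layer 0 is an assumption.

```javascript
// Hypothetical sender-side cache (not part of WebCodecs): keeps the most
// recent key frame plus the base-layer (temporal layer 0) chunks after it.
function createCatchUpBuffer() {
  let chunks = []
  return {
    // Call from the VideoEncoder output callback with (chunk, meta).
    push(chunk, meta) {
      // Assumption: a chunk without SVC metadata is treated as layer 0.
      const layer = meta?.svc?.temporalLayerId ?? 0
      if (chunk.type === 'key') {
        chunks = [chunk] // a new key frame makes older chunks unnecessary
      } else if (layer === 0 && chunks.length > 0) {
        chunks.push(chunk) // base-layer delta that builds on the cached key frame
      }
      // Deltas from higher temporal layers are not needed for catch-up.
    },
    // Chunks to send to a client that joins mid-stream, in decode order.
    snapshot() {
      return chunks.slice()
    },
  }
}
```

A joining client would decode snapshot() first and then continue with the live stream. Without periodic key frames the cache grows without bound, so some limit or key-frame interval would be needed in practice.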

sinclairzx81 commented 1 year ago

@dalecurtis @aboba Hi! Thank you for the follow up!

I have tried @aboba's suggestion and can confirm that the L1T3 scalability mode on the VideoEncoder works great. My current test setup holds onto the initial key frame, skips several dozen subsequent deltas, then continues decoding deltas thereafter. The video clears up within a few seconds, which is perfect! I have also tested frame / packet loss, and the results seem quite good at around 5-10% missed frames.

The following are the current encoder / decoder configurations I have, in case they are helpful for others to reference.

VideoEncoder

const encoder = new VideoEncoder({
  output: (chunk, meta) => onOutput(chunk, meta),
  error: (error) => onError(error),
})
encoder.configure({ 
  codec: 'vp09.00.10.08',
  hardwareAcceleration: 'prefer-software',
  width: options.width,
  height: options.height,
  displayWidth: options.width,
  displayHeight: options.height,
  latencyMode: 'realtime',
  bitrateMode: 'variable',
  scalabilityMode: 'L1T3',
  framerate: 30,
  alpha: 'discard',
})

VideoDecoder

const decoder = new VideoDecoder({
  output: (frame) => onFrame(frame),
  error: (error) => onError(error),
})

decoder.configure({
  codec: 'vp09.00.10.08',
  hardwareAcceleration: 'prefer-software',
  codedWidth: options.width,
  codedHeight: options.height,
})

@aboba Thank you so much for the suggestion to use the L1T3 scalability mode! Both L1T3 and L1T2 seem to work quite well. I should read up more on these.

Hardware Acceleration

@dalecurtis Also, thanks for the heads up about the unpredictable behavior of hardware encoders. I was actually setting hardwareAcceleration to prefer-hardware on the encoder, but have since switched to prefer-software, as running the L1T3 scalability mode in hardware seems to cause the encoder to close unexpectedly. I wasn't aware there were differences between how hardware and software would deal with encoding, so this insight is extremely helpful!

Happy to close this issue off. I do have a follow up question with respect to WebRTC and future plans to integrate the WebCodecs API into WebRTC (I sense that the WebCodecs API opens the doors to much more flexible media streaming architectures), but I may submit that as a different issue / discussion thread.

Thanks again for the insight! :D

jzern commented 1 year ago

The SVC suggestion is a good one. In a normal stream you would need to request a key frame or intra-refresh. With VP9, decoding can continue while waiting on that sync, but it assumes the encoder is regularly producing key frames or intra-only ones.

sinclairzx81 commented 1 year ago

@jzern Hi, are there plans to implement an automatic syncing key frame specifically for VP9 encoding? I am currently holding onto the initial key frame, and that works pretty well (at least, I'm pretty happy with it as a POC), but there is quite a lot of catching up between that first key frame and deltas that could potentially be minutes (or hours) past that initial frame.

I guess in an ideal setup, the receiver would wait for a periodic key frame to be sent from the encoder before starting, so the stream wouldn't need that initial catch up. Maybe there is an existing configuration for this? Or can this be specifically requested from the VideoEncoder?

jzern commented 1 year ago

A key frame can be forced with VPX_EFLAG_FORCE_KF. There also is some level of error resilience, which limits probability updates and frame dependencies. As mentioned, SVC also provides some level of resilience due to the layered structure.

sinclairzx81 commented 1 year ago

@jzern Thanks! I just noticed this is likely expressed as encoder.encode(frame, { keyFrame: true }) in the JavaScript API.
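Building on the keyFrame option of encoder.encode(), periodic and on-demand key frames could be combined in a small policy object. This is a sketch only: createKeyFramePolicy and its interval parameter are hypothetical, and it decides nothing more than the value of keyFrame passed to encode().

```javascript
// Hypothetical policy (not part of WebCodecs): asks for a key frame every
// `interval` frames, or on the next frame after request() is called.
function createKeyFramePolicy(interval) {
  let framesSinceKey = Infinity // makes the first frame a key frame
  let requested = false
  return {
    // Call when a new client joins mid-stream or reports a decode error.
    request() {
      requested = true
    },
    // Returns the options object for encoder.encode(frame, options).
    next() {
      const keyFrame = requested || framesSinceKey >= interval
      framesSinceKey = keyFrame ? 1 : framesSinceKey + 1
      requested = false
      return { keyFrame }
    },
  }
}
```

Usage would look like encoder.encode(frame, policy.next()), with policy.request() called whenever a receiver connects. With keyFrame set to false the encoder is still free to emit a key frame on its own, so the interval here is an upper bound rather than an exact spacing.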