w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

VideoDecoder: valid empty/skipped samples don't call output callback but don't call DecodeError either #623

Closed by Knifa 1 year ago

Knifa commented 1 year ago

I have a handful of H.264 videos that have what I think are valid empty/padding/skipped samples right at the end. When VideoDecoder decodes these samples, they seem to be treated as valid: the output callback isn't called, but no error is triggered either. What's the expectation here? I don't think the spec calls out anything in particular with regard to skipped samples.

I guess I'd like something to happen, because otherwise the only way to know that neither callback is going to be called is to check the byte size or try to interpret the raw samples directly before sending them to the decoder, which feels a bit weird. The codec must have given some response internally, right? Maybe the output callback could be called with a null VideoFrame?

Bit of a novice when it comes to video codecs, etc. --- definitely winging it here so sorry if I've not explained well!

Here are the videos in question (fairly chonky, sorry; they've come straight out of the DVR from the DJI FPV Goggles): https://drive.google.com/file/d/1O4gFf8YYt9qWZiBmmg3cDO536pLMdzbn/view?usp=share_link (~200MB) https://drive.google.com/file/d/1B3hBKf7rI3G40nHwRsBQCdoWTBE0iob_/view?usp=share_link (~900MB)

It's difficult to put together a small test case since they need to be demuxed, but I can try if it's helpful? This is the meat of my processing pipeline, if it's any use off the bat. I don't think I'm doing anything particularly weird.

dalecurtis commented 1 year ago

Unless your encoding parameters are very precise you won't get 1-in-1-out behavior from most H.264 or H.265 decoders. You should always be calling flush() once you're done feeding a VideoDecoder to retrieve remaining frames.
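The feed-then-flush pattern described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `DecoderLike` is a hypothetical stand-in for the two `VideoDecoder` methods it uses (`decode()` and `flush()`), and `decodeAll` and `makeDecoder` are illustrative names that are not part of WebCodecs.

```typescript
// Hypothetical structural stand-in for the parts of VideoDecoder used below.
interface DecoderLike<Chunk> {
  decode(chunk: Chunk): void;
  flush(): Promise<void>;
}

// Feed every chunk, then flush so that frames still buffered inside the
// decoder reach the output callback before the collected list is returned.
async function decodeAll<Chunk, Frame>(
  chunks: Chunk[],
  makeDecoder: (output: (frame: Frame) => void) => DecoderLike<Chunk>,
): Promise<Frame[]> {
  const frames: Frame[] = [];
  const decoder = makeDecoder((frame) => frames.push(frame));
  for (const chunk of chunks) {
    decoder.decode(chunk);
  }
  // Without this, a decoder that holds frames back would never emit them.
  await decoder.flush();
  return frames;
}
```

In a browser, `makeDecoder` could presumably construct a `VideoDecoder` with the given `output` callback and configure it before returning. Collecting real `VideoFrame` objects in an array like this would hold decoder resources, so a real pipeline would process and close each frame rather than store it.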

Knifa commented 1 year ago

My thinking is that not triggering any callback for these cases adds an awkward bit of complexity for API users, who need to keep track of expected versus skipped frames.

flush() needs to be used with care, as the next frame after it needs to be a keyframe, right? So knowing whether a frame was skipped is delayed until the next point where you can call flush(). (And handling the keyframe vs. flush tracking is tricky as is! :smile:)

So the kind of general workflow for dealing with this currently then is something like:

  1. Pump in frames to VideoDecoder to be decoded with appropriate timestamps, up to just before the next keyframe.
  2. Simultaneously, keep track of which frames you expect to see later when the output callback hits.
  3. The decoder may start right away, and begin calling the output callback.
  4. In the output callback, note the frame that was actually decoded.
    • Since the process might rely on knowing if a frame was skipped (e.g., to repeat the last frame out to a coupled encoder) you can't perform any actions here reliably --- save the frame for later.
    • Since you can't literally hold onto the VideoFrame here or it'll block further decoding, you need to copy it out to a bitmap or similar, along with any properties from the frame you care about.
    • The output callback simply won't be called if it's one of these empty frames.
  5. Await on flush() and then compare frames decoded to frames expected.
  6. Do the things you wanted to with the decoded frames.
  7. Repeat, starting from the next keyframe.
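The bookkeeping in steps 2, 4 and 5 above can be sketched as a small tracker. This is a minimal sketch under stated assumptions: `FrameTracker` and its method names are hypothetical, and it assumes each decoded frame carries the same timestamp as the chunk that produced it.

```typescript
// Tracks which chunk timestamps were submitted and which came back as frames.
class FrameTracker {
  private expected: number[] = [];
  private decoded = new Set<number>();

  // Call when a chunk is handed to decode().
  noteSubmitted(timestamp: number): void {
    this.expected.push(timestamp);
  }

  // Call from the output callback with the decoded frame's timestamp.
  noteDecoded(timestamp: number): void {
    this.decoded.add(timestamp);
  }

  // Call after flush() resolves. Returns the submitted timestamps that never
  // produced a frame, then resets the tracker for the next keyframe group.
  takeSkipped(): number[] {
    const skipped = this.expected.filter((t) => !this.decoded.has(t));
    this.expected = [];
    this.decoded.clear();
    return skipped;
  }
}
```

The result is only meaningful once flush() has resolved, since before that point a missing timestamp may simply belong to a frame the decoder has not emitted yet.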

Whereas if e.g. the output callback received a null frame, a lot of the tracking goes away and latency for dealing with decoded frames is reduced:

  1. Pump in frames to VideoDecoder up to next keyframe.
  2. In the output callback, the timestamp is always directly sequential but the actual frame might be null. You can take appropriate action right here in the callback to e.g. modify the frame and display it or repeat the previous frame to an encoder if needed.
  3. Await on flush() to ensure all frames get decoded.
  4. Repeat with the next keyframe.
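The API does not deliver null frames today, so the proposed behavior can only be approximated in user code once flush() has resolved. The following is a minimal sketch of that approximation; `Slot` and `fillSkipped` are hypothetical names, and it assumes frames can be matched to chunks by timestamp. Because it runs after flush(), it does not give the latency benefit described above.

```typescript
// One entry per submitted chunk; a null frame marks a chunk with no output.
interface Slot<Frame> {
  timestamp: number;
  frame: Frame | null;
}

// Given the timestamps submitted for one keyframe group and the frames that
// arrived before flush() resolved, build one slot per submitted chunk,
// ordered by timestamp.
function fillSkipped<Frame extends { timestamp: number }>(
  submitted: number[],
  frames: Frame[],
): Slot<Frame>[] {
  const byTimestamp = new Map<number, Frame>();
  for (const frame of frames) {
    byTimestamp.set(frame.timestamp, frame);
  }
  return [...submitted]
    .sort((a, b) => a - b)
    .map((timestamp) => ({
      timestamp,
      frame: byTimestamp.get(timestamp) ?? null,
    }));
}
```

A consumer could then walk the slots in order and, for example, repeat the previous frame to an encoder whenever `frame` is null.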

What do you think? I guess it's just been surprising for me from a UX perspective that simply nothing happens when these empty frames are decoded!

dalecurtis commented 1 year ago

Ultimately we're at the whims of the hardware, so there's not much we can do here aside from maybe triggering a warning if a codec is closed w/o flushing. I'm not sure we can do that in a way that's not generating false positives though.

As noted, if you really want this behavior, you need to be very careful with how you encode the stream. https://bugs.chromium.org/p/chromium/issues/detail?id=1352442 has some discussion on that setup.

I think triggering output callbacks with null is likely to be more surprising to most folks. It would also break existing implementations.

Knifa commented 1 year ago

That's fair --- makes sense. And yeah, it's hard to get away from any changes breaking existing stuff either way... 😅

Thanks for your replies on this!