w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

VideoDecoder API may expose the underlying buffer pool size #569

Open youennf opened 1 year ago

youennf commented 1 year ago

Web pages may be able to compute a video decoder's buffer pool size by not releasing the VideoFrames of a given decoder and feeding the decoder data to decode until it stalls. As such, this can be a fingerprinting vector. The same issue applies to Media Capture Transform for peer connection tracks (and for camera tracks as well, though camera access is gated by a permission).
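The probing idea can be illustrated with a small simulation. This is only a sketch: `MockPooledDecoder`, `decodeNext`, and `probePoolSize` are hypothetical names invented for illustration, not part of WebCodecs. With a real `VideoDecoder`, the stall would presumably show up as the output callback no longer firing, which a page would have to detect with a timeout.

```typescript
// Hypothetical stand-in for a decoder whose output frames come from a fixed-size pool.
class MockPooledDecoder {
  private outstanding = 0;
  private readonly poolSize: number;

  constructor(poolSize: number) {
    this.poolSize = poolSize;
  }

  // Returns a frame handle, or null when every pool buffer is still held (the "stall").
  decodeNext(): { close: () => void } | null {
    if (this.outstanding >= this.poolSize) return null;
    this.outstanding++;
    let closed = false;
    return {
      close: () => {
        if (closed) return; // closing twice must not free a second buffer
        closed = true;
        this.outstanding--;
      },
    };
  }
}

// Hold every frame without closing it and count how many arrive before the stall.
function probePoolSize(decoder: MockPooledDecoder, maxAttempts: number): number {
  const held: { close: () => void }[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    const frame = decoder.decodeNext();
    if (frame === null) break;
    held.push(frame);
  }
  const observed = held.length;
  held.forEach((f) => f.close()); // release so the decoder can make progress again
  return observed;
}

const observedPoolSize = probePoolSize(new MockPooledDecoder(6), 100); // 6
```

In this simulation the number of frames received before the stall equals the pool size exactly, which is the value a page could use as a fingerprinting signal.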

dalecurtis commented 1 year ago

Thanks for filing. In many cases this is a function of GPU model or CPU memory, which can at least be determined by other means. There are definitely some cases that have fixed limits though. E.g., TV use cases may only have one output frame available depending on resolution.

I suspect these limits correlate with the number of overall decoders that can be created. There's already a note in the privacy section for that, so we should at least add a similar one for frame counts.

In terms of workarounds I can think of:

  • Copy-at-limit; e.g., silently replace vended frames with copies. Still reveals OOM limits.

youennf commented 1 year ago

I was thinking of a countermeasure like that. OOM limit is fine I think. If stalling proves to be useful to web developers (memory leak detection for instance), we could keep a sufficiently high hard limit above which the application has to manually copy VideoFrames. Or this could be a decoder parameter given by the web application.
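A rough simulation of copy-at-limit combined with a hard limit might look like the following. It is a sketch under assumptions: `CopyAtLimitDecoder`, `decodeNext`, `framesUntilStall`, and the limit of 32 are all hypothetical, and the `kind` field exists only so the sketch can be inspected (a real implementation would not expose it).

```typescript
type VendedFrame = { kind: "pooled" | "copy"; close: () => void };

// Hypothetical decoder that vends pool-backed frames while the pool lasts,
// then silently vends copies until a fixed hard limit is reached.
class CopyAtLimitDecoder {
  private pooledOut = 0;
  private copiesOut = 0;
  private readonly poolSize: number;
  private readonly hardLimit: number;

  constructor(poolSize: number, hardLimit: number) {
    this.poolSize = poolSize;
    this.hardLimit = hardLimit;
  }

  decodeNext(): VendedFrame | null {
    // The stall point depends only on the hard limit, not on the pool size.
    if (this.pooledOut + this.copiesOut >= this.hardLimit) return null;
    const kind: "pooled" | "copy" = this.pooledOut < this.poolSize ? "pooled" : "copy";
    if (kind === "pooled") this.pooledOut++;
    else this.copiesOut++;
    let closed = false;
    return {
      kind,
      close: () => {
        if (closed) return;
        closed = true;
        if (kind === "pooled") this.pooledOut--;
        else this.copiesOut--;
      },
    };
  }
}

// Count how many unreleased frames a page can accumulate before the decoder stalls.
function framesUntilStall(decoder: CopyAtLimitDecoder): number {
  const held: VendedFrame[] = [];
  for (;;) {
    const frame = decoder.decodeNext();
    if (frame === null) break;
    held.push(frame);
  }
  const count = held.length;
  held.forEach((f) => f.close());
  return count;
}

const smallPool = framesUntilStall(new CopyAtLimitDecoder(2, 32)); // 32
const largePool = framesUntilStall(new CopyAtLimitDecoder(16, 32)); // 32
```

Both simulated devices stall at the same frame count, so the stall point alone no longer distinguishes a 2-buffer pool from a 16-buffer one, as long as the hard limit is at least as large as any real pool.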

padenot commented 1 year ago

Isn't this observable? The time to copy a frame is certainly measurable, especially if the frames are big (and they often are).
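To get a feel for the cost, one could time plain memory copies of frame-sized buffers. This is a minimal sketch, not the benchmark used in this thread: `i420ByteLength` and `averageCopyMs` are hypothetical helpers, the buffers are ordinary `Uint8Array`s rather than real `VideoFrame` data, and because the same buffer is copied repeatedly it mostly reflects a warm-cache case.

```typescript
// Bytes in one 8-bit I420 (YUV 4:2:0) frame: a full-resolution luma plane
// plus two chroma planes at half resolution in each dimension.
function i420ByteLength(width: number, height: number): number {
  const chromaWidth = Math.ceil(width / 2);
  const chromaHeight = Math.ceil(height / 2);
  return width * height + 2 * chromaWidth * chromaHeight;
}

// Average wall-clock milliseconds per copy, measured over `iterations` copies.
// Date.now() is coarse, so the copies are batched and the total is divided.
function averageCopyMs(byteLength: number, iterations: number): number {
  const src = new Uint8Array(byteLength).fill(128);
  const dst = new Uint8Array(byteLength);
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    dst.set(src);
  }
  return (Date.now() - start) / iterations;
}

const bytes1080p = i420ByteLength(1920, 1080); // 3110400
const bytes4k = i420ByteLength(3840, 2160); // 12441600
const ms1080p = averageCopyMs(bytes1080p, 200);
const ms4k = averageCopyMs(bytes4k, 50);
```

The absolute values of `ms1080p` and `ms4k` depend heavily on the machine; the point is only that a copy of several megabytes per frame takes an amount of time a page could plausibly measure.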

dalecurtis commented 1 year ago

Yes, that's probably true; I'm not sure how reliable detection would be, but it seems feasible at least.

padenot commented 1 year ago

About a year ago, I did some measurements of copy duration on some reasonably standard video frames, and here's what I had on a very powerful x86_64 / DDR4 box running Linux:

Hot caches (meaning, the video frame was just decoded, probably in the decode() promise thenable)

Cold caches (meaning, the video frame was decoded some time ago, and we're now copying it)

The numbers are really consistent across runs (the values above are already averaged across a bunch of copies).

On an Apple Silicon machine (M1 Max), it's widely different but still a non-trivial amount of time.

Hot caches:

Cold caches:

Again, repeating the benchmark, it's quite consistent across runs, though a bit less consistent than the x86_64 / DDR4 case.

This is for frames resident in memory (regular memory copies); it would be nice to have numbers for readbacks.