VideoDecoder API may expose the underlying buffer pool size

youennf commented 1 year ago

Web pages may be able to compute a video decoder's buffer pool size by not releasing the VideoFrames output by a given decoder and feeding the decoder more data to decode until it stalls. As such, this can be a fingerprinting vector. The same issue applies to Media Capture Transform for peer connection tracks (and to camera tracks as well, though camera access is gated by a permission).
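A minimal sketch of the probe, assuming a simplified model. In a browser the page would keep the VideoFrame objects passed to the decoder's output callback without calling close(), and would presumably have to infer a stall from a timeout. SimulatedPoolDecoder and probePoolSize are hypothetical stand-ins so the counting logic can run outside a browser; they are not part of WebCodecs.

```typescript
// Hypothetical, synchronous model of a decoder whose output buffers come from a
// fixed-size pool. It is NOT the WebCodecs VideoDecoder; it only captures the
// property described above: decoding stalls when the page holds every buffer.
class SimulatedPoolDecoder {
  private readonly poolSize: number;
  private inUse = 0;

  constructor(poolSize: number) {
    this.poolSize = poolSize;
  }

  // Decodes one chunk. Returns a release callback (standing in for
  // VideoFrame.close()) or null if no buffer is free, i.e. the decoder stalls.
  decodeOne(): (() => void) | null {
    if (this.inUse >= this.poolSize) return null;
    this.inUse++;
    let released = false;
    return () => {
      if (!released) {
        released = true;
        this.inUse--;
      }
    };
  }
}

// The probe: never release output frames and count how many arrive
// before the decoder stops producing output.
function probePoolSize(decoder: SimulatedPoolDecoder, maxChunks = 64): number {
  const held: Array<() => void> = [];
  for (let i = 0; i < maxChunks; i++) {
    const release = decoder.decodeOne();
    if (release === null) break; // stalled: every pool buffer is held by the page
    held.push(release);
  }
  const observed = held.length;
  held.forEach((release) => release()); // return the buffers afterwards
  return observed;
}

console.log(probePoolSize(new SimulatedPoolDecoder(4))); // 4
console.log(probePoolSize(new SimulatedPoolDecoder(1))); // 1
```

The count the page observes is exactly the hidden pool size, which is what makes it usable as a fingerprinting bit.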

dalecurtis commented 1 year ago

Thanks for filing. In many cases this is a function of GPU model or CPU memory, which can at least be determined by other means. There are definitely some cases with fixed limits, though. E.g., TV use cases may only have one output frame available, depending on resolution.

I suspect these limits correlate with the total number of decoders that can be created. There's already a note in the privacy section for that, so we should at least add a similar one for frame counts.

In terms of workarounds, I can think of:

Don't allow zero-copy (bad performance implications). Still reveals CPU memory limits. Won't help in OOM situations.
Copy-at-limit; E.g., silently replace vended frames with copies. Still reveals OOM limits.
Enforce a limit below the hard cap (won't cover all cases, since the limit is sometimes 1). This could be interpreted as a 'privacy budget' type workaround, as mentioned in the privacy considerations.
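The copy-at-limit option can be sketched roughly as follows. This is a hypothetical model, not how any browser implements it: CopyAtLimitPool and VendedFrame are illustrative names, and the choice to keep one buffer in reserve for the decoder is an assumption of this sketch.

```typescript
// Hypothetical model of the "copy-at-limit" mitigation. Frames are vended
// zero-copy while the pool has spare buffers; once only one buffer is left,
// each new frame is copied into ordinary memory and its pool buffer is
// returned to the decoder at once, so page-held frames never stall decoding.
interface VendedFrame {
  readonly zeroCopy: boolean;
  release(): void; // stands in for VideoFrame.close()
}

class CopyAtLimitPool {
  private readonly poolSize: number;
  private inUse = 0;

  constructor(poolSize: number) {
    this.poolSize = poolSize;
  }

  vend(): VendedFrame {
    // Assumption: keep one buffer free for the decoder to decode into.
    if (this.inUse < this.poolSize - 1) {
      this.inUse++;
      let released = false;
      return {
        zeroCopy: true,
        release: () => {
          if (!released) {
            released = true;
            this.inUse--;
          }
        },
      };
    }
    // Silent copy: the page gets an equivalent frame backed by its own memory,
    // so releasing it does not affect the decoder's pool.
    return { zeroCopy: false, release: () => {} };
  }
}

const pool = new CopyAtLimitPool(4);
const held: VendedFrame[] = [];
for (let i = 0; i < 10; i++) held.push(pool.vend()); // never stalls
console.log(held.filter((f) => f.zeroCopy).length); // 3 zero-copy frames, 7 copies
```

The page can hold any number of frames without stalling the decoder; the hidden pool size is no longer visible as a stall point, only (potentially) as the frame index where copies begin.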

youennf commented 1 year ago

Copy-at-limit; E.g., silently replace vended frames with copies. Still reveals OOM limits.

I was thinking of a countermeasure like that. The OOM limit is fine, I think. If stalling proves useful to web developers (for memory leak detection, for instance), we could keep a sufficiently high hard limit above which the application has to copy VideoFrames manually. Alternatively, this could be a decoder parameter set by the web application.
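A sketch of how copy-at-limit might combine with a fixed hard limit, under stated assumptions: chooseAction and stallPoint are hypothetical, and hardLimit stands for a value fixed by the user agent or, hypothetically, passed by the page; no such option exists in VideoDecoderConfig today.

```typescript
// Hypothetical policy: copy-at-limit plus a fixed cap on frames the page holds.
type FrameAction = "zero-copy" | "copy" | "stall";

function chooseAction(
  heldByPage: number, // frames the page has not closed yet (zero-copy and copies)
  freePoolBuffers: number, // decoder pool buffers not held by the page
  hardLimit: number, // fixed cap chosen by the UA or (hypothetically) by the page
): FrameAction {
  // At the cap, stall regardless of hardware: the stall point reveals only
  // hardLimit, and a page that wants more must copy and close frames itself.
  if (heldByPage >= hardLimit) return "stall";
  // Otherwise hand out zero-copy frames while a spare buffer remains...
  if (freePoolBuffers > 1) return "zero-copy";
  // ...and silently copy once the real pool is nearly exhausted.
  return "copy";
}

// Runs the "hold every frame" probe against the policy; returns where it stalls.
function stallPoint(poolSize: number, hardLimit: number): number {
  let held = 0;
  let free = poolSize;
  for (;;) {
    const action = chooseAction(held, free, hardLimit);
    if (action === "stall") return held;
    if (action === "zero-copy") free--;
    held++;
  }
}

console.log(stallPoint(1, 16), stallPoint(8, 16)); // 16 16
```

Whatever the hardware pool size, the probe stalls at the same point, so stalling stays available for leak detection without exposing the pool.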

padenot commented 1 year ago

Isn't this observable? The time to copy a frame is certainly measurable, especially if the frames are big (and they often are).
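One hedged way to picture the observability concern: the page timestamps each output frame and looks for the point where delivery suddenly becomes slower by roughly one copy's duration. firstSlowFrame is a hypothetical helper, the latencies below are synthetic values rather than measurements, and real timings are noisy, so this only illustrates the idea, not how reliable it would be in practice.

```typescript
// Hypothetical detector for the timing side channel: find the first frame whose
// delivery latency exceeds the mean of all earlier frames by more than
// thresholdMs. Under copy-at-limit, that index approximates how many frames
// were vended zero-copy, i.e. it leaks information about the pool size.
function firstSlowFrame(latenciesMs: number[], thresholdMs: number, warmup = 2): number {
  let sum = 0;
  for (let i = 0; i < latenciesMs.length; i++) {
    // Only compare once a few samples have established a baseline.
    if (i >= warmup && i > 0 && latenciesMs[i] - sum / i > thresholdMs) return i;
    sum += latenciesMs[i];
  }
  return -1; // no jump found
}

// Synthetic latencies (not measurements): four fast frames, then frames that
// also pay for a copy.
const synthetic = [2.0, 2.1, 1.9, 2.0, 3.6, 3.5, 3.7];
console.log(firstSlowFrame(synthetic, 1.0)); // 4
```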

dalecurtis commented 1 year ago

Yes, that's probably true; I'm not sure how reliable detection would be, but it seems feasible at least.

padenot commented 1 year ago

About a year ago, I measured copy durations for some reasonably standard video frames. Here's what I got on a very powerful x86_64 / DDR4 box running Linux:

Hot caches (meaning the video frame was just decoded, probably in the decode() promise thenable)

YUV420 1080p video frame SDR (4MB) ≈ 1.5ms
YUV420 4k video frame SDR (16MB) ≈ 6.6ms
P010 4k 10-bit video frame HDR (32MB) ≈ 15ms

Cold caches (meaning the video frame was decoded some time ago, and we're only now copying it)

YUV420 1080p video frame SDR (4MB) ≈ 4.5ms
YUV420 4k video frame SDR (16MB) ≈ 17ms
P010 4k 10-bit video frame HDR (32MB) ≈ 33ms

The numbers are very consistent across runs (the values above are already averaged across many copies).

On an Apple Silicon machine (M1 Max), the numbers are very different, but a copy still takes a non-trivial amount of time.

Hot caches:

YUV420 1080p video frame SDR (4MB) ≈ 0.35ms
YUV420 4k video frame SDR (16MB) ≈ 0.53ms
P010 4k 10-bit video frame HDR (32MB) ≈ 1.45ms

Cold caches:

YUV420 1080p video frame SDR (4MB) ≈ 1.24ms
YUV420 4k video frame SDR (16MB) ≈ 3.47ms
P010 4k 10-bit video frame HDR (32MB) ≈ 4.86ms

Again, repeating the benchmark gives quite consistent results across runs, though a bit less consistent than on the x86_64 / DDR4 box.
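To put the figures above on a common scale, here is a small sketch that converts each one into an effective copy throughput, taking the stated frame sizes and times at face value. The CopyTiming shape and the throughputGBps helper are illustrative only.

```typescript
// Effective copy throughput implied by the quoted figures
// (stated frame size divided by measured copy time).
interface CopyTiming {
  machine: string;
  cache: "hot" | "cold";
  frame: string;
  sizeMB: number;
  ms: number;
}

const timings: CopyTiming[] = [
  { machine: "x86_64", cache: "hot", frame: "1080p YUV420", sizeMB: 4, ms: 1.5 },
  { machine: "x86_64", cache: "hot", frame: "4k YUV420", sizeMB: 16, ms: 6.6 },
  { machine: "x86_64", cache: "hot", frame: "4k P010", sizeMB: 32, ms: 15 },
  { machine: "x86_64", cache: "cold", frame: "1080p YUV420", sizeMB: 4, ms: 4.5 },
  { machine: "x86_64", cache: "cold", frame: "4k YUV420", sizeMB: 16, ms: 17 },
  { machine: "x86_64", cache: "cold", frame: "4k P010", sizeMB: 32, ms: 33 },
  { machine: "M1 Max", cache: "hot", frame: "1080p YUV420", sizeMB: 4, ms: 0.35 },
  { machine: "M1 Max", cache: "hot", frame: "4k YUV420", sizeMB: 16, ms: 0.53 },
  { machine: "M1 Max", cache: "hot", frame: "4k P010", sizeMB: 32, ms: 1.45 },
  { machine: "M1 Max", cache: "cold", frame: "1080p YUV420", sizeMB: 4, ms: 1.24 },
  { machine: "M1 Max", cache: "cold", frame: "4k YUV420", sizeMB: 16, ms: 3.47 },
  { machine: "M1 Max", cache: "cold", frame: "4k P010", sizeMB: 32, ms: 4.86 },
];

// MB per millisecond equals GB per second (decimal units).
function throughputGBps(t: CopyTiming): number {
  return t.sizeMB / t.ms;
}

for (const t of timings) {
  console.log(`${t.machine} ${t.cache} ${t.frame}: ${throughputGBps(t).toFixed(1)} GB/s`);
}
```

This works out to roughly 2 to 3 GB/s hot and under 1 GB/s cold on the x86_64 box, versus roughly 11 to 30 GB/s hot and 3 to 7 GB/s cold on the M1 Max. For scale, a 60 fps stream has a frame interval of about 16.7 ms, which the cold 4k P010 copy on x86_64 (≈ 33ms) exceeds on its own.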

This is for frames resident in memory (regular memory copies); it would be nice to have numbers for readbacks (copying frames back from GPU memory) as well.

w3c/webcodecs issue #569