Open youennf opened 1 year ago
Thanks for filing. In many cases this is a function of GPU model or CPU memory, which can at least be determined by other means. There are definitely some cases that have fixed limits though. E.g., TV use cases may only have one output frame available, depending on resolution.
I suspect these limits correlate with the number of overall decoders that can be created. There's already a note in the privacy section for that, so we should at least add a similar one for frame counts.
In terms of workarounds I can think of:
- Copy-at-limit: e.g., silently replace vended frames with copies once the pool limit is reached. Still reveals OOM limits.
I was thinking of a countermeasure like that. OOM limit is fine I think. If stalling proves to be useful to web developers (memory leak detection for instance), we could keep a sufficiently high hard limit above which the application has to manually copy VideoFrames. Or this could be a decoder parameter given by the web application.
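To make the copy-at-limit idea concrete, here is a minimal sketch of the behavior being discussed. It is a model only: `CopyAtLimitPool`, `VendedFrame`, and the `limit` parameter are hypothetical names invented for illustration, not part of WebCodecs or of any browser implementation, and a `Uint8Array` stands in for decoder-owned frame memory.

```typescript
// Hypothetical model of "copy-at-limit"; none of these names come from the spec.
interface VendedFrame {
  data: Uint8Array;
  pooled: boolean; // true if the frame is backed by a decoder-owned buffer
}

class CopyAtLimitPool {
  private inUse = 0;
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Vend a decoded frame. Below the limit the frame occupies a pool slot;
  // at the limit the data is copied instead, so decoding never stalls.
  vend(decoded: Uint8Array): VendedFrame {
    if (this.inUse < this.limit) {
      this.inUse++;
      return { data: decoded, pooled: true };
    }
    return { data: decoded.slice(), pooled: false };
  }

  // Called when the page closes a frame; only pooled frames free a slot.
  close(frame: VendedFrame): void {
    if (frame.pooled && this.inUse > 0) {
      this.inUse--;
    }
  }
}
```

In this model the page never sees a stall, but the copies are unbounded, which is why the out-of-memory limit remains observable.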
Isn't this observable? The time to copy a frame is certainly measurable, especially if the frames are big (and they often are).
Yes that's probably true; I'm not sure how reliable detection would be, but seems feasible at least.
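For reference, timing a copy takes very little code. The sketch below is an assumption-laden stand-in: it copies a plain `Uint8Array` rather than a real `VideoFrame`, `averageCopyMs` is a hypothetical helper name, and the clock is injected so the function can be exercised deterministically. A page would more likely pass `() => performance.now()`, whose resolution browsers may deliberately coarsen.

```typescript
// Hypothetical helper: average the time taken to copy a buffer of a given size.
// `now` is any millisecond clock; Date.now is a coarse default.
function averageCopyMs(
  byteLength: number,
  runs: number,
  now: () => number = Date.now
): number {
  const source = new Uint8Array(byteLength);
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = now();
    const copy = source.slice(); // the copy being timed
    total += now() - start;
    if (copy.length !== byteLength) {
      throw new Error("unexpected copy size");
    }
  }
  return total / runs;
}
```

As a usage example, a 1920x1080 I420 frame occupies 1920 * 1080 * 1.5 = 3110400 bytes, so `averageCopyMs(3110400, 100)` gives a rough figure for that size on the machine at hand.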
About a year ago, I did some measurements of copy duration on some reasonably standard video frames, and here's what I had on a very powerful x86_64 / DDR4 box running Linux:
Hot caches (meaning, the video frame was just decoded, probably in the decode() promise thenable)
Cold caches (meaning, the video frame was decoded some time ago, and we're now copying it)
The numbers are really consistent across runs (the values above are already averaged across a bunch of copies).
On an Apple Silicon machine (M1 Max), it's wildly different but still a non-trivial amount of time.
Hot caches:
Cold caches:
Again, repeating the benchmark, it's quite consistent across runs, though a bit less consistent than the x86_64 / DDR4 case.
This is for frames resident in memory (regular memory copies). It would be nice to have numbers for readbacks.
Web pages may be able to compute a video decoder's buffer pool size by not releasing the VideoFrames of a given decoder and feeding the decoder data to decode until it stalls. As such, this can be a fingerprinting vector. The same issue applies to Media Capture Transform for peer connection tracks (and camera tracks as well, though camera access is gated by a permission).
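The probing pattern described above can be illustrated against a simulated decoder. Everything in this sketch is hypothetical: `SimulatedDecoder` is a synchronous stand-in with a fixed, hidden pool size, and it models a stall as `decode()` returning `null`. A real decoder delivers frames asynchronously, and a stall would instead show up as outputs no longer arriving.

```typescript
// Hypothetical stand-in for a decoder with a fixed output buffer pool.
interface FrameHandle {
  close: () => void;
}

class SimulatedDecoder {
  private held = 0;
  private poolSize: number;

  constructor(poolSize: number) {
    this.poolSize = poolSize;
  }

  // Returns a frame handle, or null when every buffer is held (a stall).
  decode(): FrameHandle | null {
    if (this.held >= this.poolSize) {
      return null;
    }
    this.held++;
    let closed = false;
    return {
      close: () => {
        if (!closed) {
          closed = true;
          this.held--;
        }
      },
    };
  }
}

// Hold every output without closing it and count frames until the stall.
// If maxAttempts is reached first, the result is only a lower bound.
function probePoolSize(decoder: SimulatedDecoder, maxAttempts: number): number {
  const retained: FrameHandle[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    const frame = decoder.decode();
    if (frame === null) {
      break;
    }
    retained.push(frame);
  }
  const observed = retained.length;
  for (const frame of retained) {
    frame.close(); // release the buffers once the count is known
  }
  return observed;
}
```

The count returned is the value that could serve as a fingerprinting signal; a copy-at-limit mitigation would make `decode()` in this model never return `null`.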