w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group
https://w3c.github.io/mediacapture-extensions/

Face Detection: Variance of Results #85

Open aboba opened 1 year ago

aboba commented 1 year ago

Since the supported Face Detection models may vary by camera, the API proposed in PR 89 can potentially give varying results depending on the hardware. This will impose a support burden on applications, which could need to maintain a camera blacklist.

Such a list would be difficult to develop without the ability to identify the camera hardware, which in turn could be considered a fingerprinting risk.

These issues do not arise for applications utilizing an existing face detection model written for an ML platform, since those models will yield the same results, albeit with better or worse performance depending on the (GPU) hardware. The variance of results therefore represents a disincentive to using the proposed APIs.

youennf commented 1 year ago

> This will impose a support burden on applications, which could need to maintain a camera blacklist.

My understanding is that UAs are responsible for ensuring the results are good enough. This seems no different than other existing APIs such as echo cancellation or HW encoders.

> Such a list would be difficult to develop without the ability to identify the camera hardware, which in turn could be considered a fingerprinting risk.

This is gated by camera permission. Video frames are already exposed to the web application. I am unclear which additional fingerprinting information this API would actually provide.

> These issues do not arise for applications utilizing an existing face detection model written for an ML platform, since those models will yield the same results, albeit with better or worse performance depending on the (GPU) hardware.

This is true of many existing features; I do not see what is special here. Echo cancellation, for instance, can be done by the OS, the UA, or the web application. All three options yield different results and different performance, which gives developers a healthy playground.

aboba commented 1 year ago

The issue is that ML models are under a lot of scrutiny, so an API that yields different results depending on the hardware is a problem. Today models require a lot of validation before they can be deployed; they are increasingly being treated like drugs, with a multi-stage process involving review panels and large-scale testing. All existing W3C APIs for ML model acceleration enable the same model to be deployed everywhere, albeit running slower or faster depending on the hardware. This API proposal does not provide the level of uniformity of existing approaches, nor does it compare favorably on coverage.

ttoivone commented 1 year ago

If web developers need exactly uniform results everywhere, they are free to deploy an ML model of their choice using Web ML APIs. However, we believe that there are numerous applications where identical results are not required as long as the results are of reasonable quality (and user agents can filter out implementations with unreasonably low quality).

It also works in the opposite way: when system face detection models improve, existing applications that use the proposed API benefit automatically, unlike applications that deploy their own model.

As youennf mentioned, there are many existing W3C APIs where the quality varies depending on the platform, yet they have been found important enough to be supported: for example, the Shape Detection API and the WebCodecs VideoEncoder and AudioEncoder interfaces. Here's a table showing how widely the quality of H.264 video encoders can vary, yet any of them could be used to implement the WebCodecs VideoEncoder API.

Performance is also an important aspect. In many cases, face detection (FD) through the proposed API is free or nearly free computationally (camera algorithms often run FD internally whether or not the user wants the results), and many users might opt to use the API even if the results vary to some degree.

And last, even if an app decided to deploy its own ML model, it could still make use of the metadata definitions from this proposal.

dontcallmedom-bot commented 1 year ago

This issue was mentioned in WEBRTCWG-2023-02-21 (Page 45)