w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group
https://w3c.github.io/mediacapture-extensions/

Face Detection: Scope of Applicability #84

Open aboba opened 1 year ago

aboba commented 1 year ago

Currently the API in PR 78 proposes to provide hardware acceleration for Face Detection based on camera driver support. Tying support for accelerated Face Detection to support in a camera driver seems unlikely to provide wide coverage, since such support is likely to be available only on new camera models. Acceleration using more commonly available hardware (such as GPUs) is likely to have wider coverage, which is why ML development tools such as TensorFlow.js use this technique. It is also why GPU acceleration is mentioned in WebRTC-NV Use Cases Section 3.6.

Face Detection APIs that do not achieve wide coverage will frustrate applications that do not wish to develop their own face detection models. Where the API is unavailable, these applications will either need to operate without face detection or include their own face detection capabilities, which lessens the need for the proposed APIs.

Those applications that include their own face detection models will also probably choose to forgo the proposed APIs, leveraging instead the GPU-based acceleration supported by Web ML platforms such as TensorFlow.js.

References: https://lists.w3.org/Archives/Public/public-webrtc/2023Jan/0047.html

steely-glint commented 1 year ago

That seems to me a slightly defeatist attitude: unless all devices can do this well, we shouldn't allow the ones that can to expose the hardware capability.

We've already seen instances of background blur being done on a server because the GPU isn't always capable of running the required WebML in real time.

I suspect that users who find that common TensorFlow models don't reliably detect their faces would welcome the ability to buy a camera that does. I can imagine communities adopting certain webcams precisely because they work well for (say) grey-beards.

youennf commented 1 year ago

it is likely to only be supported on new camera models

AFAIK, iOS devices have had cameras supporting this for quite some time. Coverage on this particular platform should be pretty good today.

Those applications that include their own face detection models will also probably choose to forgo the proposed APIs, choosing instead to leverage GPU-based acceleration approaches supported by Web ML platforms such as Tensorflow.js.

I do not think one approach excludes the other. I could for instance see camera driver face detection input be refined by WebML models as a perf optimization.
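A minimal sketch of that refinement idea, assuming the platform reports face bounding boxes in frame pixels: grow each box by a margin, clamp it to the frame, and run the (hypothetical) WebML refinement model only on those crops. The `Box` type, the function names and the 20% default margin are illustrative assumptions, not part of PR 78.

```typescript
// Hypothetical box type: not taken from PR 78 or any shipped Web API.
interface Box { x: number; y: number; width: number; height: number; }

// Grow a platform-reported face box by `margin` (a fraction of its size)
// on every side, then clamp it to the frame so the crop stays valid.
function expandAndClamp(box: Box, frameW: number, frameH: number, margin = 0.2): Box {
  const dx = box.width * margin;
  const dy = box.height * margin;
  const x0 = Math.max(0, box.x - dx);
  const y0 = Math.max(0, box.y - dy);
  const x1 = Math.min(frameW, box.x + box.width + dx);
  const y1 = Math.min(frameH, box.y + box.height + dy);
  return { x: x0, y: y0, width: Math.max(0, x1 - x0), height: Math.max(0, y1 - y0) };
}

// Fraction of the frame the refinement model would have to process.
// Overlapping crops are counted twice, so this is an upper bound.
function croppedFraction(boxes: Box[], frameW: number, frameH: number): number {
  const area = boxes.reduce((sum, b) => sum + b.width * b.height, 0);
  return Math.min(1, area / (frameW * frameH));
}
```

For a 1280x720 frame with a single 100x100 face at (600, 300), the default margin gives a 140x140 crop at (580, 280), roughly 2% of the frame, which is where the performance benefit would come from.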

In general, I would tend to think that if native applications have a use for some native APIs, web applications will probably have a use case for similar web APIs. This principle seems to apply well here.

ttoivone commented 1 year ago

The face detection API has been available to Android phone vendors since API level 14 (Android 4.0, 2011), and in 2015, 54% of devices shipped, or 1.3 billion, were based on Android (Wikipedia). I don't have hard data on what percentage of these devices actually implement the face detection API, but I would assume that virtually all do today, as it is usually implemented as part of the camera control algorithms (auto exposure, focus, white balance) to improve image quality. For example, my Motorola Droid 4 from 2012 and Huawei Honor 7 from 2015 both support it. If someone here knows of an Android phone that does not support it, please let us know.
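For reference, the legacy `android.hardware.Camera.Face` API (API level 14+) reports each face's `rect` in a fixed coordinate space running from (-1000, -1000), the top-left of the camera's field of view, to (1000, 1000), the bottom-right, so an implementation built on it has to map those values into frame pixels. A minimal sketch of that mapping; the TypeScript types and function name are hypothetical, and it deliberately ignores sensor orientation and front-camera mirroring, which a real implementation must handle.

```typescript
// Rect in the driver's [-1000, 1000] coordinate space (hypothetical type).
interface DriverRect { left: number; top: number; right: number; bottom: number; }
interface PixelRect { x: number; y: number; width: number; height: number; }

function driverRectToPixels(r: DriverRect, frameW: number, frameH: number): PixelRect {
  // Shift [-1000, 1000] to [0, 2000], then scale to the frame size.
  const toX = (v: number) => ((v + 1000) * frameW) / 2000;
  const toY = (v: number) => ((v + 1000) * frameH) / 2000;
  const x = toX(r.left);
  const y = toY(r.top);
  return { x, y, width: toX(r.right) - x, height: toY(r.bottom) - y };
}
```

For instance, a rect of (-1000, -1000, 0, 0) on an 800x480 frame maps to a 400x240 box at the origin, i.e. the top-left quadrant.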

On ChromeOS, the Android Camera2 API, which supports face detection, has been available on all Pixelbooks from Google (from 2017 onwards). We actually originally implemented the Chromium face detection support on Pixelbook Go (shipped 2019).

On Windows 10 and above, clients whose camera drivers support it have exposed a face detection API. And at least in the latest Windows versions, if the driver doesn't provide face detection, Windows uses Windows.Media.FaceAnalysis to implement it, so missing driver/camera support shouldn't be a showstopper. We agree that the percentage of Windows clients on the market that can take advantage of this right now is low, but soon enough people should have updated to a more recent Windows version with face detection support.

As youennf mentioned, iOS devices have also supported this for some time, although we don't have first-hand experience of that platform.

aboba commented 1 year ago

@steely-glint The goal of existing W3C ML APIs is to allow the same model to run on any browser or hardware, albeit slower if acceleration is not available. In order to ensure the widest range of applicability, existing API proposals rely on GPU acceleration, which is also what ML frameworks use for acceleration and is the approach identified in the WebRTC-NV Use Cases document. Depending on camera hardware will limit applicability compared with the GPU acceleration approach.

One way to address the applicability problem would be to find a way to support failover. Other APIs such as Media Capabilities make it possible for applications to understand performance characteristics under various conditions.
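One possible shape for application-side failover, sketched under the assumption that each detector (platform/driver-backed, or a bundled WebML model) is wrapped behind a common async interface that resolves to `null` when it cannot serve a frame. `Face`, `Detector`, `detectWithFailover` and the detector names are hypothetical.

```typescript
// Hypothetical types; neither comes from PR 78 or a shipped Web API.
interface Face { x: number; y: number; width: number; height: number; }
type Detector = (frameId: number) => Promise<Face[] | null>;

// Try each detector in priority order. A null result or a thrown error
// means "unavailable for this frame", so the next detector is tried.
async function detectWithFailover(
  frameId: number,
  detectors: { name: string; detect: Detector }[],
): Promise<{ source: string; faces: Face[] }> {
  for (const d of detectors) {
    try {
      const faces = await d.detect(frameId);
      if (faces !== null) return { source: d.name, faces };
    } catch {
      // Treat errors like unavailability and fall through to the next one.
    }
  }
  return { source: "none", faces: [] };
}
```

Ordering the list by preference keeps the choice in the application's hands: a platform detector can be listed first where available, with a GPU-backed model as the fallback.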

ttoivone commented 1 year ago

@aboba

Depending on camera hardware will limit applicability compared with the GPU acceleration approach.

Our proposal does not depend on camera hardware. In fact, no MIPI-based camera has face detection built in. On systems with a MIPI camera, the image is processed by a chip, typically on the SoC. For instance, Intel has included its Image Processing Unit (IPU), with face detection, in selected SoCs since around 2012. Basically all mobile phones have a MIPI camera and thus FD on the SoC. Several newer laptops also have a MIPI camera, although this is much rarer; and, as mentioned, even with a USB camera without FD support, recent Windows versions provide failover support.

One way to address the applicability problem would be to find a way to support failover. Other APIs such as Media Capabilities make it possible for applications to understand performance characteristics under various conditions.

It is true that our API proposal does not give access to face detection performance characteristics. That is something which might make sense to add. However, I suspect that the platform APIs themselves provide little information on performance. One simple approach would be to not expose the FD API in cases where the platform implementation is known to be relatively slow (compared to e.g. W3C ML APIs) or of low quality.
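A sketch of the kind of gating suggested above, assuming the implementation (or the application) can time the platform detector on a sample of frames. Using the median rather than the mean keeps a few slow outliers (e.g. warm-up frames) from disqualifying an otherwise fast implementation. The 50%-of-frame-time budget is an arbitrary illustrative choice, and the function names are hypothetical.

```typescript
// Median of a list of samples (does not modify the input).
function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// Hypothetical gate: treat the platform detector as usable only if its
// median per-frame latency fits within a fraction of the frame time.
function fastEnough(latenciesMs: number[], fps: number, budgetFraction = 0.5): boolean {
  if (latenciesMs.length === 0) return false; // no measurements: assume not usable
  const frameBudgetMs = 1000 / fps;
  return median(latenciesMs) <= frameBudgetMs * budgetFraction;
}
```

At 30 fps the frame time is about 33.3 ms, so with the default budget, samples of [5, 6, 40, 7, 6] ms pass (median 6 ms) while [18, 20, 25, 30] ms do not (median 22.5 ms).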

dontcallmedom-bot commented 1 year ago

This issue was mentioned in WEBRTCWG-2023-02-21 (Page 43)