Seems worth moving to https://github.com/w3c/mediacapture-extensions
This was presented and discussed during a TPAC 2021 breakout, and further discussed during the Nov 2021 WebRTC meeting.
From the latter, feedback included exposing the results at the VideoFrame level rather than at the MediaStreamTrack level.

A few thoughts from the past meeting:
As for contour information vs. simpler rectangle information, I'd like to understand what drivers currently generate (my guess is a set of rectangles) and what they might produce in the future (contours, maybe?). Starting simple with a set of rectangles does not seem too bad to me, provided it is what drivers currently generate (and will probably generate for some time) and it suits reasonably well the processing that would make use of such data.
Why?
Face detection for video conferencing; support for WebRTC-NV use cases like Funny Hats, etc. On the client side, developers have to use computer vision libraries (OpenCV.js / TensorFlow.js) with either a WASM (SIMD + threads) or a GPU backend to get acceptable performance. Many developers instead resort to cloud-based solutions such as the Face API from Azure Cognitive Services or Face Detection from Google Cloud's Vision API. On modern client platforms, we can save a lot of data movement, and even on-device computation, by leveraging the work the camera stack / Image Processing Unit (IPU) already does to improve image quality, for free.
What?
Prior Work

WICG has proposed the Shape Detection API, which enables Web applications to use a system-provided face detector. However, the API requires that the image data be provided by the Web application itself: to use it, the application would first need to capture frames from a camera and then hand the data to the Shape Detection API. This may not only cause extraneous computation and copies of the frame data, but may outright prevent the use of camera-dedicated hardware or system libraries for face detection. The camera stack often performs face detection anyway to improve image quality (for example in its 3A algorithms), and those results could be made available to applications without extra computation.
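For concreteness, a minimal sketch of that capture-then-detect path, using the FaceDetector interface from the WICG Shape Detection draft (available only behind flags in some Chromium-based browsers). The constructor options shown come from the draft; the rest is ordinary capture plumbing:

```js
// Capture-then-detect with the Shape Detection API: the page must pull
// the frame through JavaScript before the detector ever sees it, which
// is where the extra copies and readbacks described above come from.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const video = document.createElement('video');
video.srcObject = stream;
await video.play();

const detector = new FaceDetector({ fastMode: true, maxDetectedFaces: 5 });
const faces = await detector.detect(video); // <video> is a valid ImageBitmapSource
for (const face of faces) {
  console.log(face.boundingBox); // DOMRectReadOnly for each detected face
}
```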
Many platforms offer a camera API which can perform face detection directly on image frames from the system camera. The face detection can be hardware-assisted, and either the hardware or the API may not allow applying the functionality to user-provided image data.
Android

The Android Camera2 API supports face detection: it is enabled with the capture request key STATISTICS_FACE_DETECT_MODE, set to either STATISTICS_FACE_DETECT_MODE_FULL or STATISTICS_FACE_DETECT_MODE_SIMPLE. The resulting face statistics are parsed and stored in the class Face.
Windows

Face detection is performed in the DeviceMFT on the preview frame buffers. The DeviceMFT integrates the face detection library and turns the feature on when requested by the application. Face detection is enabled with the property ID KSPROPERTY_CAMERACONTROL_EXTENDED_FACEDETECTION. When enabled, the face detection results are returned in the metadata attribute MF_CAPTURE_METADATA_FACEROIS, which contains the coordinates of each detected face.
The API also supports blink and smile detection, which can be enabled with the property IDs KSCAMERA_EXTENDEDPROP_FACEDETECTION_BLINK and KSCAMERA_EXTENDEDPROP_FACEDETECTION_SMILE.

macOS

Apple offers face detection using Core Image (CIDetectorTypeFace) or the Vision framework (VNDetectFaceRectanglesRequest).
How?
Strawman proposal
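Purely as an illustration of the direction discussed above (results exposed per frame rather than on the track), here is a hypothetical usage sketch. The faceDetectionMode constraint and detectedFaces frame metadata are invented names, not part of any spec; getUserMedia and MediaStreamTrackProcessor are real, shipping pieces:

```js
// Hypothetical sketch only: `faceDetectionMode` and `detectedFaces` are
// assumed names, not a settled API. MediaStreamTrackProcessor exposes
// the track as a ReadableStream of WebCodecs VideoFrames.
const stream = await navigator.mediaDevices.getUserMedia({
  video: { faceDetectionMode: 'bounding-box' } // hypothetical constraint
});
const [track] = stream.getVideoTracks();

const processor = new MediaStreamTrackProcessor({ track });
const reader = processor.readable.getReader();

for (;;) {
  const { value: frame, done } = await reader.read();
  if (done) break;
  // Hypothetical: faces reported as a set of rectangles in the frame
  // metadata, matching the "start simple with rectangles" feedback.
  const faces = frame.metadata().detectedFaces ?? [];
  for (const face of faces) {
    console.log(face.boundingBox); // e.g. { x, y, width, height }
  }
  frame.close(); // release the frame back to the pipeline
}
```

Attaching the results to each VideoFrame keeps the face coordinates synchronized with the exact frame they were computed from, which a track-level surface could not guarantee.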