aboba opened 1 year ago
Whilst at an ML level Face Detection is 'just another segmentation problem', from the user's point of view it is somewhat more personal than detecting an orange - especially since the bulk use-case of WebRTC is video conferences and good background blur is an egalitarian feature. I think that some exceptionalism for this use case is justified.
@aboba, can you clarify whether this issue is a blocker for the CFC? AIUI, your suggestion seems like a request for API change, not a blocker for the API.
It's a request for a metadata change, so that we don't have to define metadata for segmentation in addition to metadata specific to face detection. If the encoder wants to utilize segmentation information to figure out where to spend its effort, it shouldn't have to be able to understand multiple metadata formats, each optimized for a particular use.
@aboba I agree that it would be better to define generic segmentation metadata. We're happy to change the spec proposal once we agree on a direction. What do you think of this:
partial dictionary VideoFrameMetadata {
  sequence<Segment> segment;
};

dictionary Segment {
  DOMString type;                // One of enum SegmentType
  long id;
  long partOf;                   // References the parent segment's id
  float probability;             // or confidence
  Point2D? centerPoint;
  DOMRectReadOnly? boundingBox;
  // sequence<Point2D>? contour; // Possible future extension
};

enum SegmentType {
  "human-face",
  "left-eye",
  "right-eye",
  "mouth",
  // To be extended later with other types of segments
};
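To make the proposal concrete, here is a minimal sketch of how a consumer (e.g. an encoder deciding where to spend bits) might walk such metadata. This is not part of the spec: the interfaces below merely mirror the proposed dictionaries, and the helper functions, field shapes, and sample data are all assumptions for illustration.

```typescript
// Plain-object shapes mirroring the proposed WebIDL dictionaries (assumed, not normative).
interface Point2D { x: number; y: number; }
interface RectInit { x: number; y: number; width: number; height: number; }

type SegmentType = "human-face" | "left-eye" | "right-eye" | "mouth";

interface Segment {
  type: SegmentType;
  id: number;
  partOf?: number;       // references the parent segment's id
  probability: number;   // detector confidence in [0, 1]
  centerPoint?: Point2D;
  boundingBox?: RectInit;
}

// Pick the most confident face segment, e.g. as an encoder region of interest.
function bestFace(segments: Segment[]): Segment | undefined {
  return segments
    .filter((s) => s.type === "human-face")
    .reduce<Segment | undefined>(
      (best, s) => (best === undefined || s.probability > best.probability ? s : best),
      undefined,
    );
}

// Collect the child segments (eyes, mouth) of a parent via the partOf link.
function childrenOf(segments: Segment[], parentId: number): Segment[] {
  return segments.filter((s) => s.partOf === parentId);
}

// Hypothetical metadata as it might appear on a single video frame.
const sample: Segment[] = [
  { type: "human-face", id: 1, probability: 0.97,
    boundingBox: { x: 40, y: 20, width: 120, height: 160 } },
  { type: "left-eye",  id: 2, partOf: 1, probability: 0.91,
    centerPoint: { x: 75, y: 70 } },
  { type: "right-eye", id: 3, partOf: 1, probability: 0.90,
    centerPoint: { x: 125, y: 70 } },
  { type: "human-face", id: 4, probability: 0.55,
    boundingBox: { x: 300, y: 60, width: 80, height: 100 } },
];
```

Note how a single flat sequence with `partOf` links covers both the face-detection case and more general segmentation: a consumer that only cares about faces can filter by `type` and ignore the rest.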
This issue was mentioned in WEBRTCWG-2023-02-21 (Page 44)
It occurs to me that rather than defining FaceDetection metadata, we might instead define Segmentation metadata with a type field of "face detection". FaceDetection metadata is just one example of VideoFrame segmentation.