Feedback on image_face_ratio_filter and Suggestion for a New image_face_counter_filter Operator

TobyJasper commented 1 month ago

Search before continuing

[X] I have searched the Data-Juicer issues and found no similar feature requests.

Description

Hi there,

I'd like to provide some feedback on the image_face_ratio_filter operator and also propose a new feature.

image_face_ratio_filter Performance: I've been using this operator for image filtering, but I've noticed that its dlib-based face detection does not perform as well as expected, particularly in detection accuracy. In addition, installing a CUDA-enabled build of dlib is quite cumbersome on some systems, which could be a barrier for users who want GPU acceleration. Would you consider switching to a different model, such as MTCNN or a similar alternative, which might offer better accuracy and an easier installation?
Suggestion for image_face_counter_filter: I'd also like to suggest adding a new operator, image_face_counter_filter, which would allow users to filter images based on the number of faces detected. This would be especially useful for tasks where only images with a specific number of faces (e.g., exactly one face, multiple faces) are needed. What are your thoughts on including this in the project? If needed, I would be happy to submit a PR to implement this feature.
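As a rough illustration of the proposed operator, here is a minimal Python sketch that keeps an image only when its detected face count falls within an inclusive range. The names `opencv_face_count`, `keep_by_face_count`, `face_counter_filter`, `min_face_count`, and `max_face_count` are hypothetical and are not Data-Juicer's actual API. The default detector is assumed to be OpenCV's bundled frontal-face Haar cascade (requires `opencv-python`); any other counter, for example one backed by MTCNN, can be passed in through `count_faces`.

```python
"""Hypothetical sketch: filter images by the number of detected faces."""
from typing import Callable, Iterable, List


def opencv_face_count(image_path: str) -> int:
    """Count faces using OpenCV's bundled Haar cascade (assumed detector)."""
    import cv2  # imported lazily so the filtering logic works without OpenCV

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # detectMultiScale returns one (x, y, w, h) box per detected face
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces)


def keep_by_face_count(num_faces: int, min_face_count: int,
                       max_face_count: int) -> bool:
    """Keep a sample only if its face count lies in [min, max], inclusive."""
    return min_face_count <= num_faces <= max_face_count


def face_counter_filter(
    image_paths: Iterable[str],
    count_faces: Callable[[str], int] = opencv_face_count,
    min_face_count: int = 1,
    max_face_count: int = 1,
) -> List[str]:
    """Return the paths whose face count falls within the configured range."""
    if min_face_count > max_face_count:
        raise ValueError("min_face_count must not exceed max_face_count")
    return [
        path for path in image_paths
        if keep_by_face_count(count_faces(path), min_face_count, max_face_count)
    ]


if __name__ == "__main__":
    # Stand-in counts so the example runs without real images or OpenCV
    fake_counts = {"solo.jpg": 1, "group.jpg": 4, "landscape.jpg": 0}
    print(face_counter_filter(fake_counts, count_faces=fake_counts.get))
    # → ['solo.jpg']
```

With the defaults (`min_face_count=1`, `max_face_count=1`) only single-face images survive; widening the range, e.g. to `2` through `10`, selects group photos instead. Injecting the detector keeps the counting logic independent of whichever face detection backend is used.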

Use case

No response

Additional

No response

Are you willing to submit a PR for this feature?

[X] Yes, I'd like to help by submitting a PR!

drcege commented 1 month ago

Thank you for your feedback.

We are aware of the difficulties involved in installing dlib, so we have switched the face detection implementation to OpenCV, removing the dependency on dlib entirely. Both implementations prioritize ease of setup, so their accuracy may not reach state-of-the-art levels. We plan to support a series of state-of-the-art face detection models through the ModelScope interface, including MTCNN, which you mentioned, as well as RetinaFace, MogFace, and others, and we expect to implement them in the coming weeks.
Your suggestion regarding the image_face_counter_filter is excellent. Please feel free to submit an implementation PR, and we can work together to enhance it.

TobyJasper commented 1 month ago

My apologies. I hadn’t updated my local code in almost a month and didn’t notice that dlib had been removed.

I’ve submitted a PR for the new operator and kept its design consistent with image_face_ratio_filter, so that related features will be easier to update together in the future.
