pntt3011 / mediapipe_face_iris_cpp

Real-time Face and Iris Landmarks Detection using C++
GNU General Public License v3.0
81 stars 15 forks

Keypoint accuracy issue #9

Open LebronJames0423 opened 2 years ago

LebronJames0423 commented 2 years ago

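At a resolution of 640x480 the keypoint positions look fairly accurate, but when I set the resolution to 1280x720 the detected keypoint positions become inaccurate. The face contour points in particular fall outside the actual face. What could be causing this?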

pntt3011 commented 2 years ago

Hello, I tried capturing my webcam, resizing the frames to 1280 x 720, and feeding them to the model. The results still look accurate to me.

However, I can think of one possible cause. In preprocessing, the frames are resized to 128x128 because that is the model's input shape, which has a 1:1 aspect ratio. 640x480 is 4:3 and 1280x720 is 16:9, so a 16:9 frame is squeezed horizontally more than a 4:3 frame when resized. Moreover, my SSD anchors are generated with a fixed 1:1 ratio (you can try replacing generateAnchors with the generate_anchors here), so mapping the detection results back to the original size may give a rectangular ROI instead of a square one. (Note that in calculateRoiFromDetection I multiply the height by 2 and the width by 1.5 to "hack" around this issue; you can try adjusting those numbers too.) The face ROI is then passed to the keypoint model, and if the ROI is not "square enough", the results can be inaccurate (the keypoint model performs better on inputs with a 1:1 ratio).
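To make the arithmetic concrete, here is a minimal sketch. It is not code from the repository: `NormBox`, `PixelBox`, `toPixels`, and `aspect` are hypothetical names, and it assumes the detector returns a box in normalized [0, 1] coordinates relative to the square model input. Under that assumption, a box that is square in model space stops being square once each axis is scaled by the frame's own width and height.

```cpp
#include <cassert>

// Hypothetical types for illustration; the repository's real structs may differ.
struct NormBox { float x, y, w, h; };   // normalized to [0, 1] in model-input space
struct PixelBox { float x, y, w, h; };  // pixels in the original frame

// Map a normalized box back to the original frame by scaling each axis
// with the frame's own width and height.
PixelBox toPixels(const NormBox& b, float frameW, float frameH) {
    assert(frameW > 0.f && frameH > 0.f);
    return PixelBox{ b.x * frameW, b.y * frameH, b.w * frameW, b.h * frameH };
}

// Width-to-height ratio of a pixel box; 1.0 means square.
float aspect(const PixelBox& p) {
    assert(p.h > 0.f);
    return p.w / p.h;
}
```

For a normalized box with equal width and height, `aspect(toPixels(box, 640.f, 480.f))` is about 1.33 and `aspect(toPixels(box, 1280.f, 720.f))` is about 1.78, so the same detection is further from square on the 16:9 frame.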

(I used Google Translate to translate your issue. If I have misunderstood your question, please let me know in English.)

LebronJames0423 commented 2 years ago

Thank you for your advice! You are right: it can be resolved by setting "w" and "h" to "detection.roi.width * origWidth * 1.f" and "detection.roi.height * origHeight * 2.f" in the function calculateRoiFromDetection.

I have found another problem: there is always a small amount of jitter in the keypoints. The jitter itself is small, but it noticeably degrades downstream applications, such as the eye-enlarging and face-slimming effects of a beauty camera. How can I solve it?
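As a rough illustration of the ROI adjustment discussed above, the sketch below scales a normalized detection into a pixel ROI with a separate enlargement factor per axis. It is not the repository's calculateRoiFromDetection: `NormRect`, `Roi`, and `scaleRoi` are hypothetical names, and the center-based layout of the detection is an assumption made for the example.

```cpp
#include <cassert>

// Hypothetical stand-ins; the repository's real types may differ.
struct NormRect { float cx, cy, width, height; };  // center and size, normalized to [0, 1]
struct Roi { float x, y, w, h; };                  // top-left corner and size in pixels

// Scale a normalized detection into a pixel ROI, enlarging each axis by its
// own factor while keeping the ROI centered on the detection.
Roi scaleRoi(const NormRect& d, float origWidth, float origHeight,
             float scaleW, float scaleH) {
    assert(origWidth > 0.f && origHeight > 0.f);
    const float w = d.width * origWidth * scaleW;
    const float h = d.height * origHeight * scaleH;
    const float cx = d.cx * origWidth;
    const float cy = d.cy * origHeight;
    return Roi{ cx - w / 2.f, cy - h / 2.f, w, h };
}
```

With a normalized detection of width and height 0.2 on a 1280x720 frame, factors of 1.5 and 2 give a 384 x 288 ROI, while factors of 1 and 2 give 256 x 288, which is closer to square. The best factors likely depend on the frame's aspect ratio, so treat these values as starting points.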

pntt3011 commented 2 years ago

Hello, according to this issue, it is a MediaPipe problem. Some workarounds (including MediaPipe's own) are mentioned in that thread. I'll try to implement one when I have free time.
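One generic way to reduce keypoint jitter is temporal smoothing. The sketch below applies an exponential moving average to each landmark across frames. It is an illustration only and not necessarily one of the workarounds from the linked thread; `Point2f` and `LandmarkSmoother` are hypothetical names, and the smoothing factor `alpha` is an assumed tuning parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2f { float x, y; };  // hypothetical landmark type, in pixels

// Exponential moving average over per-frame landmarks.
// alpha close to 1 follows the raw detections; alpha close to 0 smooths heavily.
class LandmarkSmoother {
public:
    explicit LandmarkSmoother(float alpha) : alpha_(alpha) {
        assert(alpha > 0.f && alpha <= 1.f);
    }

    // Blend the new frame's landmarks with the previous smoothed state.
    const std::vector<Point2f>& update(const std::vector<Point2f>& current) {
        if (state_.size() != current.size()) {
            state_ = current;  // first frame, or landmark count changed: reset
            return state_;
        }
        for (std::size_t i = 0; i < current.size(); ++i) {
            state_[i].x = alpha_ * current[i].x + (1.f - alpha_) * state_[i].x;
            state_[i].y = alpha_ * current[i].y + (1.f - alpha_) * state_[i].y;
        }
        return state_;
    }

private:
    float alpha_;
    std::vector<Point2f> state_;
};
```

Calling `update` once per frame returns the smoothed landmarks. A lower `alpha` suppresses more jitter but makes the keypoints lag behind fast head movements, so a beauty-camera style application would need to tune it, or use a filter that adapts to motion speed.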