yinguobing / cnn-facial-landmark

Training code for facial landmark detection based on deep convolutional neural network.
MIT License
623 stars 183 forks

Prediction #106

Closed (sivi299 closed 3 years ago)

sivi299 commented 3 years ago

Hi yinguobing,

Great work. I am a newbie to machine learning.

On prediction I get points in the following format:

[0.00292316 0.26477575 0.10086177 0.43508154 0.1221348 0.45653793 0.12614055 0.6041809 0.212104 0.6998434 0.33655456 0.79500437 0.32305548 0.7458393 0.36422956 0.8949114 0.49114522 0.842641 0.58332276 0.8635016 0.65928763 0.8322915 0.70472896 0.7359222 0.795217 0.67806697 0.83497554 0.60467535 0.8906438 0.5219838 0.8792334 0.43395418 0.9463943 0.33878005 0.23912925 0.30637467 0.29551536 0.28067684 0.4044374 0.3247212 0.5158232 0.2979741 0.5828098 0.36473733 0.6673418 0.35720032 0.68439937 0.24011317 0.75540173 0.22922932 0.8243399 0.22251658 0.8556332 0.23645471 0.5472086 0.34398696 0.58606166 0.34416178 0.57206875 0.4353188 0.59197545 0.49749726 0.48247778 0.5297549 0.543103 0.5379993 0.5641834 0.5800524 0.5942605 0.53381634 0.62464863 0.58011717 0.3136923 0.21859199 0.33870625 0.32672268 0.3694461 0.3002806 0.47706386 0.36430454 0.40921107 0.30535015 0.3357002 0.3789393 0.64669144 0.3415216 0.7422384 0.3047884 0.77709746 0.28823054 0.803785 0.2578987 0.8123512 0.33126527 0.74628556 0.33401793 0.37077686 0.61136085 0.43704486 0.6332591 0.48559546 0.6475471 0.5531327 0.6783789 0.6229963 0.5619774 0.669219 0.64011866 0.6812284 0.6504894 0.6177426 0.745097 0.5394757 0.72451556 0.4724243 0.7400019 0.41907567 0.7237075 0.38598323 0.73804015 0.37215307 0.5913325 0.49952555 0.6848218 0.5284845 0.6660045 0.61558473 0.6440302 0.6539079 0.6177788 0.5709134 0.74506575 0.47966427 0.6821654 0.45565534 0.817149 ]

I can restructure this into [[x1,y1]....[xn,yn]] format and draw the points with cv2.circle. But the prediction needs to be scaled to the image so that the points land on the right parts of the face.

What logic should I apply to the prediction to get the right values?

Thanks in advance

yinguobing commented 3 years ago

Hey, these are normalized coordinates. Multiply them by the input image size to scale them back.
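
For readers landing here, the scaling described above can be sketched roughly as follows. This is a minimal illustration, not code from the repo: the helper name `to_pixel_points`, the 128x128 size in the usage line, and the optional `offset_x`/`offset_y` parameters (for the case where the model input was a face crop cut out of a larger frame) are all assumptions made for the example.

```python
def to_pixel_points(prediction, width, height, offset_x=0, offset_y=0):
    """Turn a flat [x1, y1, x2, y2, ...] sequence of normalized values
    into a list of integer (x, y) pixel positions.

    width/height: size of the image the model was given (assumed here).
    offset_x/offset_y: hypothetical top-left corner of that image inside
    a larger frame, if it was a crop; leave at 0 otherwise.
    """
    values = [float(v) for v in prediction]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values (x, y pairs)")

    points = []
    for i in range(0, len(values), 2):
        # Scale x by the width and y by the height, then shift by the offset.
        x = int(round(values[i] * width)) + offset_x
        y = int(round(values[i + 1] * height)) + offset_y
        points.append((x, y))
    return points


# First two landmarks from the prediction in the question,
# scaled to a hypothetical 128x128 input image.
sample = [0.00292316, 0.26477575, 0.10086177, 0.43508154]
print(to_pixel_points(sample, 128, 128))
```

Each resulting tuple can be passed as the center argument to cv2.circle. If the points look shifted on the full frame, the likely cause is that the model saw a cropped face region, in which case the crop's size and top-left corner are the values to pass in.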

sivi299 commented 3 years ago

Thanks for the quick reply. I figured it out after stepping through the code in detail.

May I know which dataset you used to reach the accuracy shown in the sample video?

I used the Helen dataset, and the results are nowhere close to the accuracy in your sample video.

yinguobing commented 3 years ago

You're welcome!

Actually you can run the demo yourself with this repo: https://github.com/yinguobing/head-pose-estimation

Remember to check out the tf1 branch, which contains the original model file I used for the gif. The latest branch, master, has been updated to TensorFlow 2; its model was retrained and may not perform as well as the old one.

For the old model training I used a mixed dataset from 300-W, LFPW, HELEN, AFW, IBUG and 300-VW, totaling around 250k samples. That is far more than Helen. You can find out more details here: https://yinguobing.com/facial-landmark-localization-by-deep-learning-data-cleansing/

Still, I recommend using the master branch of this repo. The model is identical to the old one except for the batch normalization layers.

sivi299 commented 3 years ago

Thanks again. I will give it a shot.