Closed sivi299 closed 3 years ago
Hey, these are normalized coordinates. Please multiply them by the input image size to scale them back.
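A minimal sketch of that scaling, assuming the prediction is a flat sequence in [x1, y1, ..., xn, yn] order with values in the 0 to 1 range. The function name `scale_marks` and the `offset` parameter are hypothetical and not part of the repo; `offset` only matters if the model was fed a cropped face and the points need to be shifted back into the full frame.

```python
def scale_marks(prediction, width, height, offset=(0, 0)):
    """Convert a flat, normalized [x1, y1, ..., xn, yn] prediction to pixel points.

    width and height are the size of the image that was fed to the model.
    offset (hypothetical) is the top-left corner of that image inside a
    larger frame, for the case where the input was a cropped face.
    """
    xs = prediction[0::2]  # every even index is an x value
    ys = prediction[1::2]  # every odd index is a y value
    return [(x * width + offset[0], y * height + offset[1])
            for x, y in zip(xs, ys)]


# Example with two made-up points on a 128x128 input image.
marks = scale_marks([0.0, 0.5, 0.25, 1.0], width=128, height=128)
print(marks)

# Each point can then be drawn, e.g. with OpenCV:
#   cv2.circle(image, (int(x), int(y)), 1, (0, 255, 0), -1)
```

The slicing works on both plain lists and one-dimensional NumPy arrays, so the raw model output can be passed in directly once it is flattened.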
Thanks for the quick reply. I figured it out after going through the code in detail.
May I know which dataset you used to achieve the accuracy as per the sample video?
I used the HELEN dataset, and the results are nowhere close to the accuracy shown in your sample video.
You're welcome!
Actually you can run the demo yourself with this repo: https://github.com/yinguobing/head-pose-estimation
Remember to check out the tf1 branch, which contains the original model file I used for the gif. The latest master branch has been updated to TensorFlow 2; its model was retrained and may not perform as well as the old one.
For the old model training I used a mixed dataset from 300-W, LFPW, HELEN, AFW, IBUG and 300-VW, totaling around 250k samples. That is far more than Helen. You can find out more details here: https://yinguobing.com/facial-landmark-localization-by-deep-learning-data-cleansing/
Still, I recommend using the master branch of this repo. The model is identical to the old one except for the batch normalization layers.
Thanks again. I will give it a shot.
Hi yinguobing,
Great work. I am a newbie to machine learning.
Upon prediction I get points in the following format
[0.00292316 0.26477575 0.10086177 0.43508154 0.1221348 0.45653793 0.12614055 0.6041809 0.212104 0.6998434 0.33655456 0.79500437 0.32305548 0.7458393 0.36422956 0.8949114 0.49114522 0.842641 0.58332276 0.8635016 0.65928763 0.8322915 0.70472896 0.7359222 0.795217 0.67806697 0.83497554 0.60467535 0.8906438 0.5219838 0.8792334 0.43395418 0.9463943 0.33878005 0.23912925 0.30637467 0.29551536 0.28067684 0.4044374 0.3247212 0.5158232 0.2979741 0.5828098 0.36473733 0.6673418 0.35720032 0.68439937 0.24011317 0.75540173 0.22922932 0.8243399 0.22251658 0.8556332 0.23645471 0.5472086 0.34398696 0.58606166 0.34416178 0.57206875 0.4353188 0.59197545 0.49749726 0.48247778 0.5297549 0.543103 0.5379993 0.5641834 0.5800524 0.5942605 0.53381634 0.62464863 0.58011717 0.3136923 0.21859199 0.33870625 0.32672268 0.3694461 0.3002806 0.47706386 0.36430454 0.40921107 0.30535015 0.3357002 0.3789393 0.64669144 0.3415216 0.7422384 0.3047884 0.77709746 0.28823054 0.803785 0.2578987 0.8123512 0.33126527 0.74628556 0.33401793 0.37077686 0.61136085 0.43704486 0.6332591 0.48559546 0.6475471 0.5531327 0.6783789 0.6229963 0.5619774 0.669219 0.64011866 0.6812284 0.6504894 0.6177426 0.745097 0.5394757 0.72451556 0.4724243 0.7400019 0.41907567 0.7237075 0.38598323 0.73804015 0.37215307 0.5913325 0.49952555 0.6848218 0.5284845 0.6660045 0.61558473 0.6440302 0.6539079 0.6177788 0.5709134 0.74506575 0.47966427 0.6821654 0.45565534 0.817149 ]
which I can restructure into [[x1, y1], ..., [xn, yn]] format to draw with cv2.circle. But the prediction needs to be scaled to the image so that the points land on the right parts of the face.
What logic should I apply to the prediction to get the right values?
Thanks in advance