mpatacchiola / deepgaze

Computer Vision library for human-computer interaction. It implements Head Pose and Gaze Direction Estimation Using Convolutional Neural Networks, Skin Detection through Backprojection, Motion Detection and Tracking, Saliency Map.
MIT License

input image to the cnn_head_pose_estimation #26

Closed. Zouhj closed this issue 7 years ago.

Zouhj commented 7 years ago

Hi, thank you for your work! I am not clear on exactly how the images are processed before they are fed to the network. Is the input just the detected square face box, or has some translation and scaling been done? I tested the same face image at different sizes, one cropped tightly around the face and another larger with more background, and the resulting yaw, pitch and roll were different. So it's important to know what the images input to the model should look like.
Looking forward to your reply.

mpatacchiola commented 7 years ago

Hi @Zouhj

At the low level, the CNN takes as input a square image of 64x64 pixels. However, if you pass a larger square image, Deepgaze will automatically resize it to 64x64. Be careful, because the library does not accept rectangular images or images smaller than 64x64 pixels. The best performance is achieved with a subframe of the face that goes from the forehead to the chin, similar to the faces in this example.
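The input constraints above can be checked on your side before calling the estimator. This is a minimal, hypothetical pre-processing sketch, not part of the Deepgaze API. `square_crop` and `prepare_face` are illustrative names. The face box `(x, y, w, h)` is assumed to come from whatever face detector you use. A NumPy nearest-neighbour resize stands in for the interpolation the library applies internally.

```python
import numpy as np

CNN_INPUT_SIZE = 64  # side length of the CNN input, as stated in the thread


def square_crop(frame, x, y, w, h):
    """Crop a square region centred on a (hypothetical) face box (x, y, w, h).

    The side is the larger box dimension, shrunk if needed so the square
    fits inside the frame, and shifted so it does not cross the frame border.
    """
    frame_h, frame_w = frame.shape[:2]
    side = min(max(w, h), frame_h, frame_w)
    cx, cy = x + w // 2, y + h // 2
    x0 = min(max(cx - side // 2, 0), frame_w - side)
    y0 = min(max(cy - side // 2, 0), frame_h - side)
    return frame[y0:y0 + side, x0:x0 + side]


def prepare_face(face):
    """Reject inputs the thread says are unsupported, then resize to 64x64."""
    h, w = face.shape[:2]
    if h != w:
        raise ValueError(f"image must be square, got {w}x{h}")
    if h < CNN_INPUT_SIZE:
        raise ValueError(
            f"image must be at least {CNN_INPUT_SIZE}x{CNN_INPUT_SIZE}, got {w}x{h}"
        )
    # Nearest-neighbour resize: map each output row/column to a source index.
    idx = (np.arange(CNN_INPUT_SIZE) * h) // CNN_INPUT_SIZE
    return face[idx][:, idx]
```

For example, `prepare_face(square_crop(frame, *face_box))` yields a 64x64 crop. Because the crop region (tight forehead-to-chin versus face plus background) changes what the network sees, differently framed crops of the same face can produce different yaw, pitch and roll values.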

All the best, Massimiliano

Zouhj commented 7 years ago

Yeah, I get it. Thank you very much!