training script for CNN headpose estimation

mpatacchiola / deepgaze

Computer Vision library for human-computer interaction. It implements Head Pose and Gaze Direction Estimation Using Convolutional Neural Networks, Skin Detection through Backprojection, Motion Detection and Tracking, Saliency Map.

MIT License


training script for CNN headpose estimation #59

Status: Closed (Aileenlingyu closed this issue 6 years ago)

Aileenlingyu commented 6 years ago

Hi @mpatacchiola, thanks for the great work! I was wondering whether you plan to share the complete training script for the model currently used in CNN head pose estimation. Thanks!

mpatacchiola commented 6 years ago

Hi @Aileenlingyu

Thank you for your interest in the library! The model used in Deepgaze is architecture B, described in the original article and reported in Figure 3. This architecture can be easily implemented with the recent TensorFlow layers interface.

At the moment the training script is not implemented and we do not plan to add it in the near future. However, if you need the code for a scientific publication, either to compare it with other methods or to extend it, I can provide a copy of our files. If this is the case, please send me an email using my personal page or my university contact shown on the first page of the article.

speculaas commented 6 years ago

Dear mpatacchiola, thanks for the great work!

I have some further questions:

By "training script not implemented", do you mean it's not implemented in TensorFlow?

Or do you mean the training script is in TensorFlow, but not ready to be shared here?

And can you share more details about training?

For example, did you train the head pose CNN in another framework and then export it to TensorFlow format?

BR, Jimmy

mpatacchiola commented 6 years ago

Hi @speculaas

The training script is in TensorFlow, but before sharing the file it is necessary to add some comments and remove unnecessary parts. Moreover, the code was written for a very old version of TensorFlow and it probably requires some debugging to make it work with the latest versions.

Unfortunately I am quite busy at the moment and I cannot do it. However, the structure of the network is already implemented in deepgaze and the code can be easily modified to add a training method.

Take a look at this file, and in particular at the class CnnHeadPoseEstimator. There is also an example of how to train an MLP on the same dataset here. You can easily replace the MLP model with a CNN using the recent tf.layers utilities.
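As a rough illustration of the suggestion above, here is a minimal sketch of a small CNN regressor with a training step. It uses `tf.keras.layers`, the successor to the `tf.layers` interface, and it is not the training script discussed in this thread. The layer sizes, the 64x64 RGB input, the single-angle output, and the helper names `build_head_pose_cnn` and `train` are all assumptions made for the example; they are not architecture B from the article. The data is random and only shows the expected shapes.

```python
import numpy as np
import tensorflow as tf


def build_head_pose_cnn(input_size=64):
    """Small CNN that regresses one value (for example a single angle).

    The layer sizes are hypothetical and do NOT reproduce architecture B.
    """
    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_size, input_size, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1),  # linear output for regression
    ])


def train(model, images, angles, epochs=1, batch_size=8):
    """Fit the model with a mean squared error loss (an assumed choice)."""
    model.compile(optimizer="adam", loss="mse")
    return model.fit(images, angles, epochs=epochs,
                     batch_size=batch_size, verbose=0)


# Random stand-in data; replace with real face crops and angle labels.
rng = np.random.default_rng(0)
images = rng.random((16, 64, 64, 3), dtype=np.float32)
angles = rng.uniform(-1.0, 1.0, size=(16, 1)).astype(np.float32)

model = build_head_pose_cnn()
history = train(model, images, angles)
predictions = model.predict(images, verbose=0)
```

Scaling the target angles to a small range such as [-1, 1] before training, as the random labels here do, is a common choice for regression, but whether the original model was trained that way is not stated in this thread.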

speculaas commented 6 years ago

Dear mpatacchiola, thanks for taking the time to help me on such short notice!

I was going through the issues and came across this: https://github.com/mpatacchiola/deepgaze/issues/31

I guess you already explained things. Sorry I didn't read through the issues.

One more thing I would like to confirm:

I ran ex_cnn_head_pose_estimation_images.py against the Prima dataset.

I found that the predicted pitch often differs from the ground truth by a fairly large margin, for example:

person02205-60-30.jpg deepgaze, pitch: -9.227618, deepgaze, yaw: 8.669536

And this leads to my question:

Does ex_cnn_head_pose_estimation_images output degrees or radians?

Or does it output degrees, but inaccurately on this dataset because, as you said, you focused on AFW and AFLW?

Or is this expected because you designed the CNN to output a classification result in increments of 15?

BR, Jimmy
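
The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: that Prima file names follow the pattern person ID (2 digits), series (1 digit), image number (2 digits), signed tilt, signed pan; that tilt corresponds to pitch and pan to yaw with the same sign convention; and that the predictions are in degrees. None of this is confirmed in the thread, and the helper names `parse_prima_filename` and `angular_error` are hypothetical.

```python
import re

# Assumed Prima naming: person<ID:2><series:1><number:2><tilt><pan>.jpg
PRIMA_NAME = re.compile(
    r"person(?P<person>\d{2})(?P<series>\d)(?P<number>\d{2})"
    r"(?P<tilt>[+-]\d+)(?P<pan>[+-]\d+)\.jpg$"
)


def parse_prima_filename(name):
    """Return (tilt, pan) in degrees parsed from a Prima file name."""
    match = PRIMA_NAME.search(name)
    if match is None:
        raise ValueError("unexpected file name: " + name)
    return int(match["tilt"]), int(match["pan"])


def angular_error(predicted_deg, truth_deg):
    """Absolute difference between two angles given in degrees."""
    return abs(predicted_deg - truth_deg)


# Values quoted in the post above.
tilt, pan = parse_prima_filename("person02205-60-30.jpg")
pitch_error = angular_error(-9.227618, tilt)  # assumes pitch ~ tilt
yaw_error = angular_error(8.669536, pan)      # assumes yaw ~ pan
print(round(pitch_error, 2), round(yaw_error, 2))
```

Under these assumptions the quoted prediction is off by about 50.77 degrees in pitch and 38.67 degrees in yaw. If the sign convention for yaw were reversed, the yaw error would be smaller, so the convention is worth checking before drawing conclusions.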