ishay2b / VanillaCNN

Implementation of the Vanilla CNN described in the paper: Yue Wu and Tal Hassner, "Facial Landmark Detection with Tweaked Convolutional Neural Networks", arXiv preprint arXiv:1511.04031, 12 Nov. 2015. See the project page for more information: http://www.openu.ac.il/home/hassner/projects/tcnn_landmarks/ Written by Ishay Tubi: ishay2b [at] gmail [dot] com

learning rate #8

Open laoreja opened 8 years ago

laoreja commented 8 years ago

Thanks for your brilliant code!

My question is: I noticed that you use 1e-5 as your base learning rate, run lots of epochs, and get really good results. I am confused by this setting. I read the ADAM paper, and the learning rate it suggests is 0.001 (in Algorithm 1).

So why do you use this learning rate and epoch setting? Are there any special reasons?

Besides, you subtract 0.5 from the normalized landmarks. Does this subtraction give a better result? What is the mechanism behind it?

I am also working on training human facial landmark detection models based on CaffeNet, but my results are bad: the network tends to predict the same position for every input, which does yield a low loss. I guess there is something wrong with my settings. I am looking forward to your reply.

ishay2b commented 8 years ago

Hi @laoreja. Regarding the learning rate and epochs: this combination works for me, but other combinations should work as well.
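As a rough illustration only (not the repository's actual solver file), the combination being discussed could be expressed through Caffe's `SolverParameter` protobuf along these lines; the net filename and the iteration count below are placeholders, and only the Adam solver type with a fixed 1e-5 base rate reflects the thread:

```python
# Minimal sketch of an Adam solver configuration built via Caffe's Python
# protobuf API. Values are illustrative of this discussion, not the repo's
# shipped settings.
from caffe.proto import caffe_pb2

solver = caffe_pb2.SolverParameter()
solver.net = "vanilla_train.prototxt"   # hypothetical net definition file
solver.type = "Adam"                    # Adam solver, as discussed above
solver.base_lr = 1e-5                   # base learning rate from the thread
solver.momentum = 0.9                   # Adam beta1 (Caffe's "momentum")
solver.momentum2 = 0.999                # Adam beta2 (Caffe's "momentum2")
solver.lr_policy = "fixed"              # keep the rate constant over many epochs
solver.max_iter = 100000                # illustrative iteration count only

with open("adam_solver.prototxt", "w") as f:
    f.write(str(solver))                # protobuf text format
```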

Dividing the landmarks by 40 and subtracting 0.5 normalizes the values to [-0.5, +0.5], which saturates less with this activation function (absolute hyperbolic tangent).
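A minimal sketch of that normalization, assuming 40x40 input crops with landmarks given in pixel coordinates of the crop (the helper names are illustrative, not from the repository):

```python
import numpy as np

CROP_SIZE = 40.0  # VanillaCNN input crops are 40x40 pixels

def normalize_landmarks(landmarks_px):
    """Map pixel landmarks (N, 2) in [0, 40] to roughly [-0.5, +0.5]."""
    return np.asarray(landmarks_px, dtype=np.float32) / CROP_SIZE - 0.5

def denormalize_landmarks(landmarks_norm):
    """Invert the mapping to recover pixel coordinates in the 40x40 crop."""
    return (np.asarray(landmarks_norm, dtype=np.float32) + 0.5) * CROP_SIZE

# Example: five landmarks of a roughly centered face (illustrative values)
pts = np.array([[12, 16], [28, 16], [20, 24], [14, 30], [26, 30]], dtype=np.float32)
norm = normalize_landmarks(pts)          # values now lie in [-0.5, +0.5]
assert np.allclose(denormalize_landmarks(norm), pts)
```

Keeping the regression targets in this range matches the output range of the absolute-tanh activation, so the last layer is not forced into its saturated region.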

Regarding predicting the 'same' values for all inputs: it sounds like you have fallen into some kind of landmark average. I noticed it myself with VanillaCNN when I did not normalize the error by the inter-ocular distance. Also, make sure to normalize your input to avoid saturation.
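A minimal sketch of the inter-ocular normalization mentioned above: the mean point-to-point error is divided by the distance between the two eyes, which makes the metric scale-invariant and makes a "predict the average" collapse easy to spot. The eye indices assume a 5-point layout (left eye, right eye, nose, mouth corners); adjust them to your annotation order.

```python
import numpy as np

LEFT_EYE, RIGHT_EYE = 0, 1  # indices assume a 5-point landmark layout

def interocular_error(pred, gt):
    """Mean point-to-point error normalized by the inter-ocular distance.

    pred, gt: arrays of shape (N, 2) with predicted / ground-truth landmarks.
    """
    pred = np.asarray(pred, dtype=np.float32)
    gt = np.asarray(gt, dtype=np.float32)
    iod = np.linalg.norm(gt[LEFT_EYE] - gt[RIGHT_EYE])   # inter-ocular distance
    per_point = np.linalg.norm(pred - gt, axis=1)        # per-landmark errors
    return per_point.mean() / iod
```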

Hope this helps, Ishay

laoreja commented 8 years ago

Thanks a lot!

One more question: your batch size (30) is really small.

Did you choose it deliberately to let the network update its weights more often? That is, does such a small batch size help the network learn better?

Or did you choose it for memory and speed considerations?

ishay2b commented 8 years ago

These parameters (batch size, learning rate, ...) were not tested much; they just worked and I saw no reason to change them. As before, other parameters should work as well; feel free to report other working settings.

Regards, Ishay.