ishay2b / VanillaCNN

Implementation of the Vanilla CNN described in the paper: Yue Wu and Tal Hassner, "Facial Landmark Detection with Tweaked Convolutional Neural Networks", arXiv preprint arXiv:1511.04031, 12 Nov. 2015. See project page for more information about this project. http://www.openu.ac.il/home/hassner/projects/tcnn_landmarks/ Written by Ishay Tubi : ishay2b [at] gmail [dot] com https://www.l

Why normalized-MSE? not Euclidean distance #11

Closed gicheonkang closed 6 years ago

gicheonkang commented 7 years ago

Hi, I have a question about how the error rate is evaluated. While profiling, I saw the code below (in 'testAFW_TestSet'):

for i, dataRow in enumerate(dataRowsAFW_Valid):
    dataRow40 = dataRow.copyCroppedByBBox(dataRow.fbbox)
    image, lm_0_5 = predictor.preprocess(dataRow40.image, dataRow40.landmarks())
    prediction = predictor.predict(image)

    # Using normalized-MSE
    testErrorAFW.add(lm_0_5, prediction)
    dataRow40.prediction = (prediction+0.5)*40

You call mse_normlized with the ground truth and the prediction. Is there any particular reason why you use MSE rather than Euclidean distance?

ishay2b commented 7 years ago

It is the same measure used in the article: the error is normalized because the raw error is smaller when the eyes are closer together.
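For readers following along, here is a minimal sketch of a landmark error normalized by the distance between the eyes. It is not the repository's mse_normlized implementation; the function name `interocular_normalized_error` and the assumption that the two eye centers are the first two landmarks are hypothetical and used only for illustration.

```python
import math


def interocular_normalized_error(gt, pred, left_eye=0, right_eye=1):
    """Mean point-to-point error divided by the ground-truth eye distance.

    gt, pred: lists of (x, y) landmark tuples in the same order.
    left_eye, right_eye: assumed indices of the eye centers (hypothetical).
    """
    if len(gt) != len(pred):
        raise ValueError("gt and pred must have the same number of landmarks")
    eye_dist = math.dist(gt[left_eye], gt[right_eye])
    if eye_dist == 0:
        raise ValueError("eye landmarks coincide; cannot normalize")
    mean_err = sum(math.dist(g, p) for g, p in zip(gt, pred)) / len(gt)
    return mean_err / eye_dist


# The same face at two scales, with every prediction shifted horizontally.
gt_large = [(10.0, 10.0), (30.0, 10.0), (20.0, 20.0), (12.0, 30.0), (28.0, 30.0)]
pred_large = [(x + 2.0, y) for x, y in gt_large]
gt_small = [(x / 2, y / 2) for x, y in gt_large]
pred_small = [(x / 2, y / 2) for x, y in pred_large]

err_large = interocular_normalized_error(gt_large, pred_large)
err_small = interocular_normalized_error(gt_small, pred_small)
print(err_large, err_small)
```

In this toy example the raw pixel error of the smaller face is half that of the larger one, but after dividing by the eye distance both faces get the same score, which is what makes errors comparable across face sizes.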

gicheonkang commented 7 years ago

Thank you for your reply. I have one more question. The paper gives no reason why the Vanilla CNN model uses the absolute hyperbolic tangent as its activation function.

I tested ReLU, but the accuracy was worse. Can you explain why?

ishay2b commented 6 years ago

@gicheonkang Most issues with this dataset are due to having very little data. ReLU has a larger dynamic range and gives better results where enough data is available. Here, my guess is that the smaller dynamic range of the absolute tanh (fewer degrees of freedom) helps prevent over-fitting.
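To make the "dynamic range" point concrete, here is a small sketch comparing the output ranges of the two activations on a few inputs. The helper names `abs_tanh` and `relu` are illustrative only. The sketch shows just that the absolute tanh is bounded in [0, 1] while ReLU is unbounded above; it does not test the over-fitting guess itself.

```python
import math


def abs_tanh(x):
    """Absolute hyperbolic tangent: symmetric in x, output bounded in [0, 1]."""
    return abs(math.tanh(x))


def relu(x):
    """Rectified linear unit: zero for negative inputs, unbounded above."""
    return max(0.0, x)


inputs = [-100.0, -2.0, -0.5, 0.0, 0.5, 2.0, 100.0]
abs_tanh_out = [abs_tanh(x) for x in inputs]
relu_out = [relu(x) for x in inputs]

for x, a, r in zip(inputs, abs_tanh_out, relu_out):
    print(f"x={x:8.1f}  abs_tanh={a:.4f}  relu={r:.1f}")
```

Large-magnitude inputs saturate near 1 under the absolute tanh, while ReLU passes large positive values through unchanged.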