ishay2b / VanillaCNN

Implementation of the Vanilla CNN described in the paper: Yue Wu and Tal Hassner, "Facial Landmark Detection with Tweaked Convolutional Neural Networks", arXiv preprint arXiv:1511.04031, 12 Nov. 2015. See project page for more information about this project. http://www.openu.ac.il/home/hassner/projects/tcnn_landmarks/ Written by Ishay Tubi : ishay2b [at] gmail [dot] com https://www.l

68 landmark process #4

Open fishman2008 opened 8 years ago

fishman2008 commented 8 years ago

Is there a convenient way to handle landmark prediction for 68 points? I looked at your code, and the 5 landmarks seem to be represented by named fields: left eye, right eye, left mouth, right mouth, and middle. Since I am not very familiar with Python, it is hard for me to build a loop around these naming conventions.

ishay2b commented 8 years ago

Caffe actually returns a numpy array of 10 floats. I used named fields for convenience, but they are not required: your network should return 136 floats (68 points × 2 coordinates) without any need for naming.

See, for example, testErrorMini in mainLoop.py, where I just rescale the numpy array as is:


    for i, dataRow in enumerate(dataRowsTrainValid):
        dataRow40 = dataRow.copyCroppedByBBox(dataRow.fbbox).copyMirrored()
        image, lm_0_5 = predictor.preprocess(dataRow40.image, dataRow40.landmarks())
        prediction = predictor.predict(image) # This is a numpy array
        dataRow40.prediction = (prediction+0.5)*40.  # Scale -0.5..+0.5 to 0..40
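
The same rescaling carries over to any number of points. A minimal sketch, assuming the flat output is interleaved as x0, y0, x1, y1, ... (the thread does not confirm this layout, so check it against your training labels); `prediction_to_points` is a hypothetical helper, not part of this repository:

```python
import numpy as np


def prediction_to_points(prediction, crop_size=40.0):
    """Map a flat network output in -0.5..+0.5 to (N, 2) pixel coordinates."""
    flat = np.asarray(prediction, dtype=np.float64).ravel()
    if flat.size % 2 != 0:
        raise ValueError("expected an even number of values (x, y pairs)")
    # Assumption: values are interleaved as x0, y0, x1, y1, ...
    points = flat.reshape(-1, 2)
    # Same scaling as in the loop above: -0.5..+0.5 -> 0..crop_size
    return (points + 0.5) * crop_size


# Stand-in for a 68-point network output (136 floats).
fake_prediction = np.zeros(136)
points = prediction_to_points(fake_prediction)
print(points.shape)
```

With 10 floats this gives a (5, 2) array and with 136 floats a (68, 2) array, so no per-landmark names are needed.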

fishman2008 commented 8 years ago

Ishay, thanks for your quick response. The test process should be no problem. But for training with 68 landmarks, the "DataRow" class defines lefteye, righteye, leftmouth, rightmouth, and middle, which correspond to the 5 landmarks. If I have 68 landmarks, what would be a convenient way to handle this? I saw another implementation where the author defined 15 names to handle 30 key points. Surely there is a better way?

ishay2b commented 8 years ago

I suggest you throw away the whole naming structure and replace it with numpy array operations. There is no need for the naming. I will be happy to accept such a PR, since this is the right way to go and to scale this project. The catch is that you will have to handle the index mapping yourself.
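
One place where the index mapping matters is mirroring: after a horizontal flip, left and right points have to swap places. A rough sketch of the array-based approach, assuming landmarks are stored as an (N, 2) array of (x, y) and that the 5 points are ordered left eye, right eye, middle, left mouth, right mouth. That order, the `width - x` flip convention, and the names `MIRROR_INDEX_5` and `mirror_landmarks` are all assumptions for illustration, not the repository's API:

```python
import numpy as np

# Hypothetical point order: left eye, right eye, middle, left mouth, right mouth.
# Entry i is the index of the source point that ends up in slot i after a flip.
MIRROR_INDEX_5 = np.array([1, 0, 2, 4, 3])


def mirror_landmarks(points, image_width, mirror_index):
    """Flip (N, 2) landmarks horizontally and swap left/right points."""
    points = np.asarray(points, dtype=np.float64)
    if points.shape != (len(mirror_index), 2):
        raise ValueError("points must have shape (len(mirror_index), 2)")
    mirrored = points.copy()
    # Assumed convention: x' = width - x (use width - 1 - x for pixel indices).
    mirrored[:, 0] = image_width - mirrored[:, 0]
    # Reorder so that slot 0 is again the left eye, slot 1 the right eye, etc.
    return mirrored[mirror_index]


landmarks = np.array([[10.0, 15.0], [30.0, 15.0], [20.0, 25.0],
                      [12.0, 32.0], [28.0, 32.0]])
mirrored = mirror_landmarks(landmarks, image_width=40.0,
                            mirror_index=MIRROR_INDEX_5)
print(mirrored)
```

For 68 points the function stays the same; only the permutation array changes, and it has to be written out from the annotation scheme of the dataset you train on.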

ishay2b commented 8 years ago

Actually, I answered only the technical programming issue you raised, but Tal Hassner pointed out to me that the real problem is that Vanilla CNN does not perform well with 68 points, since far less training data (images with 68-point labels) is publicly available for this task. So instead, use the predictions to initialize the CLNF detector, for which code is available. See the paper for more details: http://www.openu.ac.il/home/hassner/projects/tcnn_landmarks/

fishman2008 commented 8 years ago

Indeed, 68-landmark face alignment will be a problem if there is not enough training data. As written in the paper, this is also only a first-stage landmark detector. Do you have any plans to release the code for the later stage?

angeladzl commented 5 years ago

@fishman2008 Have you finished the 68-landmark process?