Open hnuzhy opened 5 years ago
Hi hnuzhy,
Thanks for your interest in our work. Please excuse my late reply, but I have not been monitoring this account. Here are some answers:
Regarding 1 (pre-processing): The preprocessing you describe seems correct. As a sanity check, I would recommend drawing the facial landmarks after applying Procrustes and seeing how they align. Ideally, the facial landmarks would look similar to those in Fig. 5 of the paper, and different samples should align well with each other.
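A minimal version of that sanity check could look like the sketch below. The data is synthetic (a fixed shape under random in-plane rotations and translations), and the first frame stands in for the mean shape the paper aligns to; with real DISFA landmarks the aligned samples will not coincide exactly, but they should cluster tightly.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)

# Hypothetical stand-in for tracked landmarks: one fixed 66-point shape
# observed under a random in-plane rotation and translation per frame.
base = rng.normal(size=(66, 2))
frames = []
for _ in range(50):
    theta = rng.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    frames.append(base @ rot.T + rng.uniform(-5, 5, size=2))
frames = np.array(frames)

# Align every frame to a common reference (the paper uses the mean
# shape; the first frame serves as a stand-in here).  The second
# return value of procrustes() is the aligned version of its second
# argument, with translation, rotation and scale removed.
ref = frames[0]
aligned = np.array([procrustes(ref, shape)[1] for shape in frames])

# After alignment these synthetic samples coincide; plotting a few of
# them on top of each other is the visual check suggested above.
```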
Regarding 2 (source code): The original implementation used to create the results in the paper is the Matlab code here: https://github.com/kaltwang/2015latent. This Python repository is a newer version with various changes; however, it should still produce reasonably similar results.
Indeed, 'exp' is the correct LT prediction mode, as described in the paper: "For prediction, we use the expected value of the FAU intensity...".
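The difference between the two modes can be illustrated on a single posterior. The numbers below are made up; they just show why 'exp' can behave very differently from 'max', especially for skewed posteriors over the six DISFA intensity levels.

```python
import numpy as np

# Hypothetical posterior over the six DISFA intensity levels (0..5)
# for one AU on one frame, as the latent tree model might output.
levels = np.arange(6)
posterior = np.array([0.05, 0.10, 0.40, 0.30, 0.10, 0.05])

pred_max = int(levels[np.argmax(posterior)])   # 'max' mode: most likely level
pred_exp = float(np.sum(levels * posterior))   # 'exp' mode: expected value
# pred_max -> 2, pred_exp -> approx. 2.45
```

Note that 'exp' yields a continuous value, which is what the CORR metric is computed on.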
Here are some more details that should help to reproduce the results:
The train/test split used for the cross-validation, by DISFA subject ID, was: {[1;2;3],[4;5;6],[7;8;9],[10;11;12],[13;16;17],[18;21;23],[24;25;26],[27;28;29],[30;31;32]}. One of the bracketed sets was used for testing and all other ones for training.
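In Python, that split can be written directly as a list of test folds:

```python
# The nine test folds by DISFA subject ID, exactly as listed above.
folds = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 16, 17],
         [18, 21, 23], [24, 25, 26], [27, 28, 29], [30, 31, 32]]

subjects = sorted(s for fold in folds for s in fold)  # 27 subjects in total

for test_ids in folds:
    train_ids = [s for s in subjects if s not in test_ids]
    # ... train on train_ids, evaluate on test_ids ...
```

Note that the subject IDs are not contiguous (e.g. 14, 15, 19, 20 and 22 do not appear), so the folds should be taken literally rather than generated from a range.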
The model parameters used are:
inference.extract_samples_mode = 'exp'
structure_update.k_default = 10
structure_update.lklhd_mindiff = 0.001 (note: this differs from the default)
structure_update.lklhd_mindiff_siblings = -0.1
The training data comprises 2000 randomly selected samples from all training subjects that have at least one active AU (i.e. we remove all frames that have all AUs at intensity 0).
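That selection step can be sketched as follows. The label array here is randomly generated as a stand-in for the training subjects' frames; only the filtering and sampling logic matters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the training subjects' labels:
# per-frame AU intensities, shape (n_frames, 12), values 0..5.
labels = rng.integers(0, 6, size=(10000, 12))
labels[rng.random(10000) < 0.5] = 0   # make roughly half the frames neutral

# Keep only frames with at least one active AU ...
candidates = np.flatnonzero(labels.max(axis=1) > 0)

# ... and draw 2000 of them at random without replacement.
train_idx = rng.choice(candidates, size=2000, replace=False)
```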
I hope that helps to reproduce the results; please let me know if you have further questions. I still have the original models and training data as Matlab files, and you could use those directly to reproduce the results. You can reach me at s.(surname of the first author)@gmail.com
Hello. The mail I sent to the address in your paper (sk2608@imperial.ac.uk) keeps being rejected, so I can only come here to ask you for advice.
Firstly, I'm sorry to disturb you. Recently, I have been trying to reproduce your work on FAU intensity estimation published at CVPR 2015. Our lab is preparing a public dataset of Facial Action Units, and we would like to apply your algorithm to verify the quality of the dataset. Although you have provided the source code of the paper on GitHub (https://github.com/kaltwang/latenttrees), I still encountered some problems while testing it.
1. Did I pre-process the dataset correctly? I started by testing the Python code on DISFA, doing the preprocessing following the procedure described in your paper: "the 66 tracked facial landmarks are aligned by Procrustes analysis to the mean shape, which removes translations and in-plane rotations of the face. Then each of the x and y landmark coordinates are normalized by subtracting the mean and dividing by the standard deviation. The coordinates are stacked together into the final 132 dimensional feature vector." After this step, each frame becomes a 132-dimensional float feature vector; I kept 6 decimal places for the x and y landmark coordinates. Here is a sample. The last 12 dimensions are the AU labels.
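For concreteness, my understanding of that pipeline is sketched below on random stand-in data (the first frame stands in for the mean shape used in the paper; names and shapes are my own, not from the repository):

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)

# Hypothetical tracked landmarks: n_frames x 66 x (x, y).
landmarks = rng.normal(size=(500, 66, 2))

# 1. Procrustes-align each frame to a common reference (the paper
#    aligns to the mean shape; the first frame stands in here).
ref = landmarks[0]
aligned = np.array([procrustes(ref, s)[1] for s in landmarks])

# 2. Stack the coordinates into 132 dimensions and z-normalise each
#    dimension over the frames (subtract mean, divide by std).
feats = aligned.reshape(len(aligned), -1)            # (n_frames, 132)
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)
```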
"-2.096146,0.808799,-2.091103,0.355009,-2.052868,-0.095662,-1.998895,-0.543859,-1.753963,-0.945115,-1.415642,-1.304292,-1.014566,-1.616506,-0.545876,-1.863525,-0.003086,-1.926723,0.484570,-1.821902,0.853334,-1.538305,1.140700,-1.197997,1.388083,-0.836745,1.543340,-0.444629,1.616310,-0.036416,1.671033,0.373177,1.692315,0.783508,-1.622708,1.228520,-1.392810,1.400170,-1.092824,1.475687,-0.778639,1.473454,-0.470668,1.421783,0.694077,1.358226,0.927759,1.436649,1.179268,1.455757,1.421860,1.399151,1.579266,1.246041,0.142636,1.048108,0.177850,0.732711,0.194456,0.417533,0.205524,0.105226,-0.340913,-0.187562,-0.129228,-0.222087,0.081818,-0.252325,0.272997,-0.229343,0.463566,-0.208629,-1.225450,0.934863,-0.959702,1.035206,-0.668784,1.041919,-0.416015,0.933793,-0.682395,0.860632,-0.963810,0.854780,0.492024,0.902129,0.752871,1.036561,1.062384,1.015008,1.330159,0.884958,1.048860,0.816742,0.756709,0.807697,-0.811857,-0.665949,-0.504451,-0.638740,-0.201791,-0.604549,0.094303,-0.655645,0.310625,-0.600597,0.535826,-0.641831,0.759322,-0.689847,0.610290,-0.913934,0.344564,-1.053538,0.035903,-1.102492,-0.316663,-1.060543,-0.614698,-0.902106,-0.365695,-0.770197,0.088891,-0.840419,0.430812,-0.779141,0.430517,-0.800078,0.087419,-0.867932,-0.370997,-0.784638,0,0,0,0,0,0,2,0,0,0,0,0"
I'm not sure whether I pre-processed DISFA correctly. Is there anything else I should watch out for?
2. Did I use the source code correctly? In the demo code "example_latenttrees.py", I modified K as below (M = 132+12):
For the 9-fold cross-validation mentioned in the paper, I used the last 3 subjects' FAUs as the test set and allocated the other 24 subjects to the training set. The prediction part is written this way (M = 132+12):
All other parameters remained unchanged. Then I began training the 1st fold of DISFA on my computer (i7 CPU, 8 GB memory); it took about 22 hours to complete. Unfortunately, the final correlation coefficient (CORR) is abnormal.
As shown above, only AU6, AU12 and AU25 are predicted; the other AUs are not detected at all. I then trained the other folds, and the results are all abnormal, so it seems I am stuck. After some consideration, I removed the samples with no active AU labels from the training data and shuffled the order of the remaining samples. This greatly reduced the training time, but the results did not improve much.
Detection of AU12 improved, and AU26 appeared, but I was confused again. For a third attempt, I changed the model parameters: as shown below, I switched the 'max' mode to 'exp'.
model.inference.extract_samples_mode = 'exp' # change 'max' into 'exp'
Surprisingly, the predictions improved a lot, but they are still quite different from the results described in the paper. Detection of AU1, AU2, AU4 and AU17 finally showed up, but the accuracy is rather poor.
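One thing I also wanted to rule out is a mismatch in how CORR itself is computed. My understanding is that it is the per-AU Pearson correlation between predicted and ground-truth intensities, as in the sketch below (the data here is synthetic, just to show the computation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth and predictions for 12 AUs over 1000 frames.
y_true = rng.integers(0, 6, size=(1000, 12)).astype(float)
y_pred = y_true + rng.normal(scale=0.5, size=y_true.shape)  # noisy predictions

# Per-AU Pearson correlation, my reading of the CORR metric in the paper.
corr = np.array([np.corrcoef(y_true[:, j], y_pred[:, j])[0, 1]
                 for j in range(y_true.shape[1])])
```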
So far I still cannot find where the problem is, so I have to turn to you for help. I would appreciate any comments and suggestions, and I am looking forward to your reply.