kaltwang / latenttrees

Latent trees code (Python version)
BSD 3-Clause "New" or "Revised" License

Some questions about reproducing CVPR2015 (LTtrees FAU) #1

Open hnuzhy opened 5 years ago

hnuzhy commented 5 years ago

Hello. The email I sent to the address in your paper (sk2608@imperial.ac.uk) keeps being rejected, so I can only come here to ask for advice.

Firstly, I'm sorry to disturb you. Recently, I have been trying to reproduce your work on FAU intensity estimation published at CVPR 2015. Our lab is preparing a public dataset of facial Action Units, and we would like to apply your algorithm to verify the quality of the dataset. Although you have provided the source code of the paper on GitHub (https://github.com/kaltwang/latenttrees), I still ran into some problems while testing it.

1. Did I pre-process the dataset correctly? I chose to start by testing the Python code on DISFA, and did the preprocessing following the procedure described in your paper: "the 66 tracked facial landmarks are aligned by Procrustes analysis to the mean shape, which removes translations and in-plane rotations of the face. Then each of the x and y landmark coordinates are normalized by subtracting the mean and dividing by the standard deviation. The coordinates are stacked together into the final 132 dimensional feature vector." After this step, each frame becomes a 132-dimensional float feature vector (I kept 6 decimal places for the x and y coordinates). Here is one sample; the final 12 dimensions are the AU labels:

  -2.096146,0.808799,-2.091103,0.355009,-2.052868,-0.095662,-1.998895,-0.543859,-1.753963,-0.945115,-1.415642,-1.304292,-1.014566,-1.616506,-0.545876,-1.863525,-0.003086,-1.926723,0.484570,-1.821902,0.853334,-1.538305,1.140700,-1.197997,1.388083,-0.836745,1.543340,-0.444629,1.616310,-0.036416,1.671033,0.373177,1.692315,0.783508,-1.622708,1.228520,-1.392810,1.400170,-1.092824,1.475687,-0.778639,1.473454,-0.470668,1.421783,0.694077,1.358226,0.927759,1.436649,1.179268,1.455757,1.421860,1.399151,1.579266,1.246041,0.142636,1.048108,0.177850,0.732711,0.194456,0.417533,0.205524,0.105226,-0.340913,-0.187562,-0.129228,-0.222087,0.081818,-0.252325,0.272997,-0.229343,0.463566,-0.208629,-1.225450,0.934863,-0.959702,1.035206,-0.668784,1.041919,-0.416015,0.933793,-0.682395,0.860632,-0.963810,0.854780,0.492024,0.902129,0.752871,1.036561,1.062384,1.015008,1.330159,0.884958,1.048860,0.816742,0.756709,0.807697,-0.811857,-0.665949,-0.504451,-0.638740,-0.201791,-0.604549,0.094303,-0.655645,0.310625,-0.600597,0.535826,-0.641831,0.759322,-0.689847,0.610290,-0.913934,0.344564,-1.053538,0.035903,-1.102492,-0.316663,-1.060543,-0.614698,-0.902106,-0.365695,-0.770197,0.088891,-0.840419,0.430812,-0.779141,0.430517,-0.800078,0.087419,-0.867932,-0.370997,-0.784638,0,0,0,0,0,0,2,0,0,0,0,0

I'm not sure whether I have pre-processed DISFA correctly. Is there anything else I should pay attention to?
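For reference, my alignment and normalization step looks roughly like this (a sketch with random stand-in landmarks, not the authors' code; `align_rigid` is my own helper):

```python
import numpy as np

def align_rigid(shape, mean_shape):
    """Align a 66x2 landmark shape to the mean shape, removing translation
    and in-plane rotation (orthogonal Procrustes / Kabsch)."""
    a = shape - shape.mean(axis=0)              # remove translation
    b = mean_shape - mean_shape.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)           # optimal rotation via SVD
    if np.linalg.det(u @ vt) < 0:               # forbid reflections
        u[:, -1] *= -1
    return a @ (u @ vt)

rng = np.random.default_rng(0)
shapes = rng.normal(size=(100, 66, 2))          # stand-in tracked landmarks
mean_shape = shapes.mean(axis=0)

aligned = np.stack([align_rigid(s, mean_shape) for s in shapes])
X = aligned.reshape(len(aligned), -1)           # stack x/y into 132-dim vectors
X = (X - X.mean(axis=0)) / X.std(axis=0)        # z-normalize each dimension
```

I deliberately left scaling out of the alignment, since the paper only mentions removing translations and in-plane rotations.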

2. Did I use the source code correctly? In the demo code "example_latenttrees.py", I modified K as below (M = 132 + 12):

  K[0:132] = 1  # dims 0-131: continuous, Gaussian distributed
  K[132:M] = 6  # dims 132-143: discrete with 6 levels (0-5)

Following the 9-fold cross-validation mentioned in the paper, I used the last 3 subjects as the test set and the remaining 24 subjects as the training set. The prediction part is written this way (M = 132 + 12):

  ind_o = list(range(0, 132))  # these are the indices of the observed dimensions
  ind_u = list(range(132, M))  # these are the indices of the unobserved dimensions
  # testing gets as input only the observed dimensions (indexed by ind_o)
  X_prediction, lklhd_test = model.testing(X_test[:, ind_o], ind_o, ind_u)

All other parameters remained unchanged. I then trained the 1st fold of DISFA on my computer (i7 CPU, 8 GB memory), which took about 22 hours. Unfortunately, the final correlation coefficients (CORR) are abnormal.

AU: 1    CoRR: 0         MSE: 0.7717     ICC: -0.0176
AU: 2    CoRR: 0         MSE: 0.2683     ICC: -0.0073
AU: 4    CoRR: 0         MSE: 1.4001     ICC: -0.0317
AU: 5    CoRR: 0         MSE: 0.04       ICC: -0.003
AU: 6    CoRR: 0.5518    MSE: 0.667      ICC: 0.5486
AU: 9    CoRR: 0         MSE: 0.6285     ICC: -0.016
AU: 12   CoRR: 0.4871    MSE: 2.8451     ICC: 0.3301
AU: 15   CoRR: 0         MSE: 0.0674     ICC: -0.0058
AU: 17   CoRR: 0         MSE: 0.059      ICC: -0.0052
AU: 20   CoRR: 0         MSE: 0.0627     ICC: -0.0048
AU: 25   CoRR: 0.7436    MSE: 1.0861     ICC: 0.7294
AU: 26   CoRR: 0         MSE: 1.7913     ICC: -0.1148
Avg:     CoRR: 0.1485    MSE: 0.8073     ICC: 0.1168

As shown above, only AU6, 12 and 25 are predicted at all; the other AUs were not detected. I trained the other folds as well, and the results are all abnormal. It seems I am in trouble. After some consideration, I deleted the samples with no active AU labels from the training data and shuffled the order of the remaining samples. The training time was greatly reduced, but the results did not improve much.

AU: 1    CoRR: 0         MSE: 1.3167     ICC: -0.0311
AU: 2    CoRR: 0         MSE: 0.4578     ICC: -0.0127
AU: 4    CoRR: 0         MSE: 2.3888     ICC: -0.0579
AU: 5    CoRR: 0         MSE: 0.0683     ICC: -0.0051
AU: 6    CoRR: 0.5831    MSE: 1.4135     ICC: 0.5526
AU: 9    CoRR: 0         MSE: 1.0723     ICC: -0.0282
AU: 12   CoRR: 0.7069    MSE: 1.2406     ICC: 0.6922
AU: 15   CoRR: 0         MSE: 0.115      ICC: -0.01
AU: 17   CoRR: 0         MSE: 0.1007     ICC: -0.009
AU: 20   CoRR: 0         MSE: 0.1071     ICC: -0.0083
AU: 25   CoRR: 0.6257    MSE: 1.6224     ICC: 0.5901
AU: 26   CoRR: 0.0166    MSE: 3.0543     ICC: -0.2575
Avg:     CoRR: 0.161     MSE: 1.0798     ICC: 0.1179

Detection of AU12 improved, and AU26 appeared, but I was confused again! The third time, I tried changing the training model parameters: I switched the extraction mode from 'max' to 'exp'.

  model.inference.extract_samples_mode = 'exp'  # change 'max' into 'exp'

Surprisingly, the predictions improved a lot, but they are still quite different from the results reported in the paper.

AU: 1    CoRR: -0.0229   MSE: 1.3236     ICC: -0.0335
AU: 2    CoRR: -0.0101   MSE: 0.4638     ICC: -0.014
AU: 4    CoRR: -0.034    MSE: 2.2703     ICC: -0.0283
AU: 5    CoRR: 0         MSE: 0.0683     ICC: -0.0051
AU: 6    CoRR: 0.3571    MSE: 1.0371     ICC: 0.2684
AU: 9    CoRR: 0         MSE: 1.0723     ICC: -0.0282
AU: 12   CoRR: 0.6897    MSE: 0.9181     ICC: 0.5685
AU: 15   CoRR: 0         MSE: 0.115      ICC: -0.01
AU: 17   CoRR: 0.1378    MSE: 0.1013     ICC: 0.076
AU: 20   CoRR: 0         MSE: 0.1071     ICC: -0.0083
AU: 25   CoRR: 0.7158    MSE: 0.7798     ICC: 0.6976
AU: 26   CoRR: 0.2785    MSE: 2.4022     ICC: -0.0215
Avg:     CoRR: 0.176     MSE: 0.8882     ICC: 0.1218

Detection of AU1, 2, 4 and 17 finally showed up, but the accuracy is still rather poor.
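For reference, the per-AU metrics I report above are computed roughly like this (a sketch; I assume the ICC(3,1) variant, which may differ from what the paper uses):

```python
import numpy as np

def metrics(y_true, y_pred):
    """Per-AU CORR (Pearson), MSE, and ICC(3,1); the ICC variant is my
    assumption, with the two 'raters' being ground truth and prediction."""
    corr = np.corrcoef(y_true, y_pred)[0, 1]
    mse = np.mean((y_true - y_pred) ** 2)

    # ICC(3,1): two-way mixed model, single measures.
    data = np.stack([y_true, y_pred], axis=1)   # (n targets, k=2 raters)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)
    col_means = data.mean(axis=0)
    bms = k * ((row_means - grand) ** 2).sum() / (n - 1)    # between-target MS
    resid = data - row_means[:, None] - col_means[None, :] + grand
    ems = (resid ** 2).sum() / ((n - 1) * (k - 1))          # residual error MS
    icc = (bms - ems) / (bms + (k - 1) * ems)
    return corr, mse, icc
```

With a perfect prediction (`y_pred == y_true`) this yields CORR = 1, MSE = 0 and ICC = 1, which is how I sanity-checked it.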

So far I still cannot find where the problem is, so I have to turn to you for help. I hope you can offer some comments and suggestions. I am looking forward to your reply.

kaltwang commented 5 years ago

Hi hnuzhy,

Thanks for your interest in our work. Please excuse my late reply, but I have not been monitoring this account. Here are some answers:

Regarding 1 (pre-processing): The preprocessing you describe seems correct. As a sanity check, I would recommend drawing the facial landmarks after applying Procrustes and checking how they align. Ideally, the facial landmarks would look similar to Fig. 5 of the paper, and different samples should align well with each other.

Regarding 2 (source code): The original implementation used to create the results in the paper is the Matlab code here: https://github.com/kaltwang/2015latent. This Python repository is a newer version with various changes; however, it should still produce reasonably similar results.

Indeed, 'exp' is the correct LT prediction mode, as described in the paper: "For prediction, we use the expected value of the FAU intensity...".

Here are some more details that should help to reproduce the results:

The train/test split for the cross-validation, according to DISFA subject ID, was: {[1;2;3],[4;5;6],[7;8;9],[10;11;12],[13;16;17],[18;21;23],[24;25;26],[27;28;29],[30;31;32]}. One of the bracketed sets was used for testing and all the others for training.
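That split can be sketched as follows (my own loop, not code from the repository; each fold is held out for testing once while the rest are used for training):

```python
# The nine subject-ID folds from the DISFA cross-validation split.
folds = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 16, 17],
         [18, 21, 23], [24, 25, 26], [27, 28, 29], [30, 31, 32]]

splits = []
for test_ids in folds:
    # Train on all subjects not in the current test fold.
    train_ids = [s for f in folds if f is not test_ids for s in f]
    splits.append((train_ids, test_ids))
```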

The model parameters used are:

  inference.extract_samples_mode = 'exp'
  structure_update.k_default = 10
  structure_update.lklhd_mindiff = 0.001  # note: this is different from the default
  structure_update.lklhd_mindiff_siblings = -0.1

The training data comprises 2000 randomly selected samples from all training subjects that have at least one active AU (i.e. we remove all frames that have all AUs at intensity 0).
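A minimal sketch of that selection, assuming the last 12 columns hold the AU intensity labels as in the feature vectors above (random stand-in data here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the stacked training frames: 132 features + 12 AU labels.
X_all = rng.integers(0, 3, size=(5000, 144)).astype(float)

au = X_all[:, 132:]                          # the 12 AU intensity columns
active = np.flatnonzero(au.sum(axis=1) > 0)  # frames with >= 1 active AU
chosen = rng.choice(active, size=2000, replace=False)
X_train = X_all[chosen]                      # 2000 random active-AU frames
```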

Hope that helps to reproduce the results; please let me know if you have further questions. I do still have the original models and training data as Matlab files, and you could use those directly to reproduce the results. You can reach me at s.(surname of the first author)@gmail.com