That is strange. I will re-run some of the experiments and get back to you with my results.
Could you provide training and validation error graphs over the course of the training?
Thank you very much!
Here are the training graphs from WandB that I obtained by running the MORPH benchmark:
I will try to export all data from WandB and put it in a Google Drive folder, so that I can share it with you here.
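For reference, a minimal sketch of such an export using the WandB public API; the `entity/project` string and the output directory are placeholders, not the actual project used for these runs:

```python
# Minimal sketch: export run histories from WandB to CSV files.
# "entity/project" and the output directory are placeholders.
import os
import wandb

api = wandb.Api()
runs = api.runs("entity/project")  # e.g. "<username>/<benchmark-project>"

os.makedirs("wandb_export", exist_ok=True)
for run in runs:
    # run.history() returns a pandas DataFrame with the logged metrics
    # (e.g. training/validation loss per step).
    history = run.history()
    history.to_csv(os.path.join("wandb_export", f"{run.name}.csv"), index=False)
```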
I have some more pressing things to work on this week, but if all goes well, you can expect me to post my results here next week.
Ok! Thanks a lot. Last Friday I saw that the Google Drive folder I shared here was somehow empty. I have now put all the evaluation reports there.
The provided configuration files serve as an example of how to use the repository but are not the ones for which we report results in the paper.
I apologize; this should have been caught earlier. Thank you for noticing it.
Once all the experiments are finished, I will upload the training runs here and provide the correct configuration files.
Ok. Thanks a lot for looking into this!
I am looking forward to running further experiments with these new configuration files. :)
Please see the attached evaluation of the experiments. The file does not include the trained models, as GitHub enforces a limit on attachment size. I will update the repository to include the correct configuration files and add a link to download the weights of a model pretrained on IMDB-WIKI.
- AFAD imagenet
- AgeDB imagenet
- ChaLearn random
- ChaLearn imagenet
- ChaLearn pretrained (ResNet-50 pretrained on IMDB)
- MORPH imagenet
- UTKFace imagenet
- AFAD
- AgeDB
- CLAP2016
- MORPH
- UTKFace
Thank you very much!
I will try to reproduce all these results now. If anything pops up, I will make a comment here.
Hi @paplhjak,
I successfully reproduced the results for the "ResNet-50 with Cross-Entropy" method on some of the benchmarks. Thank you so much for updating the repository! :)
I'm now working on reproducing the results for the other methods evaluated in the paper. However, I still need the additional config files I requested in issue #21. Could you please provide them? It would be greatly appreciated!
The results I obtained for the UTKFace, MORPH, and CLAP2016 benchmarks, using these datasets as training data, are as follows:
Trained on UTKFace (MAE on each test set):

| Test set | Random | ImageNet | IMDB |
|---|---|---|---|
| UTKFace | 5.33(0.16) | 4.77(0.12) | 4.37(0.03) |
| MORPH | 7.22(0.45) | 6.74(0.43) | 5.02(0.11) |
| CLAP2016 | 7.52(0.26) | 7.80(0.41) | 4.74(0.16) |
| AgeDB | 9.31(0.20) | 9.00(0.20) | 6.57(0.08) |
| CACD2000 | 8.85(0.26) | 9.47(0.21) | 6.52(0.09) |
| AFAD | 6.38(0.27) | 6.89(0.35) | 5.43(0.13) |
| FG-NET | 7.71(0.98) | 6.63(0.50) | 4.94(0.19) |
Trained on MORPH (MAE on each test set):

| Test set | Random | ImageNet | IMDB |
|---|---|---|---|
| MORPH | 3.02(0.05) | 2.97(0.06) | 2.81(0.02) |
| CLAP2016 | 10.53(0.15) | 9.07(0.34) | 6.89(0.12) |
| AgeDB | 12.71(0.15) | 11.89(0.21) | 9.61(0.29) |
| CACD2000 | 10.10(0.34) | 11.17(0.40) | 8.60(0.34) |
| AFAD | 9.79(0.94) | 7.79(0.69) | 6.64(0.31) |
| UTKFace | 12.07(0.29) | 10.85(0.41) | 8.95(0.08) |
| FG-NET | 15.47(0.95) | 11.35(0.40) | 9.45(0.38) |
Trained on CLAP2016 (MAE on each test set):

| Test set | Random | ImageNet | IMDB |
|---|---|---|---|
| CLAP2016 | 8.30(0.00) | 6.36(0.00) | 4.51(0.00) |
| MORPH | 7.52(0.00) | 6.40(0.00) | 4.99(0.00) |
| AgeDB | 12.04(0.00) | 11.02(0.00) | 7.50(0.00) |
| CACD2000 | 9.93(0.00) | 8.69(0.00) | 6.80(0.00) |
| AFAD | 6.24(0.00) | 7.15(0.00) | 5.89(0.00) |
| UTKFace | 8.51(0.00) | 7.50(0.00) | 5.91(0.00) |
| FG-NET | 10.18(0.00) | 8.49(0.00) | 5.48(0.00) |
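For readability: each cell above reads as mean(std), i.e. the average MAE over repeated runs with, presumably, its standard deviation in parentheses. A minimal sketch of that aggregation, using placeholder numbers:

```python
# Minimal sketch (placeholder data): compute MAE per run, then report
# mean and standard deviation across runs in the "mean(std)" format.
import numpy as np

def mae(predicted_ages, true_ages):
    """Mean absolute error between predicted and ground-truth ages."""
    return float(np.mean(np.abs(np.asarray(predicted_ages) - np.asarray(true_ages))))

# Placeholder per-run MAE values for one (training set, initialization) cell.
per_run_mae = [4.71, 4.80, 4.81]

mean_mae = np.mean(per_run_mae)
std_mae = np.std(per_run_mae)
print(f"{mean_mae:.2f}({std_mae:.2f})")  # prints "4.77(0.04)"
```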
Hi @caiopetruccirosa, I will add them today or tomorrow.
Most of them just amount to changing the configuration file to contain a 'type' specification for the head.
E.g.:

```yaml
heads:
  - tag: "age"
    type: "dldl"
    attribute: "age"
    ...
```
The supported types are: 'classification', 'dldl', 'dldl_v2', 'unimodal_concentrated', 'soft_labels', 'mean_variance', 'regression', 'megaage', 'orcnn', 'extended_binary_classification', 'coral'.
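For instance, a hypothetical helper (not part of the repository; the base config path is a placeholder, and it assumes `heads` sits at the top level of the YAML file as in the snippet above) could generate one config variant per supported type:

```python
# Hypothetical helper: generate one configuration variant per supported
# head type from a base YAML config. Paths are placeholders.
import copy
import yaml

SUPPORTED_TYPES = [
    "classification", "dldl", "dldl_v2", "unimodal_concentrated",
    "soft_labels", "mean_variance", "regression", "megaage",
    "orcnn", "extended_binary_classification", "coral",
]

with open("base_config.yaml") as f:
    base = yaml.safe_load(f)

for head_type in SUPPORTED_TYPES:
    cfg = copy.deepcopy(base)
    cfg["heads"][0]["type"] = head_type  # assumes a single 'age' head
    with open(f"config_{head_type}.yaml", "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
```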
Please check out #22.
Just checked #22. Thank you very much!
I will try to reproduce the rest of the results and if anything pops up, I will open another issue :)
Hello! First of all, thank you very much for the paper. I found it highly interesting and relevant for the future of this field, especially since there aren't many standardized benchmarks for age estimation today.
I have been trying to reproduce the results reported in Table 6 of the paper, but I can't achieve similar performance. I ran the 5 benchmarks defined in the `facebase/configs` directory, and my results are as follows:

- `AgeDB_256x256`: 10.74 MAE on the AgeDB test splits, when the reported result is 7.20 MAE.
- `CACD2000_256x256`: 8.43 MAE on the CACD2000 test splits, when the reported result is 4.59 MAE.
- `CLAP2016_256x256`: 11.08 MAE on the CLAP2016 test splits, when the reported result is 5.96 MAE.
- `MORPH_256x256`: 7.04 MAE on the MORPH test splits, when the reported result is 2.96 MAE.
- `UTKFace_256x256`: 11.90 MAE on the UTKFace test splits, when the reported result is 4.75 MAE.

You can find all the evaluation reports for the benchmarks in this Google Drive folder. Also, to ensure reproducibility, I've been working with a fork I made.

From my analysis of the train/validation loss curves and test results, it seems that the models defined in the configs are overfitting across all benchmarks. This led me to think that some hyperparameters, such as the learning rate, might not be correct. I created and ran an alternative benchmark (`facebase/configs/MORPH_256x256_lr1em4`), in which I changed the learning rate from 1e-3 to 1e-4. This adjustment slightly improved the results, with the MAE on the MORPH test split going from 7.04 to 5.87, but it is still far from the reported value.

@paplhjak, could you please provide any insights or suggestions to help resolve this?
Thanks in advance!
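As an illustration of the overfitting check described above, a minimal sketch that compares training and validation loss curves from an exported run history; the CSV path and column names are placeholders, not the actual keys logged by the repository:

```python
# Minimal sketch (placeholder file and column names): compare training and
# validation loss curves exported from WandB to spot overfitting, i.e. the
# validation loss rising while the training loss keeps decreasing.
import pandas as pd
import matplotlib.pyplot as plt

history = pd.read_csv("morph_run_history.csv")  # placeholder export

plt.plot(history["epoch"], history["train_loss"], label="train loss")
plt.plot(history["epoch"], history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("morph_loss_curves.png")
```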