postech-ami / FastMETRO

[ECCV'22] Official PyTorch Implementation of "Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers"
https://fastmetro.github.io/
MIT License

Remember to use mixed dataset when reproducing H3.6M results #27

Closed: Dariushuangg closed this issue 11 months ago

Dariushuangg commented 11 months ago

Hi, I trained FastMETRO-L-H64 on H3.6M but only got this performance:

```
INFO:FastMETRO:Best Results: (PA-MPJPE) 0.00 \ 75.36 \ 47.05 at Epoch 60.00
```

I tried evaluating the official checkpoint and got the same performance as published:

```
INFO:FastMETRO:Validation Epoch: 0 MPVPE: 0.00, MPJPE: 52.95, PA-MPJPE: 33.58
```

I didn't alter any hyperparameters, except that I am using 8 V100 GPUs:

```
python3.8 -m torch.distributed.launch --nproc_per_node=8 --master_port=29502 \
       src/tools/run_fastmetro_bodymesh.py \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       --num_workers 4 \
       --per_gpu_train_batch_size 16 \
       --per_gpu_eval_batch_size 16 \
       --lr 1e-4 \
       --num_train_epochs 60 \
       --output_dir FastMETRO-L-H64_h36m/
```

I did modify run_fastmetro_bodymesh.py by deleting all mesh visualization code. I am using the backbone hrnetv2_w64_imagenet_pretrained.pth.

Any clue on what I could be doing wrong? Thanks!

FastMETRO commented 11 months ago

Hello,

(1) Using a different number of GPUs during training might lead to different performance.
(2) Please check whether all training datasets are correctly downloaded (a quick check is sketched after the next paragraph).
(3) Please check whether all training settings are the same.

Note that a substantial number of noisy pseudo annotations might introduce instability during model training, as also observed in METRO and MeshGraphormer; this could be alleviated with the more accurate pseudo annotations provided by NeuralAnnot or EFT. Despite such instability, we never observed such a large performance drop in our experiments, so please double-check (1), (2), and (3) above.
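For point (2), a minimal sanity check, assuming a METRO-style layout where each dataset lives in its own folder under ./datasets with a train.yaml inside (the folder names below are illustrative guesses based on the datasets mentioned in this thread, not necessarily the exact names shipped with the repo):

```bash
# Check that each expected dataset folder exists and contains its train.yaml.
# Folder names are assumptions for illustration only.
DATASET_DIR=./datasets
for d in Tax-H36m-coco40k-Muco-UP-Mpii human3.6m coco_smpl muco up3d mpii; do
    if [ -f "$DATASET_DIR/$d/train.yaml" ]; then
        echo "OK:      $d"
    else
        echo "MISSING: $d/train.yaml"
    fi
done
```

Any MISSING line would point to a dataset that was not downloaded or was placed under a different folder name.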

Thanks for your interest in our work!!

FastMETRO commented 11 months ago

Please reopen this issue if you need more help regarding this.

Dariushuangg commented 11 months ago

In experiment.md, training on H3.6M requires the input parameter `--train_yaml Tax-H36m-coco40k-Muco-UP-Mpii/train.yaml`.

However, I can't find Tax-H36m-coco40k-Muco-UP-Mpii/train.yaml in the root directory or in the dataset folder, so I am using the train.yaml in the provided H3.6M folder as a substitute. Could this be the cause of the mismatch? Could you provide me with the content of Tax-H36m-coco40k-Muco-UP-Mpii/train.yaml? Thanks.
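A quick way to confirm whether the mixed-dataset yaml is present anywhere under the download, assuming the datasets were fetched into ./datasets (that directory name is an assumption):

```bash
# Search the whole datasets tree for the mixed-dataset config referenced in experiment.md.
find ./datasets -name "train.yaml" -path "*Tax-H36m-coco40k-Muco-UP-Mpii*"
```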

Dariushuangg commented 11 months ago

Hmm, is it correct that in order to reproduce the evaluation results on H3.6M, I will need to train the model on H3.6M + coco_smpl + muco + up3d + mpii?

FastMETRO commented 11 months ago

Yes, as described in Section 5.1 of the paper, you should use the mixed datasets for model training. It seems that you used only the Human3.6M dataset for training, which leads to the low performance.
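For concreteness, here is the launch command from earlier in this thread with the mixed-dataset yaml from experiment.md added; this is only a sketch, assuming `--train_yaml` resolves the path relative to the default datasets directory as in the documented setup:

```bash
# Same flags as the original command, plus the mixed-dataset training yaml.
python3.8 -m torch.distributed.launch --nproc_per_node=8 --master_port=29502 \
       src/tools/run_fastmetro_bodymesh.py \
       --train_yaml Tax-H36m-coco40k-Muco-UP-Mpii/train.yaml \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       --num_workers 4 \
       --per_gpu_train_batch_size 16 \
       --per_gpu_eval_batch_size 16 \
       --lr 1e-4 \
       --num_train_epochs 60 \
       --output_dir FastMETRO-L-H64_h36m/
```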

Dariushuangg commented 11 months ago

OK, I downloaded and used all 5 datasets for training, and now the metrics look correct. I do recommend changing the title in experiment.md though, as it says training on Human3.6M but in fact uses the config file for mixed-dataset training; this could cause confusion. [screenshot attached]

Anyway, thanks for your timely reply; I will close this issue.