hongsukchoi / 3DCrowdNet_RELEASE

Official Pytorch implementation of "Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes", CVPR 2022
MIT License

Cannot reproduce without pre-trained ResNet-50 weights of xiao2018simple #16

Open mimiliaogo opened 2 years ago

mimiliaogo commented 2 years ago

Hi, I tried to reproduce Table 8 without the pre-trained ResNet-50 weights of xiao2018simple. My training command is

python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw_crowd.yml

and the config file is:

trainset_3d: ['Human36M', 'MuCo']
trainset_2d: ['MSCOCO', 'MPII']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.00025 #0.001/4
lr_backbone: 0.0001
lr_dec_factor: 10
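
For readers unfamiliar with these keys, the sketch below shows one common reading of them: a step schedule in which the learning rate is divided by `lr_dec_factor` once for each epoch in `lr_dec_epoch` that has been reached. This interpretation and the helper name `step_lr` are assumptions for illustration; the repository's own scheduler may differ in detail.

```python
def step_lr(base_lr, epoch, lr_dec_epoch, lr_dec_factor):
    """Return the learning rate for a given epoch under a step schedule.

    Assumption: the rate is divided by lr_dec_factor once for every
    milestone in lr_dec_epoch that the current epoch has reached.
    """
    passed = sum(1 for milestone in lr_dec_epoch if epoch >= milestone)
    return base_lr / (lr_dec_factor ** passed)


# Values taken from the config above.
for epoch in (0, 29, 30, 39):
    print(epoch, step_lr(0.00025, epoch, [30], 10))
```

Under this reading, the rate stays at 0.00025 through epoch 29 and drops by a factor of 10 from epoch 30 onward.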

However, I got very strange results on 3DPW, shown below (I evaluate after every epoch):


Do you have any idea about this? Thank you!

hongsukchoi commented 2 years ago


If you are not using the pretrained backbone, please set 'lr' and 'lr_backbone' to the same value.
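
One way to picture this advice is as two optimizer parameter groups, one for the backbone and one for everything else. The sketch below is a hypothetical illustration, not the repository's code: `resolve_learning_rates` and the `pretrained_backbone` flag are made-up names, and the returned dictionaries only mimic the per-group `lr` option that PyTorch optimizers accept.

```python
def resolve_learning_rates(lr, lr_backbone, pretrained_backbone):
    """Pick learning rates for the two parameter groups.

    Hypothetical helper: with a pretrained backbone, a smaller backbone
    rate is commonly used so the loaded weights are not disturbed too
    much; when training from scratch, both groups use the same rate,
    as suggested in the comment above.
    """
    if not pretrained_backbone:
        lr_backbone = lr
    return [
        {"name": "backbone", "lr": lr_backbone},
        {"name": "rest", "lr": lr},
    ]


# Rates from the original config, but without pretrained weights.
groups = resolve_learning_rates(0.00025, 0.0001, pretrained_backbone=False)
print(groups)
```

With `pretrained_backbone=False`, both groups end up with the same rate regardless of the `lr_backbone` value in the config.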

mimiliaogo commented 1 year ago

I changed my config file as below:

trainset_3d: ['Human36M', 'MuCo']
trainset_2d: ['MSCOCO', 'MPII']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.0005 
lr_backbone: 0.0005
lr_dec_factor: 10

# modify batch size
train_batch_size: 128
test_batch_size: 128

However, the results were still weird.

hongsukchoi commented 1 year ago


Yes, the results seem weird.

  1. Are you evaluating on 3DPW-Crowd?

  2. How can you train that fast? I don't remember exactly, but it took more than 12 hours to train for 6 epochs. You are training for 40 epochs with half the batch size. 2 days are not enough.

mimiliaogo commented 1 year ago

  1. I evaluate on 3DPW, not 3DPW-Crowd.
  2. I used an RTX 3090 with batch size 128. The training time is 0.91 h / epoch.

hongsukchoi commented 1 year ago

Wow, I didn't know that the RTX 3090 is that much better than the RTX 2080 Ti.
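
The two timings quoted in this thread can be compared with a rough calculation. This is only a back-of-the-envelope sketch: it assumes the time per epoch stays constant and ignores that the two runs used different batch sizes and hardware.

```python
# Figures quoted in the thread.
author_hours, author_epochs = 12.0, 6   # "more than 12 hours" for 6 epochs (a lower bound)
user_hours_per_epoch = 0.91             # reported for the RTX 3090 run
total_epochs = 40                       # end_epoch in the config

author_hours_per_epoch = author_hours / author_epochs
author_total = author_hours_per_epoch * total_epochs
user_total = user_hours_per_epoch * total_epochs

print(f"author estimate: {author_hours_per_epoch:.2f} h/epoch, "
      f"{author_total:.1f} h for {total_epochs} epochs")
print(f"user report:     {user_hours_per_epoch:.2f} h/epoch, "
      f"{user_total:.1f} h for {total_epochs} epochs")
print(f"ratio: {author_hours_per_epoch / user_hours_per_epoch:.2f}x")
```

Under these assumptions the remembered figure extrapolates to at least 80 hours for 40 epochs, which is more than two days, while the reported rate gives about 36.4 hours, which fits within two days.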

I thought you were testing on 3DPW-Crowd, since you are using 3dpw_crowd.yml. You wrote:

"My training command is python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw_crowd.yml"

Could you share your full code via a GitHub repo? Some of the information is confusing. The increasing errors seem really weird.

mimiliaogo commented 1 year ago

So sorry, I pasted the wrong command. My training command is

python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw.yml

This is my full code: https://github.com/mimiliaogo/3DCrowdNet-Mimi

Thank you so much!

hongsukchoi commented 1 year ago

Thanks for sharing the code. I can't find a critical bug...

Here are a few suggestions.

  1. Could you try testing with test.py? Because you evaluate every epoch, the testing data could be unintentionally overwritten during the process.

  2. Could you visualize the training data, drawing the GT joints and meshes on the image? The data could have been corrupted during download. Also, is there any change in the MPII.py code?

  3. Could you train with this config info and see the result? It shouldn't take long. It's to see which dataset is causing the increasing error.

    trainset_3d: []
    trainset_2d: ['MSCOCO']
    testset: 'PW3D'

    lr_dec_epoch: [30]
    end_epoch: 40
    lr: 0.001
    lr_backbone: 0.001
    lr_dec_factor: 10

    # modify batch size
    train_batch_size: 128
    test_batch_size: 128
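
The third suggestion amounts to an ablation over the training sets. A minimal sketch of how such single-dataset configs could be generated is shown below. The function name `ablation_configs` and the idea of emitting one config per dataset are illustrative assumptions, while the keys and dataset names are the ones used in this thread. Whether the training script accepts every such combination is not confirmed here.

```python
DATASETS_3D = ["Human36M", "MuCo"]
DATASETS_2D = ["MSCOCO", "MPII"]


def ablation_configs(datasets_3d, datasets_2d):
    """Yield (name, config text) pairs, each training on a single dataset."""
    singles = [([d], []) for d in datasets_3d] + [([], [d]) for d in datasets_2d]
    for set_3d, set_2d in singles:
        name = (set_3d + set_2d)[0]
        text = "\n".join([
            f"trainset_3d: {set_3d}",
            f"trainset_2d: {set_2d}",
            "testset: 'PW3D'",
        ])
        yield name, text


for name, text in ablation_configs(DATASETS_3D, DATASETS_2D):
    print(f"# --- {name} ---")
    print(text)
```

Training once per generated config and comparing the error curves would show which dataset, if any, is associated with the increasing error.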

mimiliaogo commented 1 year ago

Hi, I tried your config from suggestion 3, and the results seem normal.


So maybe the problem comes from the training data. I will try to visualize it. BTW, there is no change in the MPII.py code.

mimiliaogo commented 1 year ago

@hongsukchoi, when I train your model with Human3.6M and MuCo separately, both lead to increasing errors. I visualized the GT keypoints and joints, and the results seem normal (maybe a little inaccurate, but mostly right). However, I still don't know why these two datasets lead to increasing errors...
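
A visual check can be complemented by a numeric one: project the 3D ground-truth joints into the image and measure the pixel distance to the 2D annotations. The sketch below assumes a standard pinhole camera model with joints given in camera coordinates. The function names and the sample numbers are made up for illustration and are not taken from the repository or the datasets.

```python
import math


def project(joint_cam, focal, princpt):
    """Pinhole projection of one 3D joint (camera coordinates) to pixels."""
    x, y, z = joint_cam
    u = focal[0] * x / z + princpt[0]
    v = focal[1] * y / z + princpt[1]
    return u, v


def mean_reprojection_error(joints_cam, joints_img, focal, princpt):
    """Mean pixel distance between projected 3D joints and 2D annotations."""
    errors = []
    for joint_cam, (u_gt, v_gt) in zip(joints_cam, joints_img):
        u, v = project(joint_cam, focal, princpt)
        errors.append(math.hypot(u - u_gt, v - v_gt))
    return sum(errors) / len(errors)


# Made-up example: two joints at a depth of 2000 units from the camera.
joints_cam = [(0.0, 0.0, 2000.0), (100.0, -200.0, 2000.0)]
joints_img = [(500.0, 500.0), (550.0, 400.0)]
focal, princpt = (1000.0, 1000.0), (500.0, 500.0)
print(mean_reprojection_error(joints_cam, joints_img, focal, princpt))
```

A consistently large mean error for one dataset could point to a coordinate or unit mismatch in its annotations, which is harder to spot by eye than a grossly wrong skeleton.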