postech-ami / FastMETRO

[ECCV'22] Official PyTorch Implementation of "Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers"
https://fastmetro.github.io/
MIT License

Tensor size not matched during training #25

Closed cjwku1209 closed 11 months ago

cjwku1209 commented 11 months ago

After following the "Training on Human3.6M" steps in Experiments.md with the command below:

python -m torch.distributed.launch --nproc_per_node=1 \
       src/tools/run_fastmetro_bodymesh.py \
       --train_yaml Tax-H36m-coco40k-Muco-UP-Mpii/train.yaml \
       --val_yaml human3.6m/valid.protocol2.yaml \
       --arch hrnet-w64 \
       --model_name FastMETRO-L \
       --num_workers 1 \
       --per_gpu_train_batch_size 16 \
       --per_gpu_eval_batch_size 16 \
       --lr 1e-4 \
       --num_train_epochs 60 \
       --output_dir FastMETRO-L-H64_h36m/

I encountered the following error:

File "/hdd/input_pruning_exp/HMR_transformer/FastMETRO/src/modeling/_smpl.py", line 99, in forward
    v_posed = v_shaped + torch.matmul(posedirs, lrotmin[:, :, None]).view(-1, 6890, 3)
RuntimeError: The size of tensor a (480) must match the size of tensor b (16) at non-singleton dimension 0
Killing subprocess 5813

It seems that the batch dimension of v_shaped does not match those of posedirs and lrotmin for a batch size of 16. I printed the tensor sizes of v_shaped, posedirs, and lrotmin for reference:

v_shaped.shape = torch.Size([480, 6890, 3])
lrotmin[:, :, None].shape = torch.Size([16, 207, 1])
posedirs.shape = torch.Size([16, 20670, 207])
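For what it's worth, the spurious batch size of 480 (= 16 x 30) is exactly what you get if a shapedirs array with 300 shape components (the SMPL 1.1.0 layout) is flattened under the assumption of 10 components (the 1.0.0 layout) before the `view(-1, 6890, 3)` call. This is a hypothesis sketched with an illustrative reshape, not the exact loader code in `_smpl.py`:

```python
import torch

BATCH, N_VERTS, N_BETAS = 16, 6890, 10

def v_shaped_rows(n_shape_components):
    """Mimic flattening shapedirs as if it had 10 PCA components,
    then computing v_shaped; returns the resulting batch dimension."""
    # raw model array: [6890, 3, n_shape_components]
    shapedirs = torch.zeros(N_VERTS, 3, n_shape_components)
    # hypothetical reshape that hard-codes 10 components
    shapedirs = shapedirs.reshape(-1, N_BETAS)           # [?, 10]
    beta = torch.zeros(BATCH, N_BETAS)                   # [16, 10]
    v_shaped = torch.matmul(beta, shapedirs.t()).view(-1, N_VERTS, 3)
    return v_shaped.shape[0]

print(v_shaped_rows(10))   # 16  -- batch preserved (1.0.0 layout)
print(v_shaped_rows(300))  # 480 -- 30x spurious batch (1.1.0 layout)
```

With a 300-component model, the flattened matrix is 30x larger than expected, and the final `view` absorbs the excess into the batch dimension, yielding 480 instead of 16.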
cjwku1209 commented 11 months ago

The SMPL model files distributed via the SMPLify site had been updated to version 1.1.0. The problem was solved by downloading the version 1.0.0 basic model and using it instead.
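If anyone else hits this, a quick way to tell which version a downloaded model is: the 1.0.0 basic models ship 10 shape PCA components, while the 1.1.0 neutral models ship 300. A minimal sketch (the file path is hypothetical; adjust to your own download):

```python
import pickle

def smpl_shape_components(model):
    """Return the number of shape PCA components in a loaded SMPL
    model dict: 10 -> 1.0.0 layout expected here, 300 -> 1.1.0."""
    return model['shapedirs'].shape[-1]

# usage (path is hypothetical):
# with open('basicModel_neutral_lbs_10_207_0_v1.0.0.pkl', 'rb') as f:
#     model = pickle.load(f, encoding='latin1')
# print(smpl_shape_components(model))  # expect 10 for the 1.0.0 model
```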