mkocabas / VIBE

Official implementation of CVPR2020 paper "VIBE: Video Inference for Human Body Pose and Shape Estimation"
https://arxiv.org/abs/1912.05656

Training dataset #190

Open NoLookDefense opened 3 years ago

NoLookDefense commented 3 years ago

Hello. Thanks for your excellent work. I wonder whether I can train with only the PoseTrack and 3DPW datasets?

haolyuan commented 3 years ago

I have the same question! I don't understand how to train with only the AMASS dataset. Have you solved this issue?

barvin04 commented 3 years ago

"During training, VIBE takes in-the-wild images as input and predicts SMPL body model parameters using a convolutional neural network (CNN) pretrained for single-image body pose and shape estimation [37] followed by a temporal encoder and body parameter regressor used in [30]. Then, a motion discriminator takes predicted poses along with the poses sampled from the AMASS dataset and outputs a real/fake label for each sequence" from the paper
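The quoted pipeline (per-frame CNN features, a GRU temporal encoder, and a regressor producing SMPL parameters) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation; the feature, hidden, and SMPL parameter dimensions are assumptions chosen for the sketch.

```python
import torch
import torch.nn as nn

class TemporalEncoder(nn.Module):
    """Illustrative sketch: GRU temporal encoder + SMPL parameter regressor.

    Dimensions (feat_dim, hidden, smpl_dim) are assumptions, not the
    actual VIBE configuration.
    """

    def __init__(self, feat_dim=2048, hidden=1024, smpl_dim=85):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.regressor = nn.Linear(hidden, smpl_dim)

    def forward(self, cnn_feats):
        # cnn_feats: (batch, seq_len, feat_dim) per-frame CNN features
        h, _ = self.gru(cnn_feats)
        # one SMPL parameter vector per frame: (batch, seq_len, smpl_dim)
        return self.regressor(h)
```

The motion discriminator then scores each predicted parameter sequence against real motion sequences sampled from AMASS.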

The loss function is L_G = L_3D + L_2D + L_SMPL + L_adv.

During training you'll need datasets for the first, second, and third terms (described in the paper). The fourth term is the adversarial penalty, which uses the AMASS dataset. The idea behind using AMASS is to keep the predicted motion close to a prior over plausible motions. This has both pros and cons.
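How the four terms combine can be sketched as below. This is a hypothetical illustration of the loss composition only: the helper `mse`, the weights, and the least-squares form of the adversarial term are assumptions for the sketch, not the authors' exact objective.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def generator_loss(pred_3d, gt_3d, pred_2d, gt_2d,
                   pred_smpl, gt_smpl, disc_score,
                   w_3d=1.0, w_2d=1.0, w_smpl=1.0, w_adv=1.0):
    """Illustrative composition of L_G = L_3D + L_2D + L_SMPL + L_adv.

    Weights and the adversarial form are assumptions for this sketch.
    """
    # L_3D / L_2D: keypoint supervision (needs datasets with 3D/2D labels)
    l_3d = mse(pred_3d, gt_3d)
    l_2d = mse(pred_2d, gt_2d)
    # L_SMPL: supervision on SMPL pose/shape parameters
    l_smpl = mse(pred_smpl, gt_smpl)
    # L_adv: the motion discriminator scores the predicted sequence against
    # sequences sampled from AMASS; a least-squares generator objective is
    # assumed here, pushing the score toward the "real" label 1.0
    l_adv = (disc_score - 1.0) ** 2
    return w_3d * l_3d + w_2d * l_2d + w_smpl * l_smpl + w_adv * l_adv
```

With perfect predictions and a discriminator score of 1.0, every term vanishes and the loss is zero, which is a quick sanity check on the composition.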

In all, @NoLookDefense, you'll need the AMASS dataset for the adversarial part. @Yamato-01, you will need more than just the AMASS dataset, since it is used only for the adversarial loss applied after the CNN + GRU architecture.

This is what I have understood. Authors, please point out any mistakes.