hlcdyy / pan-motion-retargeting

Code for the paper "Pose-aware Attention Network for Flexible Motion Retargeting by Body Part" (TVCG 2023)
https://arxiv.org/abs/2306.08006
BSD 2-Clause "Simplified" License

Retargeting from Human3.6M to Mixamo #3

Open tshrjn opened 1 year ago

tshrjn commented 1 year ago

As mentioned in the paper, retargeting from Human3.6M to Mixamo has been tested. Could you share the inference code for it? I see that bvh_parser.py reads the skeleton structure for this dataset, but when I tried it myself, the retargeting failed for roughly the first half of the motion and only worked for the second half.

Perhaps there's a split happening somewhere in the script? I've modified the eval_single_pair.py script. Any inference script, for example one for Human3.6M to Mixamo, would be useful.

hlcdyy commented 1 year ago

@tshrjn We test the retargeting quantitatively with the script test_mixamo.py. You can run python test_mixamo.py --save_dir ./pretrained_mixamo --model pan --epoch 1000 to evaluate the provided pre-trained model, or your own model, on the Mixamo test dataset.

We do not provide the exact script for Human3.6M to Mixamo retargeting because Human3.6M is more tedious to preprocess. You can, however, first convert the raw D3 Angles CDF files of the H36M dataset to ordinary BVH files with H36M-to-BVH and then follow the steps in bvh_parser.py. Once you have the processed data, you can either train new encoder/decoder pairs for Mixamo and Human3.6M, or use a pre-trained model with joint mapping. Training a new encoder/decoder pair will give better results, because the motion distributions differ between datasets (this may be why only half of your retargeted motion works).
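The "joint mapping" option mentioned above could look roughly like the sketch below. It is not code from this repository: the function names and the joint names in the example are assumptions for illustration only, and the real names should be read from your own converted BVH files. The idea is to reorder per-joint data from the source skeleton into the joint order the pre-trained model expects, filling joints with no counterpart with a default value.

```python
from typing import Dict, List, Optional, Sequence


def build_index_map(
    source_joints: Sequence[str],
    target_joints: Sequence[str],
    name_map: Dict[str, str],
) -> List[int]:
    """For each target joint, return the index of the mapped source joint.

    name_map maps source joint name -> target joint name (hypothetical names).
    Target joints without a mapped source joint get index -1.
    """
    src_index = {name: i for i, name in enumerate(source_joints)}
    inverse = {tgt: src for src, tgt in name_map.items()}
    index_map: List[int] = []
    for tgt in target_joints:
        src: Optional[str] = inverse.get(tgt)
        index_map.append(src_index[src] if src in src_index else -1)
    return index_map


def remap_frame(frame: Sequence[float], index_map: Sequence[int], fill: float) -> List[float]:
    """Reorder one frame of per-joint values into target joint order."""
    return [frame[i] if i >= 0 else fill for i in index_map]


if __name__ == "__main__":
    # Illustrative joint lists only; not the actual H36M or Mixamo hierarchies.
    source = ["Hips", "Spine", "Head"]
    target = ["Hips", "Head", "LeftHandThumb1"]
    mapping = {"Hips": "Hips", "Head": "Head"}
    idx = build_index_map(source, target, mapping)
    print(idx)
    print(remap_frame([10.0, 20.0, 30.0], idx, fill=0.0))
```

A name-based mapping like this only aligns the data layout. It does not compensate for the distribution differences between datasets described above.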

tshrjn commented 1 year ago

A couple of questions:

  1. It seems we need to train one encoder/decoder pair per skeleton type, and this training depends on the other target skeleton/domain. So, first, how can we sanity-check the model by overfitting a small batch? Second, how can we generalize beyond the training domains? I know the second is a very open-ended research question; I'd appreciate your thoughts on the direction.

  2. Regarding the example I mentioned, the first half of the transfer isn't working at all, while from 60ms onwards it looks pretty decent. Do you have any experience or insight into why this would happen?
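One way to narrow down a "first half fails, second half works" symptom is to compute a per-frame error and find the frame where it settles. The sketch below is only an illustration under stated assumptions: it assumes you already have per-frame joint positions for the retargeted output and for some reference with the same joint order (both hypothetical inputs), and the threshold is arbitrary. If the switch point falls on a round number of frames, that may hint at a windowing or split issue in preprocessing, though that is a guess rather than a diagnosis.

```python
import math
from typing import List, Sequence

Point = Sequence[float]   # (x, y, z)
Frame = Sequence[Point]   # joint positions for one frame


def per_frame_error(pred: Sequence[Frame], ref: Sequence[Frame]) -> List[float]:
    """Mean Euclidean distance between corresponding joints, one value per frame."""
    errors: List[float] = []
    for f_pred, f_ref in zip(pred, ref):
        dists = [math.dist(p, r) for p, r in zip(f_pred, f_ref)]
        errors.append(sum(dists) / len(dists))
    return errors


def first_stable_frame(errors: Sequence[float], threshold: float) -> int:
    """Index of the first frame from which all later errors stay below threshold.

    Returns -1 if the error never settles below the threshold.
    """
    start = -1
    for i, err in enumerate(errors):
        if err < threshold:
            if start < 0:
                start = i
        else:
            start = -1
    return start
```

For example, `first_stable_frame(per_frame_error(pred, ref), threshold)` gives the frame index at which the output starts to track the reference, which can then be compared against the clip length and any window size used when the data was prepared.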


hlcdyy commented 11 months ago

For the first question, the proposed method assumes that the source and target motion domains have similar distributions during training (in both the biped-to-biped and biped-to-quadruped settings). We could test the model for over-fitting by retargeting types of motion it has not seen. I think generalizing beyond domains is a challenging task that may be equivalent to zero-shot motion retargeting.

Articulated motions in different datasets have their own characteristics. For example, in the AMASS dataset the chest joint protrudes forward relative to the spine chain when the body is in an upright pose (left in the figure), whereas the chest joint in the lafan1 skeleton is essentially collinear with the spine chain (right in the figure). I therefore think few-shot motion retargeting is a more reasonable goal than zero-shot: if the model has never seen movements of the target structure, it cannot infer such characteristics from the skeleton's OFFSET alone. Selecting some representative postures per body part and constructing the correspondence between source and target skeletons might be a direction toward few-shot motion retargeting.
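The chest/spine difference described above can be made measurable with a small check. This is a minimal sketch, not part of the repository: it assumes you can extract the world positions of the spine-chain joints (for example pelvis, spine, chest) from one upright frame of each dataset, and the function names are hypothetical. It reports the bend angle between consecutive bones, where values near zero mean the chain is close to collinear.

```python
import math
from typing import List, Sequence

Point = Sequence[float]  # (x, y, z) world position of a joint


def angle_between_deg(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-length bone")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))


def chain_bend_angles(joint_positions: Sequence[Point]) -> List[float]:
    """Bend angle at each interior joint of a chain, ordered root to tip."""
    bones = [
        [b - a for a, b in zip(joint_positions[i], joint_positions[i + 1])]
        for i in range(len(joint_positions) - 1)
    ]
    return [angle_between_deg(bones[i], bones[i + 1]) for i in range(len(bones) - 1)]


if __name__ == "__main__":
    # Made-up positions for illustration; not taken from AMASS or lafan1.
    straight = [(0.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 20.0, 0.0)]
    bent = [(0.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 20.0, 10.0)]
    print(chain_bend_angles(straight))
    print(chain_bend_angles(bent))
```

Comparing these angles across a few representative upright frames from the source and target datasets is one simple way to see the kind of structural mismatch that a model cannot learn from offsets alone.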

For the second question, I don't have a clue why this happened.

[Figure: structure_differences (AMASS skeleton on the left, lafan1 skeleton on the right)]