paper Table 1, Text-to-motion evaluation on the HumanML3D

zhenzhiwang / intercontrol

MIT License


paper Table 1, Text-to-motion evaluation on the HumanML3D #3

Open XueYing126 opened 2 months ago

XueYing126 commented 2 months ago

Thank you for sharing the great work!

In Table 1 of your paper "InterControl: Generating Human Motion Interactions by Controlling Every Joint", the FID for text-to-motion evaluation on HumanML3D is 0.159. I wonder how it can be so much better than the original MDM.
As far as I understand, you used a pre-trained MDM. Or did you use the ControlNet with 0s as the spatial control? Is there a saved checkpoint so that the result is reproducible?

Thank you!
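For context on the number being compared: FID in the HumanML3D evaluation measures the Fréchet distance between Gaussian fits of motion feature embeddings for generated and real motions. The embeddings come from a pretrained evaluator. The NumPy sketch below shows only that distance. The function names are hypothetical, and the evaluator's feature extractor is not reproduced, so any random features you use in its place are stand-ins.

```python
import numpy as np

def gaussian_stats(features: np.ndarray):
    """Mean vector and covariance matrix of (N, D) feature embeddings."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).

    Tr((S1 S2)^{1/2}) is taken as the sum of square roots of the eigenvalues
    of S1 @ S2. These are real and non-negative for PSD covariances, so tiny
    negative values from round-off are clipped to zero.
    """
    diff = mu1 - mu2
    eig = np.linalg.eigvals(sigma1 @ sigma2).real
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)
```

Identical statistics give an FID of (numerically) zero. Lower values mean the generated feature distribution is closer to the ground-truth one.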

zhenzhiwang commented 2 months ago

Hi, thanks for your interest. Yes, I use the ControlNet, with GT root trajectories as the control signals, similar to the previous paper GMD. The better FID comes partly from the input GT root trajectories (the same condition as in GMD) and partly from the conditional distribution learned by the ControlNet.
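To make the two conditions in this exchange concrete, here is a rough sketch of a GT-root (pelvis-only) control signal next to an all-zero one. It rests on a few assumptions. The hint is a per-joint array of global positions shaped (T, 22, 3), joint 0 is the pelvis (SMPL/HumanML3D ordering), and a binary mask marks which joints are controlled. The function names are hypothetical and are not the intercontrol repo's API.

```python
import numpy as np

PELVIS = 0       # assumption: SMPL/HumanML3D joint ordering, pelvis is joint 0
NUM_JOINTS = 22  # assumption: HumanML3D 22-joint skeleton

def make_root_control(gt_joints: np.ndarray):
    """Keep only the GT pelvis trajectory as the spatial control.

    gt_joints: (T, NUM_JOINTS, 3) ground-truth global joint positions.
    Returns (hint, mask), both (T, NUM_JOINTS, 3); hint is zero where mask is 0.
    """
    if gt_joints.ndim != 3 or gt_joints.shape[1:] != (NUM_JOINTS, 3):
        raise ValueError("expected gt_joints of shape (T, 22, 3)")
    mask = np.zeros_like(gt_joints)
    mask[:, PELVIS, :] = 1.0
    return gt_joints * mask, mask

def make_zero_control(num_frames: int):
    """All-zero hint and mask, i.e. no joint is constrained (hypothetical)."""
    shape = (num_frames, NUM_JOINTS, 3)
    return np.zeros(shape), np.zeros(shape)
```

Under these assumptions, the first variant gives the model the GT root trajectory (the same kind of condition GMD uses). The second gives it no spatial information at all.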

XueYing126 commented 2 weeks ago

Hi, @zhenzhiwang

Could you share the exact command you used to run the evaluation for HumanML3D+pelvis that gives an FID of 0.159 in Tab. 1? I am trying to replicate the result.

Thanks a lot!

XueYing126 commented 1 week ago

Another question about training: there is an `assert use_posterior == True` at https://github.com/zhenzhiwang/intercontrol/blob/f32ea5dceba6a7b8fcea22fbec4314d02329be25/diffusion/control_diffusion.py#L414

So for training "loss guidance on x0" and "Only for pelvis control", should `--use_posterior` be added to all the training commands?
Also, `num_condition = k_first = 1` by default and is never changed, so during training IK is applied only once. Is that correct?

Thanks a lot!
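To illustrate what a count like `k_first` typically controls, the sketch below runs `k_first` plain gradient-descent steps on a masked L2 loss. The loss pulls the controlled joints of a predicted pose toward their targets. This is a hypothetical stand-in, not the repo's IK implementation. The function name, the step size `lr`, and the use of vanilla gradient descent are all assumptions.

```python
import numpy as np

def masked_guidance(x0_joints, target, mask, k_first=1, lr=0.5):
    """Hypothetical illustration: k_first gradient steps on
    0.5 * sum(mask * (x - target)**2), leaving unmasked joints untouched.

    x0_joints, target, mask: arrays of the same shape, e.g. (T, 22, 3).
    """
    x = np.array(x0_joints, dtype=float, copy=True)
    for _ in range(k_first):
        grad = mask * (x - target)  # gradient of the masked L2 loss w.r.t. x
        x = x - lr * grad
    return x
```

With `lr=0.5`, each step halves the remaining error on the controlled joints. A single step (`k_first = 1`) therefore only partially corrects them, and each additional step shrinks the residual further.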