mingyuan-zhang / MotionDiffuse

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

Evaluation on My Dataset: How to Get the 251-dimensional Motion Vectors? #19

Open JeremyCJM opened 1 year ago

JeremyCJM commented 1 year ago

Hi Mingyuan,

Do you know how to obtain the 251-dimensional motion vectors provided in the KIT-ML dataset?

I am computing the FID on my own dataset, but my data has only two channels (x, y) instead of 251. How can I map this low-dimensional motion sequence to 251-dimensional motion vectors?

Thanks, Jeremy

mingyuan-zhang commented 1 year ago

Hi, you can find the definition of each dimension here.

However, I think it is hard to evaluate 2D data directly with the evaluator models pre-trained on KIT-ML. The joint positions differ greatly between 2D and 3D data, so you may need to retrain the evaluators.

JeremyCJM commented 1 year ago

Thanks for the reply! If I have 3D joint data, how do I map it into 251 dimensions? Do you have code for this?

Also, if I want to retrain the evaluation network, which dataset and what task should I choose?

mingyuan-zhang commented 1 year ago

We follow the same data preparation as HumanML3D. You can find the data processing code in raw_pose_processing.ipynb and motion_representation.ipynb.
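As a sanity check on the target size, the 251 dimensions are consistent with a HumanML3D-style per-frame feature layout applied to a 21-joint skeleton such as KIT-ML's. The sketch below only does the arithmetic; the group names are descriptive labels chosen for illustration, not identifiers from either repository, and the layout itself is an assumption that should be checked against motion_representation.ipynb.

```python
def feature_layout(num_joints):
    """Per-frame feature sizes for a HumanML3D-style motion vector.

    Assumed layout (verify against the HumanML3D notebooks); the key
    names are illustrative labels only.
    """
    return {
        "root_rotation_velocity": 1,                   # angular velocity around the up axis
        "root_linear_velocity": 2,                     # velocity on the ground plane
        "root_height": 1,                              # height of the root joint
        "joint_positions": (num_joints - 1) * 3,       # root-relative positions, root excluded
        "joint_rotations_6d": (num_joints - 1) * 6,    # 6D rotations, root excluded
        "joint_velocities": num_joints * 3,            # per-joint velocities, root included
        "foot_contacts": 4,                            # binary contact labels
    }


def total_dim(num_joints):
    """Total length of one frame's feature vector under the assumed layout."""
    return sum(feature_layout(num_joints).values())


print(total_dim(21))  # 21-joint skeleton (KIT-ML): 251
print(total_dim(22))  # 22-joint skeleton (HumanML3D): 263
```

Under this layout, 2D (x, y) data cannot fill the rotation or 3D position groups, which is one reason the pre-trained evaluators do not transfer.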

To retrain the evaluation network, the most appropriate approach is to train it on the same motion dataset as your generative model. Split the motion data into a training split and a validation split, then train a contrastive model (consisting of a motion encoder and a text encoder) for evaluation. Specifically, given several pairs ($\mathrm{text}_i$, $\mathrm{motion}_i$), you can build an InfoNCE loss that increases the similarity between the features extracted from $\mathrm{text}_i$ and $\mathrm{motion}_i$, and decreases the similarity between the features extracted from $\mathrm{text}_i$ and $\mathrm{motion}_j$ ($i \neq j$).
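The loss described above can be sketched as follows. This is a minimal NumPy illustration, not the evaluator training code from this repository: the function names, the symmetric text-to-motion and motion-to-text averaging, and the temperature value of 0.07 are all assumptions, and the features are random stand-ins for encoder outputs. In practice the same computation would be written in an autodiff framework so the encoders can be trained.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce_loss(text_feat, motion_feat, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired features.

    text_feat, motion_feat: arrays of shape (batch, dim), where row i of
    each array comes from the same (text_i, motion_i) pair.
    temperature: assumed value, a hyperparameter to tune.
    """
    # L2-normalise so the dot product is a cosine similarity.
    t = text_feat / np.linalg.norm(text_feat, axis=1, keepdims=True)
    m = motion_feat / np.linalg.norm(motion_feat, axis=1, keepdims=True)

    # logits[i, j] = similarity between text_i and motion_j.
    logits = t @ m.T / temperature
    idx = np.arange(logits.shape[0])

    # Matching pairs sit on the diagonal; every other entry is a negative.
    loss_text_to_motion = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_motion_to_text = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_text_to_motion + loss_motion_to_text)


# Usage with random stand-in features (batch of 8, feature size 16).
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
aligned = info_nce_loss(text, text)                          # pairs match
shuffled = info_nce_loss(text, np.roll(text, 1, axis=0))     # pairs mismatched
print(aligned < shuffled)
```

Minimising this loss pulls the features of $\mathrm{text}_i$ and $\mathrm{motion}_i$ together and pushes apart those of $\mathrm{text}_i$ and $\mathrm{motion}_j$ for $i \neq j$, since every off-diagonal entry acts as a negative.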

JeremyCJM commented 1 year ago

Thanks! It sounds like CLIP applied to text and motion.