Open JeremyCJM opened 1 year ago
Hi, you can find the definition of each dimension here.
However, I think it's hard to evaluate 2D data directly with the evaluator models pre-trained on KIT-ML. The joint positions differ greatly between 2D and 3D data, so I think you may need to re-train the evaluators.
Thanks for the reply! If I have 3D joint data, how do I map it into 251 dimensions? Do you have code to do this?
Also, if I want to retrain the evaluation network, which dataset and what task should I choose?
We follow the same data preparation as HumanML3D. You can find the data processing in raw_pose_processing.ipynb and motion_representation.ipynb.
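As a rough illustration of where the 251 dimensions come from, here is a minimal sketch that assumes the HumanML3D-style feature layout (root rotation velocity, root linear velocity, root height, root-relative joint positions, 6D joint rotations, joint velocities, and foot contacts). The function name `humanml3d_feature_dim` is hypothetical, and the layout should be checked against motion_representation.ipynb before relying on it.

```python
def humanml3d_feature_dim(num_joints: int) -> int:
    """Per-frame feature size under the assumed HumanML3D-style layout."""
    root = 1 + 2 + 1                      # root rotation velocity, root linear velocity (x, z), root height
    joint_positions = (num_joints - 1) * 3  # root-relative positions of non-root joints
    joint_rotations = (num_joints - 1) * 6  # 6D rotation per non-root joint
    joint_velocities = num_joints * 3       # local velocity of every joint
    foot_contacts = 4                       # binary contact labels (two per foot)
    return root + joint_positions + joint_rotations + joint_velocities + foot_contacts


if __name__ == "__main__":
    print(humanml3d_feature_dim(21))  # KIT-ML skeleton (21 joints)
    print(humanml3d_feature_dim(22))  # HumanML3D skeleton (22 joints)
```

Under this assumption, a 21-joint skeleton gives 251 features per frame, which is why 2D (x, y) data cannot be mapped onto the representation without first obtaining 3D joints.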
To retrain the evaluation network, the most appropriate approach is to train on the same motion dataset as your generative model. You can split the motion data into a training split and a validation split, then train a contrastive model (consisting of a motion encoder and a text encoder) for evaluation. Specifically, given several pairs ($\mathrm{text}_i$, $\mathrm{motion}_i$), you can build an InfoNCE loss that increases the similarity between the features extracted from $\mathrm{text}_i$ and $\mathrm{motion}_i$, and decreases the similarity between the features extracted from $\mathrm{text}_i$ and $\mathrm{motion}_j$ ($i \neq j$).
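The InfoNCE objective described above could be sketched roughly as follows. This is a NumPy illustration on already-extracted features, not the repository's training code; the function names, the symmetric text-to-motion and motion-to-text averaging, and the temperature value of 0.07 are all assumptions borrowed from common CLIP-style setups.

```python
import numpy as np


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce_loss(text_feat: np.ndarray, motion_feat: np.ndarray,
                  temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of paired (text_i, motion_i) features.

    text_feat, motion_feat: arrays of shape (batch, dim); row i of each is a pair.
    """
    # L2-normalise so the dot product is a cosine similarity.
    text_feat = text_feat / np.linalg.norm(text_feat, axis=1, keepdims=True)
    motion_feat = motion_feat / np.linalg.norm(motion_feat, axis=1, keepdims=True)

    # logits[i, j] = similarity(text_i, motion_j) / temperature
    logits = text_feat @ motion_feat.T / temperature
    idx = np.arange(logits.shape[0])

    # Matching pairs sit on the diagonal; all other entries act as negatives.
    loss_t2m = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_m2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_t2m + loss_m2t))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(8, 16))
    motion = text + 0.01 * rng.normal(size=(8, 16))  # nearly aligned pairs
    print(info_nce_loss(text, motion))                      # small loss
    print(info_nce_loss(text, np.roll(motion, 1, axis=0)))  # mismatched pairs, larger loss
```

In an actual training loop the two feature matrices would come from the text encoder and the motion encoder, and the loss would be computed in an autodiff framework so that both encoders receive gradients.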
Thanks! It sounds like CLIP applied to text and motion.
Hi Mingyuan,
Do you know how to get the 251-dimensional motion vectors as provided in the KIT-ML dataset?
I am computing the FID on my dataset, but our data only has two channels (x, y) instead of 251. Therefore, I wonder how to map the low-dimensional motion sequence to 251-dimensional motion vectors.
Thanks, Jeremy