paulchhuang / rich_toolkit


Meaning of trans in dataset #4

Open yufu-liu opened 11 months ago

yufu-liu commented 11 months ago

Hi, thanks for sharing this amazing dataset!

I can successfully visualize every motion from the SMPL parameters. However, the mesh I plotted was upside down. After rotating it by 180 degrees it looks normal, but I still have one question: normally the translation along the y-axis should correspond to the height of a person, so is it possible to get the correct height along the y-axis? (The current values look somewhat larger than typical ones.)
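For reference, a 180-degree flip like the one described is usually a rotation about the x-axis. The sketch below assumes the mesh is expressed in an OpenCV-style camera frame (x right, y down, z forward), which is common for calibration files but is not confirmed in this thread; `flip_upside_down` is a hypothetical helper.

```python
import numpy as np

# 180-degree rotation about the x-axis: negates y and z, keeps x.
# Assumption: vertices live in an OpenCV-style camera frame (y points down).
R_FLIP_X = np.diag([1.0, -1.0, -1.0])

def flip_upside_down(vertices: np.ndarray) -> np.ndarray:
    """Rotate an (N, 3) vertex array by 180 degrees about the x-axis."""
    return np.asarray(vertices, dtype=float) @ R_FLIP_X.T

verts = np.array([[0.0, 1.7, 3.0], [0.1, 0.0, 3.0]])
flipped = flip_upside_down(verts)
print(flipped)
```

This only changes the viewing orientation. The flipped y-coordinate is still measured from the camera origin, not from the ground, so it does not by itself give a person's height.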

paulchhuang commented 10 months ago

Hi, can you provide a minimum code snippet to illustrate your problem? For instance, by "height" it can mean the height of the body, e.g., 180cm, 175cm; it may also mean "elevation", i.e. how far the body is away from the ground. Also, relying solely on y-axis for the above metrics assumes scenes are roughly axis-aligned. Is this true in your use case? Happy to help with your problem but I need more context.
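The two meanings of "height" above can be separated in code. A minimal sketch, assuming you already have posed mesh vertices (e.g. from an SMPL-X forward pass) of a roughly upright body, in a frame whose up direction is known; `body_height` and `elevation` are hypothetical helpers, not part of rich_toolkit.

```python
import numpy as np

def body_height(vertices, up):
    """Stature: extent of the mesh along the (normalized) up direction."""
    up = np.asarray(up, dtype=float)
    up = up / np.linalg.norm(up)
    proj = np.asarray(vertices, dtype=float) @ up
    return float(proj.max() - proj.min())

def elevation(point, ground_point, up):
    """Signed distance of `point` above a ground plane through `ground_point` with normal `up`."""
    up = np.asarray(up, dtype=float)
    up = up / np.linalg.norm(up)
    offset = np.asarray(point, dtype=float) - np.asarray(ground_point, dtype=float)
    return float(np.dot(offset, up))
```

Body height is unaffected by the global translation, whereas elevation depends on where the ground is, which is why a translation value alone cannot answer both questions.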

yufu-liu commented 10 months ago

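Hi, thanks for your friendly response! I just loaded the data with the script below.

```python
import numpy as np

# Path to one SMPL parameter file from the RICH dataset on your machine
data_path = 'file path of RICH dataset in your computer'
# allow_pickle=True is required when the file stores pickled Python objects
data = np.load(data_path, allow_pickle=True)
# Global (root) translation
print(data["trans"])
```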

After printing the translation, I found the values are somewhat larger than expected, so I wonder what the translation means. For example, in the AMASS dataset the vertical component of the translation is roughly the height of the root joint above the ground in meters, e.g., about 1.2 m when the body is standing. BTW, I also found a calibration script, multicam2world.py, and I'm not sure whether it is related to this question.

paulchhuang commented 10 months ago

Hi, thanks for the explanation. The global_orient and transl/trans in the SMPL params are always w.r.t. a certain world coordinate frame. If this world frame happens to lie on the ground plane, then trans roughly reflects the height of the root joint, as you observed in AMASS. This is indeed the convention in many indoor mocap datasets. In RICH, however, we simply follow the convention of the camera calibration files, using camera 0 as the world coordinate frame, so the translation is also w.r.t. camera 0.
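When moving SMPL(-X) parameters from the camera 0 frame into another frame, transl cannot simply be rotated, because SMPL applies global_orient about the root (pelvis) joint J0, not about the origin. A minimal sketch, assuming a rigid transform of the form x' = R x + T between the two frames (which direction your calibration's R and T map is an assumption to check); `rodrigues` and `transform_smpl_root` are hypothetical helpers.

```python
import numpy as np

def rodrigues(rotvec):
    """Axis-angle vector (3,) -> 3x3 rotation matrix."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rotvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def transform_smpl_root(global_orient, transl, pelvis, R, T):
    """Re-express global_orient/transl in a new frame where x' = R @ x + T.

    `pelvis` is the rest-pose root joint J0 for the subject's shape
    (e.g. joints[0] of a forward pass with zero pose and zero transl).
    Returns the new global rotation as a 3x3 matrix and the new transl.
    """
    transl = np.asarray(transl, dtype=float)
    pelvis = np.asarray(pelvis, dtype=float)
    new_rot = R @ rodrigues(global_orient)
    # Posed vertex: v = R_g (v_rest - J0) + J0 + transl, so J0 must be
    # added before rotating and subtracted again afterwards.
    new_transl = R @ (transl + pelvis) + T - pelvis
    return new_rot, new_transl
```

The returned rotation matrix can be converted back to axis-angle with, for instance, SciPy's `Rotation.from_matrix(...).as_rotvec()`. Note also that the root joint position in the world frame is `transl + pelvis`, not `transl` alone.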

If your use case requires a reference frame that is more "axis-aligned", i.e., the ground plane is parallel to the xy (or xz) plane, you can follow multicam2world.py to transform everything into the original coordinate frame of the scans. Then you may fit a plane to all ground points (which I imagine requires some manual annotation), and the distance from the root to that ground plane will be the "height" of the root joint.
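The plane-fitting step can be sketched as a least-squares fit via SVD, followed by a signed point-to-plane distance. This assumes you already have an (N, 3) array of ground points in the scan frame (e.g. picked manually from the scan) and the root joint in the same frame; `fit_plane` and `height_above_plane` are hypothetical helpers.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through (N, 3) points. Returns (centroid, unit normal)."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # The right-singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

def height_above_plane(point, centroid, normal, up_hint):
    """Signed distance from `point` to the plane, positive on the `up_hint` side."""
    if np.dot(normal, up_hint) < 0:
        normal = -normal  # SVD does not fix the sign of the normal
    return float(np.dot(np.asarray(point, dtype=float) - centroid, normal))
```

The `up_hint` argument only resolves the sign ambiguity of the fitted normal; any rough guess of the up direction in the scan frame is enough.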

I don't have a good out-of-the-box solution, but I hope these clarifications help you proceed.

yufu-liu commented 10 months ago

Hi, I really appreciate your detailed explanation! After trying multicam2world.py, I found that the translations differ across scenes, as in the examples below.

- ParkingLot2: (2.78, 1.52, -0.95)
- Gym: (3.36, 2.32, 0.55)
- LectureHall_yoga: (-4.75, 2.79, -1.00)
- LectureHall_chair: (-5.11, 3.00, -0.94)
- BBQ: (-0.98, -0.07, -0.63)
- Pavallion: (0.63, -3.77, -0.46)

(I picked one file per scene and recorded the translation of the first frame.)

Based on your explanation, these translations still need to be processed with manual annotation. Could you explain how to do the manual annotation? As I understand it, if the ground plane is parallel to a plane of the current coordinate system, I only need to subtract or add a constant value to align with the ground plane, right? However, I don't know how to get these values.
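If the ground really is parallel to a coordinate plane, the constant is just the ground level along the up axis. One way to check the parallel assumption is the tilt between a fitted ground normal and that axis; one way to get the constant without annotation is a heuristic (an assumption, not something from this thread): over a long sequence the lowest foot positions touch the ground, so a low percentile of per-frame foot heights approximates the floor level. Which axis is "up" after multicam2world.py is left as a parameter, and all function names here are hypothetical.

```python
import numpy as np

def tilt_degrees(normal, up_axis):
    """Angle between a ground-plane normal and coordinate axis `up_axis` (0, 1 or 2)."""
    axis = np.zeros(3)
    axis[up_axis] = 1.0
    cos = abs(np.dot(normal, axis)) / np.linalg.norm(normal)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def ground_offset_from_feet(foot_heights, percentile=2.0):
    """Heuristic ground level: a low percentile of per-frame foot heights.

    A percentile instead of the minimum reduces sensitivity to occasional
    foot-floor penetration or fitting noise.
    """
    return float(np.percentile(np.asarray(foot_heights, dtype=float), percentile))

def root_heights(root_positions, up_axis, ground_level):
    """Height of each (T, 3) root position above a ground plane perpendicular to `up_axis`."""
    return np.asarray(root_positions, dtype=float)[:, up_axis] - ground_level
```

If the tilt is more than a degree or two, subtracting a constant will bias the heights across the scene, and the full plane fit described earlier in the thread is the safer choice.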