paTRICK-swk / P-STMO

[ECCV2022] The PyTorch implementation for "P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation"
MIT License

Some clarifications on data preprocessing steps #29

Open ovshake opened 1 year ago

ovshake commented 1 year ago

Hi, thank you so much for open-sourcing such important work. I have a few questions about the dataset preprocessing. As I understand it, the data comes from the Video2Pose repo, so the preprocessing steps are the same as theirs, and they in turn took them from 3d-pose-baseline. In that repo's preprocessing, the data is normalized by mean and standard deviation (here). Are there any other preprocessing steps applied to the 2D pose input in the fine-tuning phase? I am trying to fine-tune on a custom dataset, but the results are not up to par.
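For reference, the mean and standard-deviation normalization mentioned above can be sketched roughly as follows. This is only an illustration, not the code from 3d-pose-baseline: the function names, the `(frames, joints, 2)` array layout, and the `eps` guard are all assumptions made for the example.

```python
import numpy as np

def fit_normalization_stats(poses):
    """Hypothetical helper: per-joint, per-axis mean and std over all frames.

    poses: array of shape (num_frames, num_joints, 2) in pixel coordinates.
    """
    poses = np.asarray(poses, dtype=np.float64)
    return poses.mean(axis=0), poses.std(axis=0)

def standardize(poses, mean, std, eps=1e-8):
    """Hypothetical helper: subtract the mean and divide by the std.

    eps (an assumption) avoids division by zero for joints that never move.
    """
    return (np.asarray(poses, dtype=np.float64) - mean) / (std + eps)

# Toy data: 100 frames, 17 joints, 2D pixel coordinates.
rng = np.random.default_rng(0)
train_poses = rng.uniform(0.0, 1000.0, size=(100, 17, 2))

mean, std = fit_normalization_stats(train_poses)
standardized = standardize(train_poses, mean, std)
```

The statistics would be computed once on the training set and reused unchanged for any custom data, so that both are mapped into the same input range.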

paTRICK-swk commented 1 year ago

Hi ovshake, sorry, I haven't gone through the code in 3d-pose-baseline in detail. I used the npz file provided in Video2Pose and adopted their preprocessing code. I'm not sure whether mean and std dev are involved in this process. As far as I can tell, the npz file provides the 2D keypoint coordinates in pixel space. The preprocessing code normalizes them by dividing by the height/width of the image (here), and the results are used as inputs to the network. Maybe you could try this scheme.
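A minimal sketch of image-size normalization for pixel-space keypoints is shown below. It is not copied from the repo: the function name is hypothetical, and the exact scaling (dividing by the width, mapping x to [-1, 1], and keeping the aspect ratio for y) is an assumption about one common variant of this scheme.

```python
import numpy as np

def normalize_by_image_size(keypoints_px, width, height):
    """Hypothetical helper: map pixel coordinates to a resolution-independent range.

    x in [0, width] is mapped to [-1, 1]; y is scaled by the same factor,
    so the aspect ratio is preserved (this convention is an assumption).
    keypoints_px: array of shape (..., 2) holding (x, y) in pixels.
    """
    keypoints_px = np.asarray(keypoints_px, dtype=np.float64)
    assert keypoints_px.shape[-1] == 2, "last axis must hold (x, y)"
    return keypoints_px / width * 2.0 - np.array([1.0, height / width])

# Toy example on a 1000 x 1000 image: corners and center.
pose = np.array([[0.0, 0.0], [500.0, 500.0], [1000.0, 1000.0]])
normalized = normalize_by_image_size(pose, width=1000, height=1000)
```

For a custom dataset, `width` and `height` would be the resolution of the images the 2D detector ran on, so keypoints from cameras with different resolutions end up in a comparable range.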