mks0601 / V2V-PoseNet_RELEASE

Official Torch7 implementation of "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", CVPR 2018
https://arxiv.org/abs/1711.07399
MIT License

Clarity about pixel and world coordinates #42

Closed saileshprem closed 5 years ago

saileshprem commented 5 years ago

Hi, I'm trying to use the pretrained model to estimate pose on my own images. For center estimation I am, for now, using a quick alternative to DeepPrior++ that gives the (x, y) image coordinates of the center of the head (I'm working with top-view ITOP, so the head center is what I estimate). The model requires the center in world coordinates, so I tried to convert it with the pixel2world function in the data.lua file for ITOP. That function takes (x, y, z), but I only have (x, y) from the image. What is z in pixel coordinates? I also assumed that pixel coordinates are the same as the image's (row, column) coordinates. Is that correct?

I tried different values of z and found that the resulting joint coordinates are spaced further apart with lower z and grouped closer together with higher z. (The estimates still weren't correct.)

Could you briefly explain pixel coordinates and world coordinates, and how I should handle them for my problem?

Thanks a lot in advance!

mks0601 commented 5 years ago

Sorry for the late reply. Pixel coordinates are the same as image coordinates, as you said. The z value is required to convert image coordinates to camera (or world) coordinates. Searching for "camera calibration" will turn up the background on this conversion.
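To make the role of z concrete, here is a minimal Python sketch of the standard pinhole back-projection that camera calibration references describe. It is not the repository's pixel2world function, which may use different constants and sign conventions. The intrinsics `fx`, `fy`, `cx`, `cy`, the helper names, and the synthetic depth map are all hypothetical values chosen for illustration. The sketch also assumes that the input is a depth map, so z is read from the depth value stored at the pixel, and that x is the column index and y is the row index.

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z into camera coordinates.

    u is the column index, v is the row index (assumed convention).
    fx, fy, cx, cy are hypothetical pinhole intrinsics in pixels.
    """
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z], dtype=float)

def camera_to_pixel(point, fx, fy, cx, cy):
    """Inverse mapping: project a camera-space point back to (u, v, z)."""
    x, y, z = point
    u = x * fx / z + cx
    v = y * fy / z + cy
    return np.array([u, v, z], dtype=float)

# Synthetic 240x320 depth map with constant depth, for illustration only.
depth = np.full((240, 320), 2.5)

# Hypothetical head-center estimate in image coordinates.
u, v = 200, 100

# Arrays are indexed [row, column], so the depth lookup is depth[v, u].
z = depth[v, u]

center = pixel_to_camera(u, v, z, fx=285.0, fy=285.0, cx=160.0, cy=120.0)
print(center)
```

With real data, the depth lookup replaces guessing z: the same (x, y) back-projects to a different 3D point for every z, so an arbitrary z places the center at the wrong location and scale.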