Clarity about pixel and world coordinates

Hi, I'm trying to use the pretrained model to estimate pose on my custom images. For center estimation, instead of DeepPrior++, I'm using a quick alternative for now that gives me the (x, y) image coordinates of the center of the head (I'm working with top-view ITOP, so the head center is my reference point).

The model requires this center in world coordinates, so I tried the pixel2world function in the data.lua file for ITOP. However, that function takes (x, y, z), while I only have (x, y) from the image. What does z mean in pixel coordinates? I also assumed that pixel coordinates are the same as the image's (row, column) coordinates. Is that correct?

I tried different values of z and found that the resulting joint coordinates are spaced farther apart with lower z and grouped closer together with higher z. In neither case was the estimate correct.

Could you briefly explain pixel coordinates and world coordinates, and how I should handle them to solve this problem?
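
For context, a pixel-to-world conversion for a depth camera is commonly written as a pinhole back-projection, in which z is the depth value read from the depth map at pixel (x, y). The sketch below is a minimal illustration under that assumption. The function names and the intrinsics fx, fy, cx, cy are hypothetical and are not taken from data.lua, and the upward-pointing world y axis is only one possible convention.

```python
def pixel_to_world(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z into camera-space (X, Y, Z).

    u is the column index and v is the row index, so (u, v) = (x, y),
    which is the reverse of (row, column) order.
    fx, fy, cx, cy are hypothetical intrinsics (focal lengths and
    principal point, in pixels); they are not values from data.lua.
    """
    x_world = (u - cx) * z / fx
    # Assumed convention: world y points up while image rows grow downward.
    y_world = (cy - v) * z / fy
    return x_world, y_world, z


def world_to_pixel(x, y, z, fx, fy, cx, cy):
    """Inverse of pixel_to_world under the same assumed convention."""
    u = x * fx / z + cx
    v = cy - y * fy / z
    return u, v, z


if __name__ == "__main__":
    # Illustrative intrinsics for a 320x240 image (assumed, not from the repo).
    fx = fy = 300.0
    cx, cy = 160.0, 120.0
    # z would normally be depth_map[v][u], in the depth map's own units.
    print(pixel_to_world(200.0, 100.0, 2.0, fx, fy, cx, cy))
```

Under this model the same pixel offset from the image center maps to a world offset proportional to z, so the depth used for the center has to come from the depth map itself rather than being chosen by hand.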

Thanks a lot in advance!

mks0601 / V2V-PoseNet_RELEASE

Clarity about pixel and world coordinates #42