rohitgirdhar / CATER

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
https://rohitgirdhar.github.io/CATER/
Apache License 2.0

3d_coords to pixel_coords transformation #9

Closed abhaygargab closed 4 years ago

abhaygargab commented 4 years ago

Hello Authors, for our work we need the pixel coordinates of all the objects. We can read the world coordinates (3d_coords) of the objects from the JSON files. Is there a way to convert them directly to pixel_coords?

We tried to estimate a homography matrix from the corresponding initial 3d_coords and pixel_coords of the objects available in the JSON files, but it doesn't produce an accurate transformation.

The function "get_camera_coords" in utils.py seems to do the job, but we do not have access to the parameters needed to run it.

Thank You

rohitgirdhar commented 4 years ago

Thanks for your interest. Unfortunately, to my understanding, that is not trivial. We are able to do it for the first frame in the static-camera setup, when initializing the tracker baseline: we manually computed the homography between the ground plane and the camera plane, and used it to transform the ground/world 3D coordinates to 2D coordinates in the camera plane. However, it only applies when the object is on the ground plane, and not floating in the air. Moreover, a moving camera would further complicate things.
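
For illustration, a ground-plane homography of the kind described above could be estimated roughly as follows. This is only a minimal sketch, not the code used for the tracker baseline. It assumes NumPy, a static camera, and at least four correspondences between ground-plane (x, y) coordinates and pixel (u, v) coordinates that are not all collinear. The function names `fit_homography` and `apply_homography` are hypothetical and do not exist in the CATER code.

```python
import numpy as np


def fit_homography(ground_xy, pixel_xy):
    """Estimate a 3x3 homography H (ground plane -> image) with the DLT method.

    ground_xy, pixel_xy: arrays of shape (N, 2), N >= 4, in matching order.
    """
    ground_xy = np.asarray(ground_xy, dtype=float)
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(ground_xy, pixel_xy):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The least-squares solution is the right singular vector that belongs
    # to the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    # Fix the arbitrary scale (assumes H[2, 2] is not zero).
    return H / H[2, 2]


def apply_homography(H, ground_xy):
    """Map (N, 2) ground-plane points to (N, 2) pixel coordinates."""
    pts = np.asarray(ground_xy, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    proj = homog @ H.T
    # Divide by the homogeneous coordinate to get pixel positions.
    return proj[:, :2] / proj[:, 2:3]
```

The mapping ignores the height coordinate entirely, which is why it is only valid for objects resting on the ground plane, and a single H is only valid for one fixed camera pose.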

I think the most robust way to solve the problem would be to re-render the data and store some sort of segmentation maps for objects, which should help with accurate localization of each object in the image plane.
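
As a rough sketch of how such segmentation maps could be turned into pixel locations: the example below assumes a hypothetical per-frame instance-ID map, i.e. a 2D integer array in which every pixel holds the ID of the object it belongs to and 0 marks the background. CATER does not ship such maps, so the format and the function name `localize_objects` are assumptions made for illustration only.

```python
import numpy as np


def localize_objects(id_map, background_id=0):
    """Return {object_id: {"centroid": (x, y), "bbox": (x_min, y_min, x_max, y_max)}}.

    id_map: 2D integer array (height x width) of per-pixel object IDs.
    Coordinates are in pixels, with x along columns and y along rows.
    """
    id_map = np.asarray(id_map)
    result = {}
    for obj_id in np.unique(id_map):
        if obj_id == background_id:
            continue
        # Row (y) and column (x) indices of all pixels of this object.
        ys, xs = np.nonzero(id_map == obj_id)
        result[int(obj_id)] = {
            "centroid": (float(xs.mean()), float(ys.mean())),
            "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
        }
    return result
```

Because the locations are read from the rendered pixels themselves, this works for objects off the ground plane and for moving cameras, but only for the visible part of each object; a fully occluded object has no pixels and does not appear in the result.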