ibaiGorordo / pyKinectAzure

Python library to run Kinect Azure DK SDK functions

How to convert 2D coordinates to their corresponding 3D points? #93

Closed thas1909 closed 1 year ago

thas1909 commented 1 year ago

Hi @ibaiGorordo and the community!

Thanks for making this awesome library!

I have one question. I'm trying to combine object detection with the body tracking application. Since I can get 2D coordinates in the color image from an object detection model (say YOLOv5), how should I get that point's corresponding real-world 3D coordinates?

I was trying to use the function "convert_2d_to_3d" inside calibration.py. When doing so, it throws an error saying that the source_point2d I provide should be a k4a_float2_t instance rather than a tuple of (x, y) values.

I can't figure out how to pass values in that format. Could someone help me with this? Thanks

nliedmey commented 1 year ago

Hey @thas1909!

I'm currently working on a project similar to yours. I extract 2D landmarks on the RGB image and then try to get the corresponding depth value of each landmark.

One thing that helped me at the beginning was understanding the four different coordinate systems that can be used when working with the Azure Kinect: https://learn.microsoft.com/en-gb/azure/kinect-dk/coordinate-systems

Based on this, understanding what exactly is needed to transform coordinates from one coordinate system to another helped me a lot: https://learn.microsoft.com/en-gb/azure/kinect-dk/use-calibration-functions

So in your case, if I understand it correctly, you start in the 2D RGB system with the XY coordinate of a landmark and want to transform it into the 3D RGB system?

To do this, you need the depth at this coordinate. As mentioned in the answer here https://github.com/etiennedub/pyk4a/issues/188#issuecomment-1369876177 , you can use the SDK's "depth_image_to_color" function to transform the depth image of the recording to the same size as the RGB image. This procedure is also explained at https://learn.microsoft.com/en-gb/azure/kinect-dk/use-calibration-functions. What this function does in the background is bring each depth pixel from the 2D depth coordinate system into the 3D depth coordinate system and then into the 2D RGB coordinate system. All of these transformations automatically take the calibration parameters of both cameras, stored in your capture object, into account (if I understand it right).

As a result, you can take the transformed depth image and look up the value at the XY landmark coordinates. Example code for a pyk4a playback would be:

import pyk4a
from pyk4a import PyK4APlayback

yourPlayback = PyK4APlayback("abc.mkv")
yourPlayback.open()
calibration = yourPlayback.calibration
while True:
    try:
        capture = yourPlayback.get_next_capture()
        if capture.depth is not None:
            # Depth image mapped into the color camera geometry (same size as the RGB image)
            transformed = capture.transformed_depth
            # X, Y are the landmark pixel coordinates in the RGB image
            coord_depth = transformed[int(Y)][int(X)]
    except EOFError:
        break  # end of the recording

This gives you the depth of the coordinate in mm. Keep in mind that, due to the different positions of the depth and RGB cameras, there are missing depth values in the transformed image.
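A minimal way to guard against that could look like this (a sketch, assuming that, as in the K4A SDK, a value of 0 marks a pixel with no depth data; the neighbourhood fallback is just one option):

import numpy as np

coord_depth = transformed[int(Y)][int(X)]
if coord_depth == 0:
    # No depth was mapped to this color pixel (occlusion or out of range).
    # One option: take the median of the non-zero values in a small
    # neighbourhood around the landmark instead.
    y0, x0 = max(int(Y) - 2, 0), max(int(X) - 2, 0)
    patch = transformed[y0:int(Y) + 3, x0:int(X) + 3]
    valid = patch[patch > 0]
    coord_depth = float(np.median(valid)) if valid.size > 0 else 0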

You can then use this value to transform your coordinate from the 2D RGB system into the 3D RGB system by using:

try:
    coord_3d = calibration.convert_2d_to_3d(coordinates=(Y, X), depth=coord_depth, source_camera=pyk4a.CalibrationType.COLOR)
except ValueError:
    ...

This function throws an error when it gets an invalid depth value, and you need to handle that.

Hope I could help you out a bit. I'm not quite sure if this is the best way to solve the problem, but it's one that could work :)

JJLimmm commented 1 year ago

Hi @nliedmey, thanks for your help, but I have a question for you. When you call convert_2d_to_3d() with the argument source_camera=pyk4a.CalibrationType.Color, I believe that is from the other pyk4a Python wrapper repository, not this repository (pyKinectAzure). Are you able to show how you do it for this repository?

Thanks

ibaiGorordo commented 1 year ago

Sorry for the delay, I have just added an example to do what you are asking: https://github.com/ibaiGorordo/pyKinectAzure/blob/master/examples/exampleTransformPoint2DTo3D.py

Another option would be to get the 3D point cloud, transform it into the color image (I haven't implemented that transformation yet), and then just index into the transformed 3D point cloud with the 2D box coordinates.
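The core of that 2D-to-3D conversion looks roughly like this (just a sketch, not a copy of the linked example; the import path and the calibration-type constant names are assumptions, and the k4a_float2_t layout is assumed to mirror the C SDK union, so double-check against the example):

from pykinect_azure.k4a._k4atypes import k4a_float2_t, K4A_CALIBRATION_TYPE_COLOR

def color_pixel_to_3d(calibration, pixel_x, pixel_y, depth_mm):
    # `calibration` is the Calibration object from calibration.py (see the
    # linked example for how to obtain it from a running device).
    # convert_2d_to_3d expects the 2D point as a k4a_float2_t, whose layout
    # mirrors the C SDK union (fields xy.x / xy.y).
    source_point2d = k4a_float2_t()
    source_point2d.xy.x = float(pixel_x)
    source_point2d.xy.y = float(pixel_y)
    # Argument order assumed to mirror k4a_calibration_2d_to_3d in the C SDK:
    # (source 2D point, its depth in mm, source camera, target camera).
    return calibration.convert_2d_to_3d(source_point2d, depth_mm,
                                        K4A_CALIBRATION_TYPE_COLOR,
                                        K4A_CALIBRATION_TYPE_COLOR)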

JJLimmm commented 1 year ago

Hi @ibaiGorordo, thank you so much for showing how to call this API correctly. I have a question about your second option of transforming the 3D point cloud into the color image coordinate space. For the body tracking joints, is the inference done on the 3D depth image? And is this 3D depth image the same as the point cloud? I am slightly confused by the online Azure documentation.

Thank you!

thas97 commented 1 year ago

Thank you so much! This helps a lot!

ibaiGorordo commented 1 year ago

For the second solution, I have added the option to get the transformed point cloud (RGB), so you can do something like this in the colorPointCloud example to get the 3D point from pixel coordinates of the RGB camera:

# points is the transformed point cloud (one 3D point per color pixel, flat);
# reshape it to the color image layout so it can be indexed by pixel.
points_map = points.reshape((color_image.shape[0], color_image.shape[1], 3))
pixelx = 100
pixely = 100
# The first index is the image row (y), the second the column (x).
print(points_map[pixely, pixelx, :])

floppy-49 commented 1 year ago

I have an additional question about converting a point. Wouldn't it be an option to get the depth_image and the colored_depth_image to obtain the depth value? With that, you could pick a pixel in the colored depth image and get the depth value from the depth image, like z_value = depth_image[y, x]. Could the usage of convert_2d_to_3d then be omitted?
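In other words, something like this (a sketch, assuming the colored depth image is just a visualization of the raw depth image and therefore has the same resolution):

import numpy as np

def depth_at_pixel(depth_image: np.ndarray, x: int, y: int) -> int:
    # depth_image: raw depth map in millimetres; a value of 0 means there is
    # no depth data at that pixel. (x, y) is the pixel picked in the colored
    # depth image, which shares the resolution of the raw depth image.
    return int(depth_image[y, x])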