ibaiGorordo / pyKinectAzure

Python library to run Kinect Azure DK SDK functions
MIT License
446 stars 114 forks

Human Pose #92

Closed JJLimmm closed 1 year ago

JJLimmm commented 1 year ago

Hello @ibaiGorordo ,

Thanks for your work here. I have some questions for you and others from the community about using the Body Tracking joint keypoints in my own application. It is a little confusing to use the joint information based on the official documentation and the comments in other issues.

I have a few questions about using the joint information:

  1. Does anybody know how to use the joints information from body_frame.get_bodies() to draw 2D keypoints on the RGB image?
  2. Why is the 3D keypoint information in millimeters (mm), and how do we use it to, say, measure the distance (pixel, metric, or any other kind) between two knee keypoints?
  3. Are we able to transform the z-coordinate of the 3D joint information to map it to the corresponding xy-coordinate in the RGB image?
  4. Did anyone try using the joint information to overlay a 3D volumetric human model onto the RGB image, like the one from the SMPLify-X model?

Cheers

jonas-hurst commented 1 year ago

  1. You can use body_frame.draw_bodies(color_image, pykinect.K4A_CALIBRATION_TYPE_COLOR) to draw the skeleton into your image. You can see it in an example here.
  2. The keypoint information is in mm because Microsoft decided it should be mm when writing their SDK ;) You can calculate the Euclidean distance in 3D space between joints by simply using the Pythagorean theorem.
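
For point 2, here is a minimal sketch of that distance calculation. It assumes the standard K4ABT joint ordering (knees at indices 19 and 23, an assumption not stated in this thread) and that body.numpy() returns one row per joint with the 3D position in millimetres in the first three columns, as used later in this thread:

import numpy as np

KNEE_LEFT, KNEE_RIGHT = 19, 23  # assumed K4ABT joint indices

# bodies comes from body_frame.get_bodies() inside the usual body-tracker loop
joints = bodies[0].numpy()          # rows = joints, columns 0-2 = x, y, z in mm
left_knee = joints[KNEE_LEFT, :3]
right_knee = joints[KNEE_RIGHT, :3]

# Euclidean (Pythagorean) distance in 3D, in millimetres
knee_distance_mm = np.linalg.norm(left_knee - right_knee)
print(f"Knee-to-knee distance: {knee_distance_mm:.1f} mm")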

JJLimmm commented 1 year ago
  1. You can use body_frame.draw_bodies(color_image, pykinect.K4A_CALIBRATION_TYPE_COLOR) to draw the skeleton into your image. You can see it in an example here.
  2. The keypoint information is in mm because Microsoft decided it should be mm when writing their SDK ;) You can calculate the Euclidean distance in 3D space between joints by simply using the Pythagorean theorem.

Hi @jonas-hurst , thanks for the reply.

I managed to figure out the drawing portion, but now I am having some issues with converting 2D points to 3D coordinate space. The joint information retrieved from the body frames shows the x, y, z, wxyz, and confidence for every joint. However, if I take a 2D coordinate from the RGB image and try to retrieve the corresponding 3D coordinates in 3D depth space (to match the one used by the Body Tracking SDK), it keeps showing the error Argument 3: Error Wrong Type when I call the convert_2d_to_3d API. There might be some errors in my usage of that API; could you show a snippet of the code you used to call it? Also, the output is a k4a_float3_t object; is there any way to convert it to a Python variable?

Thanks!

ibaiGorordo commented 1 year ago

Should work with https://github.com/ibaiGorordo/pyKinectAzure/issues/93#issuecomment-1423760861

JJLimmm commented 1 year ago

Thank You! I will close this issue now!

ibaiGorordo commented 1 year ago

Replying to it here, from: https://github.com/ibaiGorordo/pyKinectAzure/issues/93#issuecomment-1423765398

Hi @ibaiGorordo , thank you so much for showing how to call this API correctly. I have a question based on your second option, regarding transforming the 3D point cloud into the color image coordinate space. For the Body Tracking joints, are they inferred on the 3D depth image? And is this 3D depth image the same as the point cloud? I am slightly confused by the online Azure documentation.

Thank you!

Actually, I wrote my previous question without giving much thought to your problem. In your case, why are you trying to convert the 2D RGB coordinates back into 3D, instead of using the 3D skeleton from body_frame.get_body()?

The 3D skeleton coordinate system is based on the depth sensor origin. Do you need the 3D coordinates relative to the RGB camera? In that case, doing the 2D to 3D transformation would be necessary.

JJLimmm commented 1 year ago

Replying to it here, from: #93 (comment)

Hi @ibaiGorordo , thank you so much for showing how to call this API correctly. I have a question based on your second option, regarding transforming the 3D point cloud into the color image coordinate space. For the Body Tracking joints, are they inferred on the 3D depth image? And is this 3D depth image the same as the point cloud? I am slightly confused by the online Azure documentation. Thank you!

Actually, I wrote my previous question without giving much thought to your problem. In your case, why are you trying to convert the 2D RGB coordinates back into 3D, instead of using the 3D skeleton from body_frame.get_body()?

The 3D skeleton coordinate system is based on the depth sensor origin. Do you need the 3D coordinates relative to the RGB camera? In that case, doing the 2D to 3D transformation would be necessary.

Hi @ibaiGorordo , my bad, I should have provided more context on why I am doing this. I am planning to use another pose estimation model (not the one used in the Body Tracking SDK). The RGB image from the Azure Kinect will be fed into this other model, and I will get the 2D keypoints based on that RGB image. However, I would also like to use the depth information at those keypoints, so I will be converting those 2D keypoints to the 3D depth image coordinate space to retrieve the depth value at the corresponding coordinates.

ibaiGorordo commented 1 year ago

I understand. Then you can use the example I just uploaded. However, if you want to compare the 3D coordinates with the 3D pose from the Kinect (which is relative to the depth camera), you can convert the 2D points to 3D, but using the depth camera as the destination (I see some weird results, so test it; I am not sure if it is 100% correct):

import sys
import cv2

sys.path.insert(1, '../')
import pykinect_azure as pykinect
from pykinect_azure import K4A_CALIBRATION_TYPE_COLOR, K4A_CALIBRATION_TYPE_DEPTH, k4a_float2_t

if __name__ == "__main__":

    # Initialize the library, if the library is not found, add the library path as argument
    pykinect.initialize_libraries()

    # Modify camera configuration
    device_config = pykinect.default_configuration
    device_config.color_format = pykinect.K4A_IMAGE_FORMAT_COLOR_BGRA32
    device_config.color_resolution = pykinect.K4A_COLOR_RESOLUTION_720P
    device_config.depth_mode = pykinect.K4A_DEPTH_MODE_WFOV_2X2BINNED
    # print(device_config)

    # Start device
    device = pykinect.start_device(config=device_config)

    cv2.namedWindow('Transformed Color Image',cv2.WINDOW_NORMAL)
    while True:

        # Get capture
        capture = device.update()

        # Get the color image from the capture
        ret_color, color_image = capture.get_color_image()

        # Get the depth image transformed to the color camera viewpoint
        ret_depth, transformed_depth_image = capture.get_transformed_depth_image()

        if not ret_color or not ret_depth:
            continue

        pix_x = 100
        pix_y = 200
        depth = transformed_depth_image[pix_y, pix_x]
        print(depth)

        pixels = k4a_float2_t()
        pixels.xy.x = pix_x
        pixels.xy.y = pix_y

        pos3d = device.calibration.convert_2d_to_3d(pixels, depth, K4A_CALIBRATION_TYPE_COLOR, K4A_CALIBRATION_TYPE_DEPTH)
        print(pos3d.xyz.x, pos3d.xyz.y, pos3d.xyz.z)

        # Show the color image
        cv2.imshow('Transformed Color Image',color_image)

        # Press q key to stop
        if cv2.waitKey(1) == ord('q'):
            break
JJLimmm commented 1 year ago

I understand. Then you can use the example I just uploaded. However, if you want to compare the 3D coordinates with the 3D pose from the Kinect (which is relative to the depth camera), you can convert the 2D points to 3D, but using the depth camera as the destination (I see some weird results, so test it; I am not sure if it is 100% correct):

import sys
import cv2

sys.path.insert(1, '../')
import pykinect_azure as pykinect
from pykinect_azure import K4A_CALIBRATION_TYPE_COLOR, K4A_CALIBRATION_TYPE_DEPTH, k4a_float2_t

if __name__ == "__main__":

  # Initialize the library, if the library is not found, add the library path as argument
  pykinect.initialize_libraries()

  # Modify camera configuration
  device_config = pykinect.default_configuration
  device_config.color_format = pykinect.K4A_IMAGE_FORMAT_COLOR_BGRA32
  device_config.color_resolution = pykinect.K4A_COLOR_RESOLUTION_720P
  device_config.depth_mode = pykinect.K4A_DEPTH_MODE_WFOV_2X2BINNED
  # print(device_config)

  # Start device
  device = pykinect.start_device(config=device_config)

  cv2.namedWindow('Transformed Color Image',cv2.WINDOW_NORMAL)
  while True:

      # Get capture
      capture = device.update()

      # Get the color image from the capture
      ret_color, color_image = capture.get_color_image()

      # Get the depth image transformed to the color camera viewpoint
      ret_depth, transformed_depth_image = capture.get_transformed_depth_image()

      if not ret_color or not ret_depth:
          continue

      pix_x = 100
      pix_y = 200
      depth = transformed_depth_image[pix_y, pix_x]
      print(depth)

      pixels = k4a_float2_t()
      pixels.xy.x = pix_x
      pixels.xy.y = pix_y

      pos3d = device.calibration.convert_2d_to_3d(pixels, depth, K4A_CALIBRATION_TYPE_COLOR, K4A_CALIBRATION_TYPE_DEPTH)
      print(pos3d.xyz.x, pos3d.xyz.y, pos3d.xyz.z)

      # Show the color image
      cv2.imshow('Transformed Color Image',color_image)

      # Press q key to stop
      if cv2.waitKey(1) == ord('q'):
          break

Correct me if I'm wrong: the depth sensor outputs the 2D depth image and the RGB camera outputs the 2D RGB image. Is there a way to get the 3D depth image directly from the APIs? Or do I have to transform the 2D depth image to a 3D depth image? If so, how do I transform it?

ibaiGorordo commented 1 year ago

You can use the point cloud example: https://github.com/ibaiGorordo/pyKinectAzure/blob/master/examples/examplePointCloud.py

But the issue is that the point cloud is only relative to the depth sensor. It is possible to transform the 3D point cloud to the RGB camera coordinate system (what I mentioned in the second solution), but I haven't implemented it yet.

You could also transform the 2D coordinates you get from your pose estimation model from the 2D RGB image into the depth camera's 2D coordinates, and then use those 2D coordinates to get the 3D data from the point cloud.

The easiest option is probably to implement the 3D point cloud transformation, but I am not sure when I will be able to do that.

JJLimmm commented 1 year ago

I see, I understand what you meant. I will try converting the 2D coordinates from the 2D RGB image to the 2D coordinates in the 2D depth image. Then, using those 2D coordinates and the depth value in the 2D depth image, I will convert both the depth image and the coordinates to the 3D depth image (point cloud).

Just as a side note, do you know if the depth taken directly from the 2D depth map is the same as the depth from the 3D depth map? If it is the same, I would only need to convert the 2D coordinates from the RGB image to 2D depth coordinates, which would be sufficient for what I need.

ibaiGorordo commented 1 year ago

Yes, and if you do that, you can actually use the point cloud directly instead of using the 2D depth map to get the depth:

ret_depth, depth_image = capture.get_depth_image()
ret_points, points = capture.get_pointcloud()
# Reshape the flat point cloud into a (rows, cols, 3) map matching the depth image
points_3d_map = points.reshape(depth_image.shape[0], depth_image.shape[1], 3)
pixel_x = 100
pixel_y = 100
point_3d = points_3d_map[pixel_y, pixel_x, :]
JJLimmm commented 1 year ago

Ah I see, okay. So overall, the steps I would need to do are:

  1. Convert the 2D RGB coordinates (pixels) to 2D depth image coordinates (also pixels).
  2. From the capture object, retrieve the point cloud data by calling capture.get_pointcloud().
  3. Reshape the point cloud to the 2D depth image layout with the code block you wrote above.
  4. Get the depth at the 2D depth coordinates found previously by indexing into the reshaped point cloud map.

Is this correct?

For converting from 2D RGB coordinates to 2D depth coordinates, which API do I use: depth_2D_coords = device.calibration.convert_color_2d_to_depth_2d( (RGB_pixel_x, RGB_pixel_y), depth_image) or depth_2D_coords = device.calibration.convert_2d_to_2d( (RGB_pixel_x, RGB_pixel_y), source_depth, K4A_CALIBRATION_TYPE_COLOR, K4A_CALIBRATION_TYPE_DEPTH)?
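
For reference, here is a minimal sketch of those four steps. It uses the convert_2d_to_2d call with the same arguments that appear in the code further down this thread; whether convert_color_2d_to_depth_2d would give the same result is not checked here, and the pixel coordinates are hypothetical placeholders for a keypoint from the external pose model:

from pykinect_azure import K4A_CALIBRATION_TYPE_COLOR, K4A_CALIBRATION_TYPE_DEPTH, k4a_float2_t

# device and capture come from the usual pykinect setup shown elsewhere in this thread
ret_depth, depth_image = capture.get_depth_image()
ret_tdepth, trans_depth_image = capture.get_transformed_depth_image()

# Step 2: retrieve the point cloud data
ret_points, points = capture.get_pointcloud()

# Step 3: reshape the flat point cloud to the depth image layout (rows, cols, xyz in mm)
points_3d_map = points.reshape(depth_image.shape[0], depth_image.shape[1], 3)

# Step 1: convert a 2D color pixel to 2D depth image coordinates
rgb_x, rgb_y = 640, 360                      # hypothetical keypoint in the RGB image
color_pixel = k4a_float2_t()
color_pixel.xy.x = rgb_x
color_pixel.xy.y = rgb_y
depth_at_pixel = trans_depth_image[rgb_y, rgb_x]
depth_pixel = device.calibration.convert_2d_to_2d(
    color_pixel, depth_at_pixel, K4A_CALIBRATION_TYPE_COLOR, K4A_CALIBRATION_TYPE_DEPTH)

# Step 4: index the reshaped point cloud at the converted depth coordinates
point_3d = points_3d_map[int(depth_pixel.xy.y), int(depth_pixel.xy.x), :]
print(point_3d)                              # x, y, z in mm relative to the depth camera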

JJLimmm commented 1 year ago

I tested out these steps using the .convert_2d_to_2d() API and got these results (testing with the spine chest keypoint): [screenshot of output]

As you can see, the SDK's depth value for z is 715.69873047, but the one obtained through conversion and indexing was 624. Have I done something wrong somewhere?

This is a snippet of the code I used.

        # Inside the capture loop; body_frame comes from the body tracker update
        ret_d, depth_image = capture.get_depth_image()
        ret_td, trans_depth_image= capture.get_transformed_depth_image()
        ret_pcd, pointcloud = capture.get_pointcloud()
        print(f"Size of depth image: {depth_image.shape}")
        print(f"Size of Trans Depth image: {trans_depth_image.shape}")
        pointcloud_map = pointcloud.reshape(depth_image.shape[0],depth_image.shape[1],3)
        print(f"Size of 3D depth image: {pointcloud_map.shape}")

        if ret:
            bodies = body_frame.get_bodies()
            body_list = []
            spinal_chest_id = [2]
            coords_2D = k4a_float2_t()
            if len(bodies) >= 1:
                for id, body in enumerate(bodies):
                    body_kps = body.numpy()
                    print(f"body_kps: {body_kps[2,:3]}")

                    body_2dkps = body_frame.get_body2d(id, pykinect.K4A_CALIBRATION_TYPE_COLOR)
                    # body_3Dkps = []
                    for jointID, joint in enumerate(body_2dkps.joints):
                        if jointID == 2:
                            joint_xy = joint.get_coordinates()
                            print(f"joint xy: {joint_xy[0]},{joint_xy[1]}")
                            coords_2D.xy.x = joint_xy[0]
                            coords_2D.xy.y = joint_xy[1]
                            trans_depth_value = trans_depth_image[joint_xy[1], joint_xy[0]]
                            depth_2D_coords = device.calibration.convert_2d_to_2d(coords_2D, trans_depth_value, K4A_CALIBRATION_TYPE_COLOR, K4A_CALIBRATION_TYPE_DEPTH)
                            print(f"Converted RGB to Depth Coords [{int(depth_2D_coords.xy.x)}, {int(depth_2D_coords.xy.y)}]")
                            depth_value = depth_image[int(depth_2D_coords.xy.y),int(depth_2D_coords.xy.x)]
                            print("Depth Value at converted coords: ", depth_value)
                            point_3d = pointcloud_map[int(depth_2D_coords.xy.y),int(depth_2D_coords.xy.x),:]
                            print(f"3D point with converted 2D depth coords: {point_3d}")
JJLimmm commented 1 year ago

@ibaiGorordo While exploring and digging around to see if anybody has faced the same issue, where the depth (z value) obtained is different between the SDK's Body Tracking keypoints and our own conversion, I noticed in the documentation that Body Tracking uses the depth and IR images from the Capture object.

But they did not mention anything about how the data from these two sensors is used and passed to the Body Tracker. That might be the key to understanding the difference between the depth values obtained from the Body Tracker and those obtained from another model and converted to the corresponding coordinate spaces.

Do you have any idea whether combining the depth and IR images gives the correct input for the body tracker? Or have you already implemented a way to do so that I might have missed?

EDIT: I have mentioned this issue in another repo (pyk4a) as well, to get more help from the community.

ibaiGorordo commented 1 year ago

To make it a bit easier, I have modified the library slightly and uploaded an example that compares the values obtained by converting the 2D skeleton (from the RGB and depth cameras) to 3D with the 3D skeleton from the body tracker: https://github.com/ibaiGorordo/pyKinectAzure/blob/48240214b66a54b5607013cf4675d06a0efc7c1b/examples/exampleBodyTrackingTransformationComparison.py

I get a depth difference of a couple of cm. In theory it should be reversible, i.e. converting the 3D position to 2D in the depth image and then converting that 2D coordinate back to 3D should give the original result. So I might have made a mistake somewhere (or maybe using the point cloud is not a reliable option).

ibaiGorordo commented 1 year ago

I have also added the 2D to 3D transformation. It is slightly different, but only by a few millimetres compared to using the point cloud: https://github.com/ibaiGorordo/pyKinectAzure/blob/8a6709ef43ff584c04a21e730cbbfe07a6cce298/examples/exampleBodyTrackingTransformationComparison.py

ibaiGorordo commented 1 year ago

However, I was also able to get a different value between the 2D-to-3D transformed joint and the skeleton joint. I think the issue is the smart logic the model uses to estimate the 3D position of the joints. It probably checks the 3D positions of the rest of the joints to produce a 3D skeleton that is anatomically correct. However, when we use the 2D joint + depth, it does not work as well because we are missing that logic.

You can test it by putting your hand in front of your neck: if you do that, you will see that the depth of the transformed 3D position is much closer (the distance to the hand) than the skeleton's 3D neck, which is not affected by the occlusion.

JJLimmm commented 1 year ago

However, I was also able to get a different value between the 2D-to-3D transformed joint and the skeleton joint. I think the issue is the smart logic the model uses to estimate the 3D position of the joints. It probably checks the 3D positions of the rest of the joints to produce a 3D skeleton that is anatomically correct. However, when we use the 2D joint + depth, it does not work as well because we are missing that logic.

You can test it by putting your hand in front of your neck: if you do that, you will see that the depth of the transformed 3D position is much closer (the distance to the hand) than the skeleton's 3D neck, which is not affected by the occlusion.

Thank you for spending time investigating this issue with me. Is the smart logic you mentioned the model uses documented anywhere in the Azure documentation? Could you share a link to it, please?

Or is it because they are using the depth + infrared images combined to do the estimation? In their Body Tracking SDK they mention that the first step after getting the Capture object is to retrieve the depth and IR images.

ibaiGorordo commented 1 year ago

No, that is just my guess after doing the test with my hand. I haven't found any info about the model. If you check the ONNX file in the bin folder of the Body Tracking SDK, you can see the outputs of the model; it seems that the model does not provide the 3D pose directly, so some postprocessing must happen inside the SDK that we don't have access to...
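
For anyone who wants to check this themselves, here is a minimal sketch using the onnx Python package; the model filename below is a placeholder, so point it at the .onnx file shipped in the Body Tracking SDK's bin folder:

import onnx

# Placeholder path; adjust to the .onnx file in the Body Tracking SDK's bin folder
model = onnx.load("dnn_model_2_0_op11.onnx")

def shape_of(tensor):
    # Collect either the fixed dimension or the symbolic name for each axis
    return [d.dim_value if d.dim_value else d.dim_param for d in tensor.type.tensor_type.shape.dim]

for tensor in model.graph.input:
    print("input: ", tensor.name, shape_of(tensor))
for tensor in model.graph.output:
    print("output:", tensor.name, shape_of(tensor))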

JJLimmm commented 1 year ago

Or is it because they are using the depth + infrared images combined to do the estimation? In their Body Tracking SDK they mention that the first step after getting the Capture object is to retrieve the depth and IR images.

Hmm... I went to view the model as well, and that seems most probable. What about the second point I mentioned? Have you come across it before?

ibaiGorordo commented 1 year ago

No, but the ONNX model has two inputs with the same shape, so they are probably the depth map and the IR image.

JJLimmm commented 1 year ago

@ibaiGorordo Are you able to screenshot your output from testing with the new script you created? I am getting differences of about 10 mm to 100 mm across the different conversion methods.

[screenshot of output]

As you can see in the screenshot of my output above (I rearranged your code just a little bit), Methods 1, 2, and 3 are just my naming conventions for the three different ways of converting the 2D keypoints to the same 3D point cloud space, for comparison with the original SDK's body tracking. Looking at the x and y values only, the difference is about ~10 mm, which is okay. But looking at the z value, the difference is about 100 mm (= 10 cm), which is quite a big difference... Did you also get similar results?

JJLimmm commented 1 year ago

While testing with the transformed point cloud (3D RGB space), I run into this error once in a while. It seems like the 2D keypoints fall outside the image and point cloud dimensions. Do you face the same issue as well? [screenshot of error]

ibaiGorordo commented 1 year ago

While testing with the transformed point cloud (3D RGB space), I run into this error once in a while. It seems like the 2D keypoints fall outside the image and point cloud dimensions. Do you face the same issue as well?

That is probably due to the difference in field of view and the distance between the cameras. Make sure to skip the points that fall outside the image.
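
For that, something like the bounds check below should do it; this is a sketch assuming points_3d_map is the (transformed) point cloud reshaped to the height and width of the corresponding image, and pixel_x/pixel_y is the 2D keypoint being looked up:

height, width = points_3d_map.shape[:2]
px, py = int(pixel_x), int(pixel_y)

# Skip keypoints that project outside the image / point cloud bounds
if 0 <= px < width and 0 <= py < height:
    point_3d = points_3d_map[py, px, :]
else:
    point_3d = None  # this joint is not visible from this camera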