microsoft / Azure-Kinect-Sensor-SDK

A cross platform (Linux and Windows) user mode SDK to read data from your Azure Kinect device.
https://Azure.com/Kinect
MIT License

Converting joint positions to pixel values #1572

Closed pingopalino1 closed 3 years ago

pingopalino1 commented 3 years ago

Hello, I do not have an issue but rather a question. I hope this is still the right place to ask; if not, please let me know.

I use the offline_processor sample to extract the body joint positions from an MKV file. After that, I want to compare the detected points with annotated shoulder data. For that I use Python.

I am trying to match the joint points from the Azure Kinect body tracking algorithm to the depth image. The problem is that the joint coordinates are given as x, y, z values in millimeters. So far I have tried to match them as follows, using NFOV unbinned mode with a resolution of 640x576 and a field of view of 75°x65°:

1. Get the distance to the joint, joint_z.

2. Calculate the pixel resolution for each axis at that distance, in mm/pixel (see the attached formula image "formel1", and the sketch after this list).

3. Transform joint_x and joint_y from millimeters to pixel values using the factor calculated in step 2.

4. Transform the coordinates from the Azure Kinect coordinate system to the picture coordinate system (see the attached image "pixelcount").

5. Plot the points.
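To make this concrete, here is a minimal Python sketch of steps 1-4 as I understand them. It is only an approximation: it ignores lens distortion, assumes the principal point sits exactly at the image center, and assumes the depth camera's x axis points right and its y axis points down; the 640x576 resolution and 75°x65° field of view are the NFOV unbinned values mentioned above.

```python
import math

# NFOV unbinned depth mode (values from the post above)
WIDTH, HEIGHT = 640, 576      # pixels
FOV_X, FOV_Y = 75.0, 65.0     # approximate field of view in degrees

def joint_to_pixel(joint_x_mm, joint_y_mm, joint_z_mm):
    """Rough FOV-based projection of a 3D joint (in mm) to depth image pixels."""
    # Step 2: mm per pixel at this depth, separately for each axis
    mm_per_px_x = 2.0 * joint_z_mm * math.tan(math.radians(FOV_X / 2.0)) / WIDTH
    mm_per_px_y = 2.0 * joint_z_mm * math.tan(math.radians(FOV_Y / 2.0)) / HEIGHT
    # Step 3: millimeters -> pixels
    px = joint_x_mm / mm_per_px_x
    py = joint_y_mm / mm_per_px_y
    # Step 4: shift the origin from the image center to the top-left corner
    # (assumes x points right and y points down in the depth camera frame)
    u = WIDTH / 2.0 + px
    v = HEIGHT / 2.0 + py
    return u, v

# example: a joint about 1.8 m in front of the camera, slightly left of and below center
print(joint_to_pixel(-120.0, 250.0, 1800.0))
```

Since this ignores the real intrinsics (focal lengths, principal point, distortion), I expect it to be only approximately right, which may explain part of the offset I am seeing.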

The coordinate system of the Azure Kinect has its origin at the center of the frame, while my image has its origin at the top-left corner, so the transformation in step 4 should be correct.

Also, according to the official documentation, the joint position and orientation are estimates relative to the global depth sensor frame of reference. That means it uses the same coordinate system as the depth sensor, right?

The depth sensor's coordinate system is also tilted down by 6°, and if I correct for half of that I get better results, but that feels arbitrary. Do I have to correct for that tilt?

Am I missing something important? Picture two is taken from the simple sample program, and if you compare it with my picture it becomes evident that my coordinates are not correct.

Any help is greatly appreciated. Thanks

EDIT: I have now tried to use the convert_3d_to_2d option from https://github.com/etiennedub/pyk4a, but I could not get it to work. If someone has input on that side, I would also appreciate it.
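For reference, what I tried with pyk4a looks roughly like the sketch below. I am assuming that PyK4APlayback exposes the recording's calibration, that Calibration.convert_3d_to_2d takes the 3D point in millimeters in the source camera's frame plus source and target camera types, and that it wraps the SDK's k4a_calibration_3d_to_2d; the file name and joint values are placeholders.

```python
from pyk4a import PyK4APlayback, CalibrationType

# placeholder joint position from the body tracker, in millimeters,
# expressed in the depth camera coordinate system
joint_x_mm, joint_y_mm, joint_z_mm = -120.0, 250.0, 1800.0

playback = PyK4APlayback("recording.mkv")  # placeholder file name
playback.open()
calibration = playback.calibration

# project the 3D joint into depth image pixel coordinates
# (assumed to wrap the SDK's k4a_calibration_3d_to_2d)
u, v = calibration.convert_3d_to_2d(
    (joint_x_mm, joint_y_mm, joint_z_mm),
    CalibrationType.DEPTH,  # source camera
    CalibrationType.DEPTH,  # target camera
)
print(u, v)
playback.close()
```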

[Screenshots: my_program, simple_viewer]

diablodale commented 3 years ago

Hi. The Kinect SDK provides APIs that can do this: https://docs.microsoft.com/en-us/azure/kinect-dk/use-calibration-functions#convert-between-2d-and-3d-coordinate-systems. And since it is open source, you can see the code and algorithms that those APIs use in this git repo.

pingopalino1 commented 3 years ago

Thanks a lot.

deeprine commented 2 years ago

Hi @pingopalino1, can I get an "offline_processor" written in Python from you? Thank you.