I checked the code and found that the SH 2D detections are pre-processed and saved results.
According to the paper, this approach estimates depth from 2D joints in camera coordinates, which are supposed to be in mm.
However, the SH output is in image space, in pixels, and inverse camera projection requires depth for exact recovery. Yet depth is not supposed to be available at runtime.
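To make the point concrete, here is a minimal sketch of the inverse projection I have in mind (the function name `backproject_to_camera` and the intrinsics `fx, fy, cx, cy` are my own placeholders, not taken from your code); it cannot produce mm coordinates without a per-joint depth:

```python
import numpy as np

def backproject_to_camera(uv, depth, fx, fy, cx, cy):
    """Back-project 2D pixel joints to camera coordinates (mm).

    uv:    (J, 2) joint locations in pixels (e.g. SH detections)
    depth: (J,) per-joint depth in mm -- the quantity that should
           not be available at test time
    fx, fy, cx, cy: camera intrinsics in pixels
    """
    x = (uv[:, 0] - cx) / fx * depth  # X in mm
    y = (uv[:, 1] - cy) / fy * depth  # Y in mm
    return np.stack([x, y, depth], axis=-1)  # (J, 3) camera-space joints
```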
Could you clarify how you obtained the 2D joints (in mm, in camera coordinates) from image space? Did you use the ground-truth depth data?
Thanks for your interest in our research!