mks0601 / V2V-PoseNet_RELEASE

Official Torch7 implementation of "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", CVPR 2018
https://arxiv.org/abs/1711.07399
MIT License

How to get exact 3D coordinates from voxelized outputs #66

Closed siyuan-peng closed 2 years ago

siyuan-peng commented 2 years ago

The output of V2V is a voxelized heatmap cube for each keypoint, but in real applications we want to predict the exact location of each keypoint.

How do we get 3D coordinates in world coordinates from the V2V voxelized outputs? Is it simply finding the voxel with the largest confidence score, warping that voxel center to world coordinates, and treating this point as the final location of the keypoint?

mks0601 commented 2 years ago

Q. Is it simply finding the voxel with the largest confidence score, warping that voxel center to world coordinates, and treating this point as the final location of the keypoint?
A. Yes

See this
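For reference, a minimal NumPy sketch of this argmax-and-unwarp step (the official implementation is Torch7, so this is only an illustration; the function name `voxel_argmax_to_world`, the 200 mm cube size, and the axis ordering are assumptions and must match how the input volume was voxelized):

```python
import numpy as np

def voxel_argmax_to_world(heatmap, ref_point, cubic_size=200.0):
    """Convert one voxelized heatmap volume to a 3D world coordinate.

    heatmap    : confidence volume for a single keypoint, shape (D, D, D)
    ref_point  : (x, y, z) center of the cropped cube in world/camera
                 coordinates (mm), the same reference point used to
                 voxelize the input depth map
    cubic_size : edge length of the cube in mm (dataset-dependent;
                 200 mm is an assumed value, check the training config)
    """
    out_size = heatmap.shape[0]

    # Index of the voxel with the largest confidence score.
    # NOTE: the (z, y, x) naming assumes the volume is stored depth-first;
    # adjust to match the voxelization used for the network input.
    z, y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)

    # Map the voxel center back to metric coordinates: the cube spans
    # [-cubic_size/2, +cubic_size/2] around ref_point along each axis.
    voxel = np.array([x, y, z], dtype=np.float32)
    world = (voxel + 0.5) / out_size * cubic_size - cubic_size / 2.0
    return world + np.asarray(ref_point, dtype=np.float32)
```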

siyuan-peng commented 2 years ago

> Q. Is it simply finding the voxel with the largest confidence score, warping that voxel center to world coordinates, and treating this point as the final location of the keypoint?
> A. Yes
>
> See this

Thank you for your quick response!!

Will this result in a loss of precision? Since the final output world locations can only lie on voxel centers, there is an inevitable quantization error that depends on the voxel size.

Is there any way to improve this? I'm thinking of taking the "center of mass" of the voxelized 3D heatmaps, but I'm not sure whether that would lead to better results.

mks0601 commented 2 years ago

We observed that the effect of this loss of precision is marginal. Alternatively, you can use soft-argmax.
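For illustration, a hedged NumPy sketch of such a soft-argmax readout (not the repository's code; the temperature `beta` and the cube parameters are assumptions and should match the voxelization used at training time):

```python
import numpy as np

def voxel_soft_argmax_to_world(heatmap, ref_point, cubic_size=200.0, beta=1.0):
    """Sub-voxel keypoint location via soft-argmax over the heatmap volume.

    Instead of picking the single best voxel, take the softmax-weighted
    average of all voxel centers, so the prediction can fall between voxels.
    beta sharpens the distribution; it is a tunable, assumed parameter.
    """
    out_size = heatmap.shape[0]

    # Softmax over the whole volume (numerically stabilized).
    w = np.exp(beta * (heatmap - heatmap.max()))
    w /= w.sum()

    # Expected voxel index along each axis (z = axis 0, y = axis 1, x = axis 2,
    # assuming the same storage order as the voxelized input).
    zs, ys, xs = np.meshgrid(np.arange(out_size), np.arange(out_size),
                             np.arange(out_size), indexing='ij')
    voxel = np.array([(w * xs).sum(), (w * ys).sum(), (w * zs).sum()],
                     dtype=np.float32)

    # Same voxel-to-world mapping as in the hard argmax case.
    world = (voxel + 0.5) / out_size * cubic_size - cubic_size / 2.0
    return world + np.asarray(ref_point, dtype=np.float32)
```

The softmax weighting makes the readout differentiable, so the same idea can also be used as a training-time integral regression layer rather than only as a post-processing step.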