mkocabas / VIBE

Official implementation of CVPR2020 paper "VIBE: Video Inference for Human Body Pose and Shape Estimation"
https://arxiv.org/abs/1912.05656

Converting weak perspective camera parameters to Blender camera projection models #278

Open MariusKM opened 1 year ago

MariusKM commented 1 year ago

I am currently trying to import and parse the VIBE outputs into Blender. The goal is to overlay the predicted mesh/armature on the original image and render it in Blender, similar to the visualization produced by pyrender.

My biggest issue at the moment is understanding how to convert the orig_cam output parameters to a standard (orthographic/perspective) projection model.

Does anyone know how the sx and sy values correlate to either perspective focal length or orthographic scale? What does an sx/sy value of 1 represent? Would it mean that the bounding box of the tracked body is the same size as the image?
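One way to reason about the scale question is to write the weak-perspective projection out explicitly. The sketch below assumes that orig_cam is ordered (sx, sy, tx, ty), that a mesh-space point maps to normalized coordinates as x_n = sx * (X + tx) and y_n = sy * (Y + ty), and that normalized coordinates span [-1, 1] across the full image with y pointing down. These conventions and the function names are illustrative assumptions, not something confirmed in this thread.

```python
def weak_perspective_to_pixels(point_xy, cam, width, height):
    """Project a mesh-space (X, Y) point to pixel coordinates.

    Assumes cam = (sx, sy, tx, ty) and normalized coordinates in [-1, 1]
    over the full image (hypothetical convention, see lead-in).
    """
    sx, sy, tx, ty = cam
    x, y = point_xy
    # Weak perspective: depth is ignored, only scale and shift apply.
    x_n = sx * (x + tx)
    y_n = sy * (y + ty)
    # Map [-1, 1] to [0, width] and [0, height].
    u = (x_n + 1.0) * 0.5 * width
    v = (y_n + 1.0) * 0.5 * height
    return u, v


def visible_extent(cam):
    """Mesh-space width and height covered by the full image.

    The normalized range [-1, 1] has length 2, so the visible extent
    along each axis is 2 divided by the scale on that axis.
    """
    sx, sy, _, _ = cam
    return 2.0 / sx, 2.0 / sy
```

Under these assumptions, sx = 1 would mean that 2 mesh units span the full image width, regardless of the size of the tracked bounding box. If an orthographic camera's scale is defined as the visible world-space width, 2 / sx would be a candidate value, though that mapping is untested here.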

I have checked several other issues on a similar topic (like this one), but they all seem to use renderers that work in NDC or image space. Blender does not work in NDC space (meaning a tx or ty shift of 1 does not bring the camera center to the edge of the image). Would a valid solution be to convert all transforms and translations in Blender to screen space (currently not supported; it would require some work to create this functionality)?

Also, when using the suggested conversion to a perspective camera, an unrealistically high focal length (500+, since this is required to construct a weak perspective) is assumed when calculating the z translation. This focal length is unusable for any rendering purpose, since anything rendered with it would appear unrealistically close and fill the whole image. Using a regular focal length yields very small translations on the z axis, which also do not seem to correspond to the scale/translation of the body in the original image. Is there a way to convert the weak perspective translation with a usable focal length?
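To make the focal-length question concrete, here is a minimal sketch that assumes the suggested conversion has the form tz = 2 * f / (res * s), with f in pixels, res the image size in pixels, and s the weak-perspective scale. The formula, the pinhole model, and all function names are assumptions for illustration. It checks whether the on-screen size of an object depends on f when the same f is used both to compute tz and to render.

```python
def depth_from_scale(s, focal_px, image_size_px):
    """Assumed conversion from weak-perspective scale to z translation."""
    return 2.0 * focal_px / (image_size_px * s)


def projected_width_px(width_m, depth_m, focal_px):
    """Pinhole projection of an object of a given width at a given depth."""
    return focal_px * width_m / depth_m


def focal_px_to_mm(focal_px, sensor_width_mm, image_width_px):
    """Convert a focal length in pixels to millimetres for a given sensor."""
    return focal_px * sensor_width_mm / image_width_px


if __name__ == "__main__":
    s, res, width_m = 0.8, 1000, 0.5
    for focal_px in (500.0, 5000.0):
        tz = depth_from_scale(s, focal_px, res)
        w_px = projected_width_px(width_m, tz, focal_px)
        print(f"f={focal_px:.0f}px  tz={tz:.2f}  width={w_px:.1f}px")
    # Weak-perspective size for comparison: width * s * res / 2
    print("weak perspective:", width_m * s * res / 2.0, "px")
```

In this model f cancels out of the projected size, so a smaller focal length would only bring the camera closer and increase perspective distortion. It may also matter that a focal length in pixels is not directly comparable to one in millimetres: focal_px_to_mm shows the usual sensor-based conversion.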

I have also opened a Stack Exchange question in case anyone wants more details or examples of my code.

I would be grateful for any help or insights regarding this issue.

Lovelyduog commented 1 year ago

Have you solved this problem?