karfly / learnable-triangulation-pytorch

This repository is an official PyTorch implementation of the paper "Learnable Triangulation of Human Pose" (ICCV 2019, oral). The proposed method achieves state-of-the-art results in multi-view 3D human pose estimation!
MIT License

Understanding the Extrinsic Parameters of Human3.6M #110

Closed: wenwu35 closed this issue 4 years ago

wenwu35 commented 4 years ago

Hi all,

I am trying to build my own multi-camera system, similar to Human3.6M's, as suggested here.

However, I am confused by the extrinsic parameters of Human3.6M. I first assumed that the extrinsics (R, t) of each of the 4 cameras were expressed relative to a single world coordinate origin, but that did not seem right after I reconstructed the camera views from Human3.6M's extrinsic parameters (image below): the cameras did not end up at the 4 corners of the capture space as described in the Lab Setup.
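A minimal sketch of this reconstruction step (not from the repo), assuming the common convention `x_cam = R @ x_world + t`, under which the camera center in world coordinates is `C = -Rᵀ t`:

```python
import numpy as np

def camera_center(R, t):
    """Camera center in world coordinates, assuming x_cam = R @ x_world + t."""
    return -R.T @ t

# Hypothetical extrinsics for one camera: identity rotation, 5 m along z.
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
print(camera_center(R, t))  # -> [ 0.  0. -5.]
```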

So I am wondering how to set the extrinsics for my own test rig if I'd like to use your method. Much appreciated!

[image: reconstructed camera positions from Human3.6M extrinsics]

karfly commented 4 years ago

Hi, @mobiusww. The process of estimating the extrinsic and intrinsic parameters of a multi-camera setup is called calibration. Usually cameras are calibrated with visual markers (e.g. a chessboard pattern or ArUco markers).

A good starting point is this OpenCV tutorial: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_calib3d/py_calibration/py_calibration.html.
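For a single camera, the intrinsic part of that tutorial boils down to roughly the following sketch (the board size, square size, and image folder are placeholders to adapt to your setup):

```python
import glob
import cv2
import numpy as np

# Inner-corner grid and square size of the printed chessboard (placeholders).
pattern_size = (9, 6)
square_size = 0.025  # meters

# 3D corner coordinates in the board's own frame (z = 0 plane).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
objp *= square_size

obj_points, img_points = [], []
for fname in glob.glob("calib_images/*.jpg"):  # hypothetical image folder
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsic matrix K and distortion coefficients from multiple board views.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```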

wenwu35 commented 4 years ago

Hi @karfly, many thanks for your prompt reply!

My understanding of the extrinsic parameters is that they describe the camera's position and orientation in a global coordinate frame. So my question is: do the 4 cameras of the Human3.6M dataset share a single global coordinate frame? If so, I would expect the 4 reconstructed cameras to sit at the four corners and face the middle of the space, but that is not what my reconstruction shows (image above). If not, how should I choose the coordinate origin when setting the extrinsics (i.e., the extrinsic values depend on the world frame's origin and orientation, or on the chessboard's position) in order to use your model? Could I place the chessboard at an arbitrary position for each camera and set the extrinsics accordingly?
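For context, one common way to express several cameras in a single shared world frame is to have all of them observe the same static chessboard and solve a PnP problem per camera, so the board's frame becomes the world frame. A minimal sketch (illustrative only, not from this repo):

```python
import cv2
import numpy as np

def extrinsics_in_board_frame(obj_points, img_points, K, dist):
    """Extrinsics (R, t) of one camera, with the world frame fixed to the board.
    Convention: x_cam = R @ x_world + t."""
    ok, rvec, tvec = cv2.solvePnP(obj_points, img_points, K, dist)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec.reshape(3)

# Calling this once per camera, against the *same* static board pose,
# yields 4 sets of (R, t) that all refer to one shared world frame.
```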

Thanks again!