microsoft / voxelpose-pytorch

Official implementation of "VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment"
MIT License

Campus space center, space size #48

Closed: gpastal24 closed this issue 5 months ago

gpastal24 commented 1 year ago

Hello, I know this has been answered before, but I don't get how you define the space center. Let me explain. In the Campus dataset, the x,z coordinates in the original dataset are the following: (-4.9, 11.2), (-1.78, 5.22) and (4.9, 6.68), in meters. I don't quite understand how you define a 12x12 m box around these coordinates. Also, the space center should then be around (0, 8) meters. Could you please enlighten me?

The same is true if I instead compute the coordinates from -dot(R.T, T). In that case they are (-6.2, 5.2), (1.77, -5.05) and (11.7, -1.8), and I still can't see how the bounding box should be 12x12 with the space center at (3, 4.5). Thanks.
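
For reference, here is a minimal numpy sketch of the camera-center computation used above, assuming R and T are the world-to-camera extrinsics so that the camera position in world coordinates is C = -Rᵀ T (the values are purely illustrative):

```python
import numpy as np

# World-to-camera extrinsics of a single camera (illustrative values,
# not the actual Campus calibration).
R = np.array([[0.0, -1.0,  0.0],
              [0.0,  0.0, -1.0],
              [1.0,  0.0,  0.0]])
T = np.array([[1.2], [0.3], [4.0]])

# Camera center in world coordinates: C = -R^T T
C = -R.T @ T
print(C.ravel())
```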

mateuszk098 commented 1 month ago

Hi, I have the same problem. Have you solved this? I see they say that SPACE_CENTER denotes the center coordinate of the box. However, this does not seem to be the case for the Campus and Panoptic datasets, where they set values I cannot make sense of. I'm currently working with the AIST++ dataset and I got some nice reconstruction results, but I could probably make them better if I knew how to properly define the space.

gpastal24 commented 3 weeks ago

@mateuszk098 Hi, I do not remember if I ever figured it out for the Campus dataset. Nonetheless, Faster-VoxelPose works well on my own dataset when I set the space center to the midpoint of the region defined by the camera placements.

mateuszk098 commented 3 weeks ago

@gpastal24 Hi, thank you for the response. By the region defined by the camera placements, do you mean computing each camera position as $-R^TT$ and then taking the average? I'm completely new to this field and I lack intuition, so let me describe what I'm doing.

I have two cameras and I calibrated them separately using cv2.calibrateCamera() and a chessboard with a 25 x 25 mm pattern. This gave me reasonable values for fx, fy, cx, cy and the distortion coefficients (the intrinsics). Then I used cv2.stereoCalibrate() to find the extrinsics, which gave me a rotation matrix R and a translation vector T. As far as I understand, these R and T define where the second camera is placed relative to the first one, so I set them for the second camera in the calibration.json file.

However, I don't know what to set as R and T for the first camera. What is the origin point? I assumed the origin is the first camera itself, so I set the identity matrix as R and a zero vector as T. Is that right, or am I wrong? In any case, there are no fused poses in the output. It is probably a problem with the joint detection by ResNet, and I may have to change the view, but I want to be sure that my reasoning about the calibration is correct.
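
As a sanity check on that convention, here is a minimal sketch (the variable names are hypothetical and not the repository's calibration format): with the first camera taken as the world origin, its extrinsics are the identity and a zero vector, the second camera uses the R, T returned by cv2.stereoCalibrate(), and both camera centers follow from C = -Rᵀ T:

```python
import numpy as np

# Camera 1 is chosen as the world origin: identity rotation, zero translation.
R1 = np.eye(3)
T1 = np.zeros((3, 1))

# Camera 2: R, T as returned by cv2.stereoCalibrate() (illustrative values),
# i.e. the transform that maps camera-1 coordinates to camera-2 coordinates,
# expressed in the units of the chessboard object points (e.g. millimeters).
R2 = np.array([[ 0.9962, 0.0, 0.0872],
               [ 0.0,    1.0, 0.0   ],
               [-0.0872, 0.0, 0.9962]])
T2 = np.array([[-1200.0], [0.0], [100.0]])

# Camera centers in the common world frame (the camera-1 frame): C = -R^T T
C1 = -R1.T @ T1   # [0, 0, 0] by construction
C2 = -R2.T @ T2
print(C1.ravel(), C2.ravel())
```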

gpastal24 commented 3 weeks ago

@mateuszk098 this thread might be useful: https://github.com/AlvinYH/Faster-VoxelPose/issues/21. In essence, what I did was calibrate all the cameras to a common reference point. The results you get from the cv2 calibration are world-to-camera transforms (I don't know the physical meaning of this, tbh), and you put these R, T matrices in the calibration file. You are correct: you use -dot(R.T, T) to find each camera's position in 3D space. From the camera positions you can then define the 3D bounding box encompassing the cameras, as well as the space center.
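
A minimal sketch of that last step, assuming the camera centers are already expressed in a common world frame (the margin and the resulting SPACE_SIZE/SPACE_CENTER values should be adapted to your own setup; the numbers here are illustrative):

```python
import numpy as np

# Camera centers in the common world frame (e.g. millimeters), obtained via C = -R.T @ T.
camera_centers = np.array([
    [-6000.0,  5000.0, 2500.0],
    [ 2000.0, -5000.0, 2500.0],
    [ 6000.0,  6000.0, 2500.0],
])

# Axis-aligned bounding box around the cameras.
mins = camera_centers.min(axis=0)
maxs = camera_centers.max(axis=0)

# Space center = midpoint of the box; space size = box extent plus a margin
# so that people near the cameras still fall inside the capture volume.
space_center = (mins + maxs) / 2.0
margin = 2000.0  # illustrative margin, same units as the calibration
space_size = (maxs - mins) + margin

print("SPACE_CENTER:", space_center)
print("SPACE_SIZE:", space_size)
```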