Closed: gh18l closed this issue 5 years ago.

I downloaded the MuPoTS annotation file from your link. The 2D coordinates are in keypoints_img, and I built image-centered 3D coordinates whose x and y come from keypoints_img and whose z comes from the z of keypoints_cam. But I found that this z is misplaced relative to the real depths in the corresponding image. Are there any extra steps to do?
What do you mean by "misplaced"? keypoints_img contains 2D image coordinates, and keypoints_cam contains camera-centered 3D coordinates.
@mks0601 Thank you for your reply. For example, keypoints_img gives (x1, y1) and keypoints_cam gives (x2, y2, z2). The image-centered 3D coordinate (the camera-centered point multiplied by the intrinsic matrix) is then (x1, y1, z2). I picked pairs (TS3-000000.jpg) and found that the three persons in the image have clearly different depths at the pelvis joint, but their image-centered z2 values at the pelvis are almost identical.
The image projection (x2, y2, z2) -> (x1, y1) is affected by z2. Could you check whether the projected (x1, y1) gives a valid 2D pose on the image? If it does, there is no problem. I didn't do anything to the provided original dataset; I just converted it to .json MS COCO format.
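A quick way to run that check is to re-project keypoints_cam through the pinhole model and compare with keypoints_img. A minimal sketch, assuming the annotation provides focal lengths f and a principal point c (the function name and the sample values below are mine, not from the dataset):

```python
import numpy as np

def cam2pixel(cam_coord, f, c):
    # Pinhole projection: x = f_x * X / Z + c_x, y = f_y * Y / Z + c_y.
    # Note that the projected (x, y) depends on the depth Z.
    x = cam_coord[..., 0] / cam_coord[..., 2] * f[0] + c[0]
    y = cam_coord[..., 1] / cam_coord[..., 2] * f[1] + c[1]
    return np.stack([x, y], axis=-1)

# Hypothetical values; read f and c from the annotation of the sequence.
f, c = (1500.0, 1500.0), (1024.0, 768.0)
keypoints_cam = np.array([[100.0, -200.0, 3500.0]])  # one joint, in mm
print(cam2pixel(keypoints_cam, f, c))  # should match keypoints_img
```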
@mks0601 Thank you. The number of persons in MuPoTS is not fixed, but I had ignored that.
Can I ask you another question? If the size and position of a person were exactly the same in a MuCo image and a MuPoTS image, then x1 and y1 of the image-centered joint (x1, y1, z2) would obviously be equal across the two datasets, but would z2 also be equal? In other words, can I train on MuCo with image-centered points (x1, y1, z2) (using the absolute depth z2, without subtracting the root depth) and then evaluate on MuPoTS with the same representation?
I cannot clearly understand your question, but let me answer based on my guess.
If the size and position of a person are the same in images from different datasets, will the image-centered joint coordinates (x, y) be the same? Answer: no. They also depend on the image resolution and on the pose the person is performing.
If the size and position of a person are the same in images from different datasets, will the distance between the camera and each joint be the same? Answer: no. It also depends on the focal length and on the pose the person is performing.
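To see why the second answer is no, consider a toy pinhole example (all numbers made up): the same pixel-space size can arise from different absolute depths when the focal length differs, so absolute depth is not determined by the 2D pose alone.

```python
# Toy pinhole example (made-up numbers): a 1700 mm tall person.
# Pixel height h = f * H / Z, so the same h arises from different (f, Z) pairs.
H = 1700.0  # person height in mm

for f, Z in [(1500.0, 5000.0), (3000.0, 10000.0)]:
    h = f * H / Z  # projected height in pixels
    print(f"focal={f:.0f}px depth={Z:.0f}mm -> pixel height={h:.0f}px")
# Both print 510 px: identical image evidence, different absolute depths.
```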
Thank you for your reply. Suppose I have the image-centered and camera-centered joints that your link provides for the MuCo and MuPoTS datasets. Can you tell me how to evaluate on MuPoTS while training on MuCo (input: 2D joints -> output: 3D joints)?
I have tried using the image-centered joints (x, y, z) directly (training input: 2D image-centered joints (x, y) -> training output: 3D image-centered joints (x, y, z); evaluation input: 2D image-centered joints (x, y) -> evaluation output: 3D image-centered joints (x, y, z); obviously it only has to predict z), but the results seem wrong. The 2D inputs were normalized into [-1, 1] with (x, y) / img_width * 2 - (1.0, img_height / img_width) to eliminate the influence of different resolutions.
Thank you!
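For reference, the normalization described above can be written as the following sketch (the function name is mine):

```python
import numpy as np

def normalize_2d(joints_xy, img_width, img_height):
    # (x, y) / img_width * 2 - (1.0, img_height / img_width):
    # x is mapped into [-1, 1]; y keeps the image aspect ratio.
    joints_xy = np.asarray(joints_xy, dtype=np.float64)
    return joints_xy / img_width * 2.0 - np.array([1.0, img_height / img_width])
```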
Sorry for my ambiguous question... I just want to ask, taking annot3 as the camera-centered coordinates: @mks0601 I mean the two datasets have some gaps, such as different intrinsic matrices. What should my system's input and output be? annot3[:2] as input and annot3 as output, or the image coordinates as input and annot3 as output, or is some other extra processing needed?
Note: I have tried the former two modes, but neither seems to work...
Thank you very much!
Most 2D-to-3D pose lifting methods use 2D image coordinates as input and 3D root-relative camera-centered coordinates as the ground truth (target). For the details, you'd better read a multi-view geometry textbook (e.g., Multiple View Geometry in Computer Vision).
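As a minimal sketch of that setup (the field names follow the converted .json files; the pelvis/root index is an assumption, so check the joint order in the annotations):

```python
import numpy as np

ROOT_IDX = 14  # assumed pelvis index; verify against the joint order in the annotations

def make_training_pair(keypoints_img, keypoints_cam):
    # Input: 2D image-space joints. Target: root-relative camera-centered 3D joints.
    inp = np.asarray(keypoints_img, dtype=np.float64)   # (J, 2), pixels
    cam = np.asarray(keypoints_cam, dtype=np.float64)   # (J, 3), millimeters
    target = cam - cam[ROOT_IDX]                        # subtract the root joint
    return inp, target
```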
@mks0601 But if the intrinsic matrices are different, then the same 2D image points can correspond to different 3D poses, yet the same network will produce the same output for both.
Let's say there is a person in 3D space and you capture him with two different cameras (different intrinsic matrices, same extrinsic matrix). Since the extrinsic matrix is the same, the camera-centered 3D coordinates of his keypoints are identical. Since the intrinsic matrices differ, the 2D poses in image space differ (mainly in scale). However, you can crop the 2D pose using the bounding box, and after that the marginal difference can be ignored.
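A minimal sketch of that bounding-box normalization (the function name is mine): expressing each 2D pose relative to its box removes most of the intrinsic-dependent scale and offset before lifting.

```python
import numpy as np

def bbox_normalize(joints_xy, bbox):
    # bbox = (x_min, y_min, width, height); maps joints roughly into [0, 1]^2,
    # so 2D poses captured with different intrinsics become comparable.
    x, y, w, h = bbox
    joints_xy = np.asarray(joints_xy, dtype=np.float64)
    return (joints_xy - np.array([x, y])) / np.array([w, h])
```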
@mks0601 Thank you, I will try it.