isarandi / metrabs

Estimate absolute 3D human poses from RGB images.
https://arxiv.org/abs/2007.07227
MIT License
470 stars, 69 forks

Specify the world coordinate #52

Closed (luoww1992 closed this 1 year ago)

luoww1992 commented 1 year ago

I see this in the description of the pose3d result: "Each pose is shaped and is given in the 3D world coordinate system in millimeters (or in the camera coordinate frame, if is not specified)."

So how do I specify the world coordinate system?

isarandi commented 1 year ago

You can use the argument extrinsic_matrix: a float32 Tensor of shape [4, 4], the camera extrinsic matrix, with millimeters as the unit of the translation vector. It's the matrix that transforms points from the world coordinate system to the camera coordinate system.
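A minimal NumPy sketch of what such a matrix looks like and what it does to points, for illustration only. The helper names `make_extrinsic_matrix` and `world_to_camera` and the example camera placement are assumptions made for this sketch; they are not part of the MeTRAbs API.

```python
import numpy as np


def make_extrinsic_matrix(R, t_mm):
    """Assemble a 4x4 world-to-camera matrix from a 3x3 rotation and a translation in mm."""
    E = np.eye(4, dtype=np.float32)
    E[:3, :3] = R
    E[:3, 3] = t_mm
    return E


def world_to_camera(points_world, extrinsic_matrix):
    """Apply the extrinsic matrix to an (N, 3) array of world points given in mm."""
    ones = np.ones((len(points_world), 1), dtype=points_world.dtype)
    homogeneous = np.concatenate([points_world, ones], axis=1)  # (N, 4)
    return (homogeneous @ extrinsic_matrix.T)[:, :3]


# Hypothetical camera: world axes aligned with the camera axes,
# with the world origin located 3 m in front of the camera.
R = np.eye(3, dtype=np.float32)
t_mm = np.array([0, 0, 3000], dtype=np.float32)
extrinsic_matrix = make_extrinsic_matrix(R, t_mm)

origin_in_camera = world_to_camera(np.zeros((1, 3), dtype=np.float32), extrinsic_matrix)
```

A matrix built this way has the float32 dtype and [4, 4] shape described above.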

luoww1992 commented 1 year ago

@isarandi So if I set the extrinsic_matrix argument in model.detect_poses(......), the resulting pose3d is in camera coordinates, and if extrinsic_matrix is None, the resulting pose3d is in world coordinates?

luoww1992 commented 1 year ago

@isarandi I also saw a video on YouTube about your new work. It is beautiful! Do you plan to update this repo or do something to optimize the model for inference?

isarandi commented 1 year ago

> So if I set the extrinsic_matrix argument in model.detect_poses(......), the resulting pose3d is in camera coordinates, and if extrinsic_matrix is None, the resulting pose3d is in world coordinates?

If you do set the extrinsic_matrix, then the result will be in world coordinates. If you don't set it, then all we can provide is the camera-relative result, so it will be in camera coordinates.
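The relationship between the two kinds of result can be sketched with plain NumPy: since the extrinsic matrix maps world to camera, its inverse maps camera-space points back to world space. The function name `camera_to_world` and the example rotation are assumptions made for this illustration, not part of the library.

```python
import numpy as np


def camera_to_world(points_camera, extrinsic_matrix):
    """Map (N, 3) camera-space points (mm) to world space by inverting
    a world-to-camera extrinsic matrix."""
    R = extrinsic_matrix[:3, :3]
    t = extrinsic_matrix[:3, 3]
    # x_cam = R @ x_world + t  =>  x_world = R.T @ (x_cam - t).
    # In row-vector form this is (x_cam - t) @ R.
    return (points_camera - t) @ R


# Hypothetical extrinsics: a 90 degree rotation about the x axis plus a 3 m offset.
extrinsic_matrix = np.eye(4)
extrinsic_matrix[:3, :3] = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
extrinsic_matrix[:3, 3] = [0, 0, 3000]

point_world = np.array([[100.0, 200.0, 300.0]])
point_camera = point_world @ extrinsic_matrix[:3, :3].T + extrinsic_matrix[:3, 3]
recovered = camera_to_world(point_camera, extrinsic_matrix)

# With identity extrinsics the two spaces coincide.
unchanged = camera_to_world(point_camera, np.eye(4))
```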

> Do you plan to update this repo or do something to optimize the model for inference?

I'm currently working on pushing an update to this repo, with some refactoring and better dependency handling and easier-to-use scripts. I also plan to look into inference optimization like TF-lite. Usually, for best speed one needs to pre-specify image sizes and other tensor shapes etc. This takes away some of the flexibility and ease of use of the current API. I'll look into these things a bit later.

luoww1992 commented 1 year ago

@isarandi Now I use the default call model.detect_poses_batched(images) with default arguments (so the extrinsic_matrix is eye(4)), with the model metrabs_eff2l_y4, to run inference on a video. So the result should be in world coordinates, as you say. This is my poses3d file with the smpl_24 skeleton: pose3d.zip. When I plot it with Matplotlib, the skeleton is lying down, but in PoseViz it is standing. This is my plotting code:

import random

import matplotlib.pyplot as plt
import numpy as np


def main(file):
    # Joint index chains of the smpl_24 skeleton, each drawn as one polyline
    index1 = [0, 1, 4, 7, 10]
    index2 = [0, 2, 5, 8, 11]
    index3 = [0, 3, 6, 9, 12, 15]
    index4 = [9, 13, 16, 18, 20, 22]
    index5 = [9, 14, 17, 19, 21, 23]

    ax = plt.axes(projection='3d')
    positions = np.load(file, allow_pickle=True).reshape((-1, 24, 3))

    for position in positions:
        for index in [index1, index2, index3, index4, index5]:
            line = position[index]
            color = random.choice(['r', 'g', 'b'])
            ax.plot(line[:, 0], line[:, 1], line[:, 2], color=color)
        break  # only the first pose is plotted

    ax.set(xlabel='X', ylabel='Y', zlabel='Z')
    ax.set_title('3D line plot')
    # Save before show(): once the window is closed, the figure would be saved empty
    plt.savefig('smpl.jpg')
    plt.show()

So I checked the code. I found that it uses the poses3d result. The main data processing is first done by the function set_world_up() in mayavi_util.py, and then the Mayavi pose is obtained with mayavi_util.world_to_mayavi(pose) in mayavi_util.py. After that, pointset.add_point() is used to plot the points. It makes a camera projection in the Mayavi space, and the display looks good. So if we use the Mayavi space as the world, how do I get the world position in the Mayavi space?

isarandi commented 1 year ago

No, if you don't specify the extrinsics then it will be camera coordinates. The extrinsics describe the transformation from world to camera. If you don't specify it, we can't make predictions in world coordinates, only in camera coordinates.

So if you use the default, without specifying the extrinsic_matrix, you will get pose3d results in camera coordinates: x points to the right, y down, z forwards.

With Matplotlib you need to be careful, because it draws the Z axis as the vertical one (upwards), instead of Y. As I said, in the result poses the Y direction points downwards, following the standard convention. This is just a visualization thing.

See https://github.com/isarandi/metrabs/blob/master/demo.py#L55 for how to plot the poses with Matplotlib.
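As a rough sketch of the axis handling described above (not the code from demo.py), one option is to reorder the axes before plotting so that the camera's "down" direction ends up along Matplotlib's vertical axis. The helper names and the random placeholder pose are assumptions for illustration; the joint chains are the smpl_24 index lists posted earlier in this thread.

```python
import numpy as np
import matplotlib.pyplot as plt

# Joint chains of the smpl_24 skeleton, as listed in the plotting code earlier in this thread.
SMPL_24_CHAINS = [
    [0, 1, 4, 7, 10],
    [0, 2, 5, 8, 11],
    [0, 3, 6, 9, 12, 15],
    [9, 13, 16, 18, 20, 22],
    [9, 14, 17, 19, 21, 23],
]


def camera_to_matplotlib(pose):
    """Reorder camera axes (x right, y down, z forward) into Matplotlib's
    z-up layout as (x, z, -y). This is a pure rotation, so nothing is mirrored."""
    return np.stack([pose[..., 0], pose[..., 2], -pose[..., 1]], axis=-1)


def plot_pose(pose_camera, ax):
    """Draw one (24, 3) camera-space pose on a 3D axes object."""
    pose = camera_to_matplotlib(pose_camera)
    for chain in SMPL_24_CHAINS:
        ax.plot(pose[chain, 0], pose[chain, 1], pose[chain, 2])
    ax.set(xlabel='X (right)', ylabel='Z (forward)', zlabel='-Y (up)')


fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Placeholder data standing in for one entry of the predicted poses3d (mm).
pose_camera = np.random.default_rng(0).normal(size=(24, 3)) * 300 + [0, 0, 3000]
plot_pose(pose_camera, ax)
```

Calling plt.show() afterwards displays the figure; with real predictions, a standing person should then appear upright.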

luoww1992 commented 1 year ago

@isarandi What about my other question? As I said before, it creates a camera and makes a camera projection in the Mayavi space, and the display looks good. So if we use the Mayavi space as the world, how do I get the world position in the Mayavi space?

isarandi commented 1 year ago

The "Mayavi space" concept is an internal implementation detail in PoseViz that is not important from the API-user's perspective.

Do you have an extrinsic matrix? If not, it makes no sense to talk about a world space, as we don't know how the camera is placed in the world.

luoww1992 commented 1 year ago

I know it makes no sense without an extrinsic matrix. So, regarding the internal implementation in PoseViz: if there are no extrinsics, could we treat the PoseViz space as the 'world space' and use the projected points as the 'world' points? Is that an alternative method?

isarandi commented 1 year ago

If you don't set the extrinsic matrix, then the world and camera spaces are equal. Set the extrinsic matrix if you want to have a world space that's different from camera space. In that case, also set 'world_up' in detect_poses and in the PoseViz constructor.
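To see why the up direction depends on the extrinsics, here is a small NumPy sketch that rotates a world-space up vector into camera axes. The function name `up_in_camera_frame` and the example camera orientation are assumptions for illustration; this does not show how the library or PoseViz handle 'world_up' internally.

```python
import numpy as np


def up_in_camera_frame(extrinsic_matrix, world_up):
    """Rotate a world-space up direction into camera axes.
    Directions are unaffected by the translation part of the matrix."""
    return extrinsic_matrix[:3, :3] @ np.asarray(world_up, dtype=np.float64)


# Identity extrinsics: world and camera spaces coincide, and because the
# camera's y axis points down, "up" is the negative y direction.
up_cam_identity = up_in_camera_frame(np.eye(4), (0, -1, 0))

# Hypothetical z-up world with the camera looking horizontally along world +y.
extrinsic_matrix = np.eye(4)
extrinsic_matrix[:3, :3] = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
up_cam_z_up = up_in_camera_frame(extrinsic_matrix, (0, 0, 1))
```

In both cases the up direction expressed in camera axes is the negative y direction, even though the world-space up vectors differ.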

> make the PoseViz to be the 'world space'

PoseViz is a visualizer, it does not define its own world space.

luoww1992 commented 1 year ago

Hello, long time no see. I have obtained the 3D points in real world space using the extrinsic matrix.

Now, the camera is fixed and does not move. If I do not have the extrinsics, I want to get the 3D points in the pseudo-GT space of PoseViz, as in viz.png.

1. Get pred['poses3d'] with model.detect_poses().
2. Use the default code in demo.py to show the result, with camera=cameralib.Camera.from_fov(55, frame.shape[:2]).
3. I looked at this step in SkeletonsViz.update(): mayavi_poses = [poseviz.mayavi_util.world_to_mayavi(pose) for pose in poses]. Then I show the mayavi_poses with the function see_plot3d():

import matplotlib.pyplot as plt


def see_plot3d(joints, title=None):
    joints = joints.copy()
    fig = plt.figure(figsize=[100, 100])
    ax = fig.add_subplot(111, projection='3d')
    # Joint index chains of the smpl_24 skeleton, each drawn as one polyline
    index1 = [0, 1, 4, 7, 10]
    index2 = [0, 2, 5, 8, 11]
    index3 = [0, 3, 6, 9, 12, 15]
    index4 = [9, 13, 16, 18, 20, 22]
    index5 = [9, 14, 17, 19, 21, 23]
    for idx in [index1, index2, index3, index4, index5]:
        jt = joints[idx]
        ax.plot(jt[:, 0], jt[:, 1], jt[:, 2])
    ax.set_title(title)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('z')
    ax.legend(labels=['body'])
    ax.view_init(-90, -90)
    plt.show()

The pelvis is the origin. The feet are on the ground in PoseViz space, but when I show it with see_plot3d(), the result is changed. So I want to show it the way it appears in PoseViz space. How do I get the pseudo 3D points in PoseViz space?

luoww1992 commented 1 year ago

I only want to get the pseudo/realistic 3D points of the ground, extracted using the camera projection parameters, in the PoseViz visualization space.