yohanshin / WHAM


SLAM poses, trans_world and pose_world #80

Open AlexMorgand opened 3 months ago

AlexMorgand commented 3 months ago

Hi, first of all thank you for making all of your work available for us. It's greatly appreciated :)

I had a few questions regarding the parameters that we can get from demo/api.

When analyzing the camera poses (slam_results.pth), I can see that the first pose is [ Id | 0 ]. The convention seems to be [x, y, z, qx, qy, qz, qw], right?
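For reference, here is roughly how I'm reading them (a minimal sketch; the file path and the torch.load call are guesses on my side, and scipy's Rotation.from_quat takes the same scalar-last [qx, qy, qz, qw] order):

    import numpy as np
    import torch
    from scipy.spatial.transform import Rotation

    # Assumption: slam_results.pth holds an (N, 7) array of [x, y, z, qx, qy, qz, qw] rows.
    poses = np.asarray(torch.load("output/demo/slam_results.pth"))

    def pose_to_matrix(p):
        """Turn one [x, y, z, qx, qy, qz, qw] row into a 4x4 homogeneous matrix."""
        T = np.eye(4)
        T[:3, :3] = Rotation.from_quat(p[3:7]).as_matrix()  # scalar-last quaternion
        T[:3, 3] = p[:3]
        return T

    print(pose_to_matrix(poses[0]))  # first frame should come out as [ Id | 0 ]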

However, I'm trying to find the translation and orientation of the root in real-world coordinates, consistent with the camera poses. It seems to me that this information is stored in 'trans_world' and 'pose_world' (by taking the root node), but when loading them directly into Blender I can see that they don't really match the given camera poses. The 'trans_world' values stay really close to the origin.

Am I doing something wrong?

yohanshin commented 2 months ago

Hi, yes, when I ran DPVO, I also saved the data as [x, y, z, qx, qy, qz, qw].

Do you mean that trans_world (the prediction from WHAM) is barely moving while the x, y, z of the SLAM results look reasonable? First of all, the SLAM results lack the global scale, so they need to be multiplied by some scalar. That's why WHAM's prediction does not necessarily match the SLAM output. Please let me know if I am misunderstanding your question.
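If it helps as a quick sanity check, you could fit a single scalar between the two trajectories, something along these lines (this is not how WHAM resolves the scale internally, and it only makes sense when the camera roughly follows the subject):

    import numpy as np

    def estimate_scale(slam_xyz, trans_world):
        """Least-squares scalar s minimizing || s * slam_xyz - trans_world || after centering.

        slam_xyz:    (N, 3) camera translations from slam_results.pth
        trans_world: (N, 3) root translations from wham_output.pkl
        """
        a = slam_xyz - slam_xyz.mean(axis=0)
        b = trans_world - trans_world.mean(axis=0)
        return float((a * b).sum() / (a * a).sum())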

AlexMorgand commented 2 months ago

Hey @yohanshin thank you for getting back to me!

What I mean is that the first pose of DPVO is at the origin (slam_results.pth), and it's the same for "trans_world" (from wham_output.pkl), so I was wondering if I need to manually apply a correction or something else to get the camera pose in the right place. What I need is the correct camera pose looking at the bodies in world coordinates.

What I finally did in Blender was to use the first pose from "trans" and "pose" to place the body correctly (by back-projection) relative to the first camera pose, while using "trans_world" and "pose_world" for the animation so that it stays in world coordinates.
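On the Blender side it boils down to something like this (simplified; the object name, the scale handling, and the quaternion order conversion are specific to my setup):

    import bpy
    from mathutils import Matrix, Quaternion

    def place_camera(x, y, z, qx, qy, qz, qw, scale=1.0):
        """Place my Blender camera from one [x, y, z, qx, qy, qz, qw] SLAM pose."""
        cam = bpy.data.objects["Camera"]            # my camera object name
        rot = Quaternion((qw, qx, qy, qz))          # mathutils is scalar-first
        loc = Matrix.Translation((x * scale, y * scale, z * scale))
        # Depending on the convention, a 180 deg flip about the camera's local X may
        # also be needed (OpenCV-style cameras look down +Z, Blender cameras down -Z).
        cam.matrix_world = loc @ rot.to_matrix().to_4x4()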

It's working well, but I have a problem along Z, and I'm wondering if you're correcting for it with something like:

         verts_glob = global_output.vertices.cpu()
         # Lift the mesh so its lowest vertex sits on the ground plane (axis 1, i.e. Y, is up here).
         verts_glob[..., 1] = verts_glob[..., 1] - verts_glob[..., 1].min()
         # Midpoint of the per-frame mean vertex positions in the X-Z (ground) plane.
         cx, cz = (verts_glob.mean(1).max(0)[0] + verts_glob.mean(1).min(0)[0])[[0, 2]] / 2.0

When projecting the mesh onto the video, did you have to do this correction?
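In case it matters, here is the fix-up I'm currently applying for the axis difference (an assumption on my side: WHAM's world frame looks Y-up given that the snippet above floors axis 1, while Blender is Z-up):

    import numpy as np

    # A +90 deg rotation about X maps Y-up coordinates to Z-up: (x, y, z) -> (x, -z, y).
    R_YUP_TO_ZUP = np.array([[1.0, 0.0,  0.0],
                             [0.0, 0.0, -1.0],
                             [0.0, 1.0,  0.0]])

    def to_blender_frame(trans_world):
        """Re-express (N, 3) Y-up world translations in Blender's Z-up frame."""
        return np.asarray(trans_world) @ R_YUP_TO_ZUP.T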

> SLAM results lack the global scale, so it needs to be multiplied by some scalar

So we have an (expected) slight ambiguity in human size and depth, I suppose, which could explain the occasional misalignment. In my test cases, using my calibration file gave good results :)

That's a lot to ask, but thank you already for your work!