Point triangulation solving

karfly / learnable-triangulation-pytorch

This repository is an official PyTorch implementation of the paper "Learnable Triangulation of Human Pose" (ICCV 2019, oral). The proposed method achieves state-of-the-art results in multi-view 3D human pose estimation!

MIT License


Point triangulation solving #108

Open nisace opened 4 years ago

nisace commented 4 years ago

Hi all,

I'm trying to use the triangulation method on my own dataset, but I struggle to understand how the third component of the 2D homogeneous keypoints can be ignored.

The triangulation problem aims to solve the equation AX=0, where A is constructed from the known projection matrices P and the image coordinates x of the 2D keypoints.

The matrix A is constructed from the fact that we want X such that x ^ PX = 0, where ^ denotes the cross product between two vectors.
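A minimal sketch of this linear (DLT) construction in Python with NumPy. The function name `triangulate_dlt` is hypothetical and is not the repository's API. The sketch assumes each 2D point is given as pixel coordinates (u, v), i.e. as x = [u, v, 1], and keeps two independent rows of x ^ PX = 0 per view.

```python
import numpy as np

def triangulate_dlt(proj_matrices, points_2d):
    """Linear (DLT) triangulation of one 3D point from N views (hypothetical helper).

    proj_matrices: shape (N, 3, 4), one projection matrix P per view.
    points_2d: shape (N, 2), pixel coordinates (u, v) per view, i.e. x = [u, v, 1].
    Returns the Euclidean 3D point as a length-3 array.
    """
    proj_matrices = np.asarray(proj_matrices, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    rows = []
    for P, (u, v) in zip(proj_matrices, points_2d):
        # Two independent rows of x ^ PX = 0 with x = [u, v, 1]
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)  # shape (2N, 4)
    # X is the right singular vector of A with the smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

The third row of the cross product is a linear combination of the other two, so only two rows per view are kept.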

As far as I understand, x=[u, v, w], where u and v are the image coordinates of the 2D keypoints, but I don't understand how to get the value of w.

From the code, I understand that w is assumed to be equal to 1, but in that case u and v are only known up to a factor from the 2D keypoints, is that correct? If so, how can I treat w as an additional unknown and solve for it alongside the 3D keypoint coordinates?

Thanks

karfly commented 4 years ago

Hi, @nisace! Thank you for the thoughtful question.

You're completely right, except that X is not [u, v, w] (a 3-dim vector): it's actually a 4-dim vector X=[x, y, z, w] (a 3D homogeneous vector).

You can refer to this function for more details about algebraic triangulation.

nisace commented 4 years ago

Thanks @karfly for your answer.

I agree that X=[x, y, z, w] is the solution of AX=0 and that we can get the 3D point coordinates as [x/w, y/w, z/w].
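A small illustration of this dehomogenization step. The helper name `dehomogenize` is hypothetical. Because the SVD solution of AX=0 is only defined up to a nonzero scale and sign, any multiple of X maps to the same 3D point.

```python
import numpy as np

def dehomogenize(X_h):
    """Convert a homogeneous vector [x, y, z, w] to Euclidean [x/w, y/w, z/w]."""
    X_h = np.asarray(X_h, dtype=float)
    if np.isclose(X_h[-1], 0.0):
        raise ValueError("w is (close to) zero: point at infinity")
    return X_h[:-1] / X_h[-1]

# Any nonzero multiple of X dehomogenizes to the same 3D point
X_h = np.array([2.0, 4.0, 6.0, 2.0])
print(dehomogenize(X_h))         # [1. 2. 3.]
print(dehomogenize(-5.0 * X_h))  # [1. 2. 3.]
```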

My question is about the 2D keypoint coordinates x (or the points argument of the function). These coordinates are of dimension 3, [u, v, w]. However, the function takes points of shape (n, 2) because it assumes that x can be built from the image coordinates u and v, and it assumes that w=1.

It's the w=1 that I don't understand. Where does this assumption come from?

Thanks again for your quick response.

karfly commented 4 years ago

The homogeneous scale factor is eliminated by a cross product (from "Multiple View Geometry in Computer Vision", Richard Hartley and Andrew Zisserman, section 12.2, p. 312).
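A quick numerical check of what "eliminated by a cross product" means, using an arbitrary random P and a made-up 3D point. Scaling x by any nonzero factor leaves x ^ PX = 0 satisfied. Setting w to 1 without also dividing u and v by w changes the direction of x, so the constraint is no longer met.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((3, 4))       # arbitrary projection matrix
X = np.array([0.3, -0.7, 5.0, 1.0])   # homogeneous 3D point

x = P @ X                             # homogeneous image point [u, v, w], w generally != 1
for scale in (1.0, -2.5, 1.0 / x[2]):
    # scale * x is parallel to PX, so the cross product vanishes for every scale
    print(np.allclose(np.cross(scale * x, P @ X), 0.0))  # True

# Replacing w by 1 without dividing u and v changes the direction of x
x_wrong = np.array([x[0], x[1], 1.0])
print(np.allclose(np.cross(x_wrong, P @ X), 0.0))  # False unless w happened to be 1
```

The scale 1.0 / x[2] gives [u/w, v/w, 1]. That is the only representative with last component 1 that still satisfies the constraint.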

nisace commented 4 years ago

Yes, I saw that, but then the question is: why would the pixel coordinates of the 2D keypoints on the images correspond to a w value of 1?

I ran an experiment in which I simulate multiple images by projecting a 3D point with different matrices, which gives me a set of "2D" points of dimension 3. Then I give the matrices and the 2D points as input to your function to see if I can reconstruct the 3D point.

If I understand correctly, the first two components of the vector given by the projection are the pixel coordinates, and the third component can be ignored (see https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml/gluProject.xml). So I thought your function would expect the first two components as inputs. However, the result is wrong if I do so.

I need to divide the first two components of the 2D points by the third component in order to get the correct result.
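A sketch that reproduces this experiment under made-up camera parameters. The intrinsics K, the random rotations about the y axis and the translations are all assumptions for illustration. `triangulate_dlt` is a hypothetical compact DLT helper, not the repository's function. Only the coordinates divided by w recover the original point.

```python
import numpy as np

def triangulate_dlt(Ps, uv):
    # Two rows of x ^ PX = 0 per view, with x = [u, v, 1]
    A = np.concatenate([[u * P[2] - P[0], v * P[2] - P[1]] for P, (u, v) in zip(Ps, uv)])
    return np.linalg.svd(A)[2][-1]  # homogeneous X, defined up to scale

rng = np.random.default_rng(42)
X_true = np.array([0.2, -0.1, 3.0, 1.0])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])  # assumed intrinsics

Ps = []
for _ in range(4):
    # Hypothetical pinhole camera K[R|t]: small random rotation about y, camera in front of the point
    a = rng.uniform(-0.3, 0.3)
    R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
    t = rng.uniform(-0.5, 0.5, size=3) + np.array([0.0, 0.0, 2.0])
    Ps.append(K @ np.hstack([R, t[:, None]]))

x_h = np.array([P @ X_true for P in Ps])  # homogeneous [u, v, w]; w is the depth, not 1
raw = x_h[:, :2]                          # first two components only
pixels = x_h[:, :2] / x_h[:, 2:3]         # divided by w: actual pixel coordinates

for name, uv in (("raw", raw), ("divided", pixels)):
    X = triangulate_dlt(Ps, uv)
    print(name, X[:3] / X[3])             # only "divided" matches X_true[:3]
```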

nisace commented 4 years ago

@karfly your help would be much appreciated if possible. Thanks in advance!

karfly commented 4 years ago

Let's look at the equation x = PX. Here X is a 3D homogeneous vector X=[x,y,z,1]. The element P[-1][-1] always equals 1, so the last component of x is always 1 (x=[u,v,w]=[u,v,1]).

If you want to use some w not equal to 1 in the triangulation method above, then you should also change the w in the 3D point X.

nisace commented 4 years ago

Why is P[-1][-1] equal to 1? I understand that P is constructed as follows (from http://ksimek.github.io/2013/08/13/intrinsic/) and your code seems to follow this convention.

And even if P[-1][-1] = 1, that does not imply w = 1; we would need P[-1] = [0, 0, 0, 1] for that.

Or maybe you have some assumptions on the matrix [ R | t ]?
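A sketch that checks these two points numerically under the P = K[R|t] convention from the linked article. K, R, t and X are made-up values. Because K's last row is [0, 0, 1], the last row of P is [R[2], t_z]. So P[-1][-1] equals t_z and w is the depth of the point in the camera frame. Rescaling P so that P[-1][-1] = 1 does not make w = 1. Only a last row of exactly [0, 0, 0, 1] (an affine camera) does.

```python
import numpy as np

# Assumed pinhole intrinsics; K's last row is [0, 0, 1]
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.2  # assumed rotation about the x axis
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])
t = np.array([0.1, -0.2, 4.0])  # assumed translation
P = K @ np.hstack([R, t[:, None]])

# Since K[2] = [0, 0, 1], the last row of P is [R[2], t_z], so P[-1][-1] = t_z
print(np.allclose(P[2], np.append(R[2], t[2])))  # True
print(P[-1, -1])                                  # 4.0

# w = P[2] . X is the z coordinate of X in the camera frame (its depth)
X = np.array([0.5, 0.3, 2.0, 1.0])
w = (P @ X)[2]
depth = (R @ X[:3] + t)[2]
print(np.isclose(w, depth))                       # True

# Rescaling P so that P[-1][-1] == 1 describes the same camera,
# but w becomes depth / t_z, which is still not 1 in general
P_scaled = P / P[-1, -1]
print(P_scaled[-1, -1], (P_scaled @ X)[2])

# Only a last row of exactly [0, 0, 0, 1] (an affine camera) forces w = 1 for every X
P_affine = P.copy()
P_affine[2] = [0.0, 0.0, 0.0, 1.0]
print((P_affine @ X)[2])                          # 1.0
```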

Thanks

T-Rex9000 commented 4 years ago

@karfly Hi! I just read this issue, and I admit I don't understand why P[-1][-1] would be 1, or why the last component of x would be 1 in this case, without any other assumption. Thanks!