correct 4x4 projection matrix?

iperov commented 1 month ago

currently projection matrix is 3x3

np.array([ [1015.0, 0, 0], 
           [0, 1015.0, 0],
           [112.0, 112.0, 1] ], dtype=np.float32)

and when the point is transformed, Z is discarded

pts = pts[..., :2] / pts[..., 2:3]

which is not suitable for standard graphics transformations like in OpenGL
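For reference, a minimal sketch of what the 3x3 path above computes. It assumes `pts` holds camera-space points as rows, with positive depth in the last column; the helper name `project_3x3` and the sample point are made up for illustration and are not from the repo.

```python
import numpy as np

# 3x3 matrix from the snippet above. It is laid out for row vectors (pts @ P),
# so the principal point (112, 112) sits in the last row.
persc_proj = np.array([[1015.0, 0.0, 0.0],
                       [0.0, 1015.0, 0.0],
                       [112.0, 112.0, 1.0]], dtype=np.float32)


def project_3x3(pts):
    """Project (..., 3) camera-space points to (..., 2) pixel coordinates."""
    pts = pts @ persc_proj               # (f*x + 112*z, f*y + 112*z, z)
    return pts[..., :2] / pts[..., 2:3]  # perspective divide; z is dropped here


# Hypothetical point 10 units in front of the camera.
pts = np.array([[0.1, -0.2, 10.0]], dtype=np.float32)
uv = project_3x3(pts)  # x: 1015 * 0.1 / 10 + 112, y: 1015 * -0.2 / 10 + 112
```

The result has only two columns, so there is no depth value left to feed into a depth test or a later 4x4 transform.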

can you provide the correct 4x4 projection matrix for transforming homogeneous 3D points (x,y,z,1.0)?

wang-zidu commented 1 month ago

Thank you for your support, this issue is valuable.

The parameters of the perspective projection camera we use are: focal=1015, znear=5, zfar=15. The camera is located at (0, 0, 10) and faces the negative direction of the z-axis. The rendered image size is 224×224.

I completely agree with what you said about the 4x4 projection matrix being fundamental to some rendering calculations. I believe you can find the process you mentioned in /util/nv_diffrast.py, which is the calculation method for transforming homogeneous 3D points (x, y, z, 1).

However, in model/recon.py, the purpose of self.persc_proj is merely to obtain the x and y coordinates without involving the rendering process, so z is not needed (otherwise the calculation would be redundant). You can verify that the v2d obtained using self.persc_proj is consistent with the first two dimensions of the screen coordinates obtained by transforming vertex_ndc using /util/nv_diffrast.py. In short, self.persc_proj is just a way to slightly reduce unnecessary calculations. A similar approach is common in HRN, Deep3D, etc. Hope this helps.

iperov commented 4 weeks ago

I tried various 4x4 matrices from these values (focal=1015, znear=5, zfar=15), but none of them match the same result as your code.

> reduce unnecessary calculations

for the processor it's nanoseconds, for the programmer and those who will use the repo it's a headache.

wang-zidu commented 4 weeks ago

> I tried various 4x4 matrices from these values (focal=1015, znear=5, zfar=15), but none of them match the same result as your code.
>
> > reduce unnecessary calculations
>
> for the processor it's nanoseconds, for the programmer and those who will use the repo it's a headache.

I believe you can directly refer to /util/nv_diffrast.py to get the result you want. The camera parameters have also been verified by using PyTorch3D. Let me explain further:

Starting from line 442 of model/recon.py, we assume the homogeneous 3D point v = (x, y, z, 1) is one of the points in v3d. First, we calculate the perspective projection matrix:

$$ \text{Projection Matrix} = \begin{bmatrix} \frac{1}{\tan(fov/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(fov/2)} & 0 & 0 \\ 0 & 0 & \frac{znear + zfar}{znear - zfar} & \frac{2 \cdot znear \cdot zfar}{znear - zfar} \\ 0 & 0 & -1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{1015}{112} & 0 & 0 & 0 \\ 0 & \frac{1015}{112} & 0 & 0 \\ 0 & 0 & -2 & -15 \\ 0 & 0 & -1 & 0 \end{bmatrix} $$
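The numbers in this matrix can be reproduced with a few lines of NumPy. This is only a sketch: it assumes the field of view follows from the focal length and half the image size, i.e. tan(fov/2) = 112/1015, and the function name `perspective_matrix` is illustrative rather than taken from the repo.

```python
import numpy as np


def perspective_matrix(focal=1015.0, half_size=112.0, znear=5.0, zfar=15.0):
    """OpenGL-style 4x4 perspective matrix for column vectors (M @ v)."""
    s = focal / half_size  # equals 1 / tan(fov / 2) under the assumption above
    return np.array([
        [s, 0.0, 0.0, 0.0],
        [0.0, s, 0.0, 0.0],
        [0.0, 0.0, (znear + zfar) / (znear - zfar),
         2.0 * znear * zfar / (znear - zfar)],
        [0.0, 0.0, -1.0, 0.0],
    ], dtype=np.float64)


proj = perspective_matrix()
# Third row evaluates to (0, 0, -2, -15): (5 + 15) / (5 - 15) and 2*5*15 / (5 - 15).
```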

Invert the z direction of v to correspond to the camera's coordinate system. (In /util/nv_diffrast.py, you might also find that the y direction of v is inverted as well, which is done to adapt to the renderer.) So we get (x, y, -z, 1), and then perform the perspective projection to obtain:

$$ v' = \begin{bmatrix} \frac{1015}{112} & 0 & 0 & 0 \\ 0 & \frac{1015}{112} & 0 & 0 \\ 0 & 0 & -2 & -15 \\ 0 & 0 & -1 & 0 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ -z \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1015}{112}x \\ \frac{1015}{112}y \\ 2z - 15 \\ z \end{bmatrix} $$

Homogenize the coordinates:

$$ v'' = \begin{bmatrix} \frac{1015x}{112z} \\ \frac{1015y}{112z} \\ 2 - \frac{15}{z} \\ 1 \end{bmatrix} $$

Finally, the coordinate in ndc space is converted to the image plane:

$$ v_{image} = \begin{bmatrix} \frac{v''_x + 1}{2} \cdot 224 \\ \frac{v''_y + 1}{2} \cdot 224 \end{bmatrix} = \begin{bmatrix} 1015\frac{x}{z} + 112 \\ 1015\frac{y}{z} + 112 \end{bmatrix} $$

You will find that this result is consistent with the result obtained using self.persc_proj and homogenizing in model/recon.py.
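This equivalence can be checked numerically. The sketch below follows the steps of the derivation on random points and compares the result with the 3x3 shortcut. It assumes, as the derivation does, that z is the positive distance in front of the camera; all function names are illustrative and not the repo's own.

```python
import numpy as np

FOCAL, HALF, ZNEAR, ZFAR, SIZE = 1015.0, 112.0, 5.0, 15.0, 224.0

# 4x4 perspective matrix with the values from the derivation (column vectors).
proj = np.array([
    [FOCAL / HALF, 0.0, 0.0, 0.0],
    [0.0, FOCAL / HALF, 0.0, 0.0],
    [0.0, 0.0, (ZNEAR + ZFAR) / (ZNEAR - ZFAR), 2 * ZNEAR * ZFAR / (ZNEAR - ZFAR)],
    [0.0, 0.0, -1.0, 0.0],
])


def project_4x4(v3d):
    """4x4 path: z flip, projection, homogeneous divide, NDC to pixels."""
    v = np.concatenate([v3d, np.ones((v3d.shape[0], 1))], axis=1)
    v[:, 2] *= -1.0                       # (x, y, z, 1) -> (x, y, -z, 1)
    clip = v @ proj.T                     # row-vector form of proj @ v
    ndc = clip[:, :3] / clip[:, 3:4]      # homogenize
    xy = (ndc[:, :2] + 1.0) / 2.0 * SIZE  # NDC [-1, 1] -> pixels [0, 224]
    return xy, ndc[:, 2]


def project_3x3(v3d):
    """3x3 shortcut: intrinsics only, depth discarded."""
    p = v3d @ np.array([[FOCAL, 0.0, 0.0],
                        [0.0, FOCAL, 0.0],
                        [HALF, HALF, 1.0]])
    return p[:, :2] / p[:, 2:3]


rng = np.random.default_rng(0)
v3d = np.column_stack([rng.uniform(-1.0, 1.0, 100),
                       rng.uniform(-1.0, 1.0, 100),
                       rng.uniform(ZNEAR, ZFAR, 100)])  # depths inside [znear, zfar]

xy_4x4, depth_ndc = project_4x4(v3d)
xy_3x3 = project_3x3(v3d)
match = np.allclose(xy_4x4, xy_3x3)  # both paths give the same image coordinates
```

The only extra output of the 4x4 path is `depth_ndc`, which equals 2 - 15/z and stays within [-1, 1] for depths between znear and zfar.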

I believe the above process is detailed enough and hope it helps you. Your incorrect results could be due to some axis inversions (such as the common y-flip in images), which usually come from the different coordinate system definitions used by different rendering methods. Usually, you just need to visualize the results and check the steps where an inversion is needed.
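If a single matrix is more convenient, the z flip, the projection, and the NDC-to-pixel step from the derivation can be folded into one 4x4 product. The sketch below is one way to do that, not code from the repo: `pixel_matrix` and its `flip_y` option are hypothetical, and the flip convention used (y becomes 224 - y) is an assumption that may differ from what a given renderer expects.

```python
import numpy as np


def pixel_matrix(focal=1015.0, size=224.0, znear=5.0, zfar=15.0, flip_y=False):
    """Single 4x4 mapping (x, y, z, 1) to (u*w, v*w, depth*w, w) for column vectors."""
    half = size / 2.0
    proj = np.array([
        [focal / half, 0.0, 0.0, 0.0],
        [0.0, focal / half, 0.0, 0.0],
        [0.0, 0.0, (znear + zfar) / (znear - zfar), 2 * znear * zfar / (znear - zfar)],
        [0.0, 0.0, -1.0, 0.0],
    ])
    flip_z = np.diag([1.0, 1.0, -1.0, 1.0])  # (x, y, z, 1) -> (x, y, -z, 1)
    y_sign = -1.0 if flip_y else 1.0         # optional image-space y flip
    viewport = np.array([                    # NDC [-1, 1] -> pixels, depth untouched
        [half, 0.0, 0.0, half],
        [0.0, y_sign * half, 0.0, half],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])
    return viewport @ proj @ flip_z


M = pixel_matrix()
# Without the flip, M is approximately
# [[1015, 0, 112, 0], [0, 1015, 112, 0], [0, 0, 2, -15], [0, 0, 1, 0]].

v = np.array([0.1, -0.2, 10.0, 1.0])  # hypothetical point at depth 10
clip = M @ v
pixel = clip / clip[3]                # (u, v, NDC depth, 1)

clip_flipped = pixel_matrix(flip_y=True) @ v
pixel_flipped = clip_flipped / clip_flipped[3]
```

The upper-left part of `M` is the transpose of the 3x3 matrix, with an added row that keeps the NDC depth 2 - 15/z instead of discarding z.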

ElliotQi commented 6 days ago

@wang-zidu Hi, thanks for your excellent work. I wonder if this work supports CPU inference. I'm trying to use cpu device but get an error with nvdiffrast (RasterizeCudaContext could not use cpu device)

wang-zidu commented 6 days ago

> @wang-zidu Hi, thanks for your excellent work. I wonder if this work supports CPU inference. I'm trying to use cpu device but get an error with nvdiffrast (RasterizeCudaContext could not use cpu device)

Thank you for your feedback. nvdiffrast can be replaced with a simpler renderer such as face3d. You can try replacing it, or if you only need the mesh results, you can simply remove the corresponding nvdiffrast content. If I have time, I will address this request as soon as possible and let you know.

wang-zidu / 3DDFA-V3

correct 4x4 projection matrix? #5