naver / dust3r

DUSt3R: Geometric 3D Vision Made Easy
https://dust3r.europe.naverlabs.com/

Typos in paper? #71

Closed: chrisoffner closed this issue 2 months ago

chrisoffner commented 3 months ago

In Section 3.1, under Discussion, it says:

Using a generic architecture allows to leverage strong pretraining technique, ultimately surpassing what existing task-specific architectures can achieve.

Should this be "techniques" or "a strong pretraining technique"?


In Section 3.3, under Recovering intrinsics, the paper states:

hence only the focal $f_1^*$ remains to be estimated.

Should this say "focal length $f_1^*$"?


Moreover, equation (1) states:

$$X^{n, m} = P_m P_n^{-1} h (X^n)$$ with $P_m, P_n \in \mathbb{R}^{3 \times 4}$ the world-to-camera poses for images $n$ and $m$ ...

Maybe I'm just nitpicking, but for the matrix inverse $P_n^{-1}$ to exist, $P_n$ would need to be square.

Am I correct in assuming that $P_n^{-1}$ is the top-left $3 \times 4$ submatrix of the inverse of the $4 \times 4$ matrix that stacks $P_n$ on top of the row vector $[0, 0, 0, 1]$?
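
For illustration, here is a minimal NumPy sketch of equation (1). It assumes each pose is a rigid $3 \times 4$ world-to-camera matrix $[R \mid t]$ and that $h$ appends a 1 to each 3-D point. The helper names `to_homogeneous` and `transform_pointmap` are hypothetical and not part of the DUSt3R codebase. In this sketch $P_n^{-1}$ is the full $4 \times 4$ inverse of the stacked matrix, so the product has shapes $(3 \times 4)(4 \times 4)(4 \times 1)$ and yields a 3-D point in camera $m$'s frame.

```python
import numpy as np


def to_homogeneous(pose_3x4: np.ndarray) -> np.ndarray:
    """Stack a 3x4 world-to-camera pose [R | t] on top of [0, 0, 0, 1] (hypothetical helper)."""
    return np.vstack([pose_3x4, np.array([0.0, 0.0, 0.0, 1.0])])


def transform_pointmap(X_n: np.ndarray, P_n: np.ndarray, P_m: np.ndarray) -> np.ndarray:
    """Express pointmap X_n (H x W x 3, camera-n frame) in camera m's frame.

    Sketch of X^{n,m} = P_m P_n^{-1} h(X^n), where P_n^{-1} is the inverse
    of the 4x4 homogeneous version of P_n (hypothetical helper).
    """
    H, W, _ = X_n.shape
    pts = X_n.reshape(-1, 3)
    # h: (x, y, z) -> (x, y, z, 1)
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
    # (3x4) @ (4x4) -> 3x4 transform from camera-n coordinates to camera-m coordinates
    T = P_m @ np.linalg.inv(to_homogeneous(P_n))
    # Apply T to every point: (N, 4) @ (4, 3) -> (N, 3)
    out = pts_h @ T.T
    return out.reshape(H, W, 3)
```

A quick sanity check is that if `P_n == P_m`, the pointmap comes back unchanged, since $P_n P_n^{-1} h(X) = X$.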

vincent-leroy commented 2 months ago

Thanks for picking up the typos! Regarding the last point: yes, the world2cam poses are usually $3 \times 4$ matrices, and you can convert them to homogeneous $4 \times 4$ form before inversion. You could also invert the rotation and translation parts manually: $$P = [R \mid t] \;\rightarrow\; P^{-1} = [R^\top \mid -R^\top t]$$
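
To see that the two routes in the reply agree, here is a small NumPy sketch comparing the closed-form inverse with the pad-to-$4 \times 4$-and-invert route. The function names are hypothetical. It assumes $R$ is orthonormal (a proper rotation), so $R^\top = R^{-1}$; for a general $3 \times 3$ block, only the homogeneous route is valid. Both functions return the top three rows of the $4 \times 4$ inverse $\begin{bmatrix} R^\top & -R^\top t \\ 0 & 1 \end{bmatrix}$.

```python
import numpy as np


def invert_pose(P: np.ndarray) -> np.ndarray:
    """Closed-form inverse of a rigid 3x4 pose P = [R | t] (hypothetical helper).

    Returns [R^T | -R^T t]. Valid only when R is orthonormal (R^T R = I).
    """
    R, t = P[:, :3], P[:, 3]
    R_inv = R.T
    return np.hstack([R_inv, (-R_inv @ t)[:, None]])


def invert_pose_homogeneous(P: np.ndarray) -> np.ndarray:
    """Generic route: pad to 4x4, invert numerically, keep the top 3 rows."""
    P_h = np.vstack([P, [0.0, 0.0, 0.0, 1.0]])
    return np.linalg.inv(P_h)[:3, :]


# Usage: a 90-degree rotation about z plus a translation
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
P = np.hstack([R, np.array([[1.0], [2.0], [3.0]])])
print(np.allclose(invert_pose(P), invert_pose_homogeneous(P)))  # True
```

The closed form avoids a general matrix inversion and is exact up to floating-point error. The homogeneous route is more general but costs a full $4 \times 4$ solve.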

This practice seems standard enough to keep the text as is.