DavidXu-JJ closed this issue 1 year ago
I have figured out the answer to the confusing Problem 1.
https://github.com/lioryariv/volsdf/blob/a974c883eb70af666d8b4374e771d76930c806f3/code/utils/rend_util.py#L78-L79
Here, in line 79, the camera location is set to the `T` vector:
https://github.com/lioryariv/volsdf/blob/a974c883eb70af666d8b4374e771d76930c806f3/code/utils/rend_util.py#L63
However, the actual camera location is at the `-T` vector. What matters in this function is the relative position between the pixel location and the camera location, so the `cameraToWorld` matrix doesn't need to take `-T` as its translation part.
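The relative-position argument can be illustrated with a small NumPy sketch. It is a simplified stand-in, not the repository's code: `ray_dirs_from_pose` and all example values are hypothetical, and it assumes ray directions are formed by subtracting the camera location from the pose-transformed pixel points.

```python
import numpy as np

def ray_dirs_from_pose(pose, points_cam):
    """Hypothetical helper: world-space unit ray directions for camera-space points.

    pose is a 4x4 camera-to-world matrix; points_cam is an (N, 3) array.
    """
    R, cam_loc = pose[:3, :3], pose[:3, 3]
    points_world = points_cam @ R.T + cam_loc   # apply the pose to each point
    dirs = points_world - cam_loc               # the translation cancels here
    return dirs / np.linalg.norm(dirs, axis=1, keepdims=True)

# Example rotation (90 degrees about the z-axis) and an arbitrary translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
T = np.array([0.5, -2.0, 3.0])

pose_plus = np.eye(4)
pose_plus[:3, :3] = R
pose_plus[:3, 3] = T

pose_minus = pose_plus.copy()
pose_minus[:3, 3] = -T   # same rotation, opposite translation

points_cam = np.array([[0.1, 0.2, 1.0], [-0.3, 0.4, 1.0]])
dirs_plus = ray_dirs_from_pose(pose_plus, points_cam)
dirs_minus = ray_dirs_from_pose(pose_minus, points_cam)
```

In this sketch the directions are identical for `T` and `-T`, because they depend only on the rotation. The ray origins still come from the translation column, so they do differ between the two poses.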
I maintain my opinion on Problem 2, but since it is not the crucial part, I am closing this issue.
Finally, I'm sorry for the annoying opening and closing of my issue. (I'm not very familiar with how issues work.)
@DavidXu-JJ Hi! Sorry to bother you. I encountered a similar problem related to the DTU dataset's coordinate system convention, and I'm wondering if you know about it.
My dataset follows NeRF's coordinate system convention, that is, the OpenGL convention (x-axis to the right, y-axis upward, and z-axis backward along the camera’s focal axis).
My issue is that if I apply the dataset to VolSDF directly, the computed `ray_dir` is incorrect. I think the problem is in the rotation matrix; DTU/BlendedMVS might follow a different convention. But I couldn't find anything about the coordinate system convention of the DTU dataset. Do you know about this?
Thank you very much!
@raynehe If I'm not mistaken, most datasets follow the OpenCV coordinate convention. Maybe you can try simply flipping the y and z axes. I'm sorry if my suggestion doesn't help or is wrong.
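As a rough illustration of that suggestion, here is a minimal NumPy sketch of flipping the camera's y and z axes on a camera-to-world pose. The helper name `opengl_to_opencv_c2w` and the example pose are hypothetical, and whether the poses expected by VolSDF really follow the OpenCV convention is an assumption taken from the reply above, not something verified here.

```python
import numpy as np

# Flip the camera's y and z axes: OpenGL (y up, z backward) <-> OpenCV (y down, z forward).
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

def opengl_to_opencv_c2w(c2w_gl):
    """Hypothetical helper: convert a 4x4 camera-to-world pose between conventions.

    Right-multiplying changes the camera-side axes and leaves the camera position unchanged.
    """
    return c2w_gl @ FLIP_YZ

# Example pose: identity rotation, camera placed at (1, 2, 3).
c2w_gl = np.eye(4)
c2w_gl[:3, 3] = [1.0, 2.0, 3.0]
c2w_cv = opengl_to_opencv_c2w(c2w_gl)

# The viewing direction in world space should be the same in both conventions.
forward_gl = c2w_gl[:3, :3] @ np.array([0.0, 0.0, -1.0])  # OpenGL cameras look along -z
forward_cv = c2w_cv[:3, :3] @ np.array([0.0, 0.0, 1.0])   # OpenCV cameras look along +z
```

The flip is its own inverse, so applying the same function again converts back.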
Hi, thank you for your nice work. I have recently been trying to follow your work and have run into some problems that I hope can be answered in this issue.
Problem 1: `load_K_Rt_from_P` at line 48 in rend_util.py: https://github.com/lioryariv/volsdf/blob/a974c883eb70af666d8b4374e771d76930c806f3/code/utils/rend_util.py#L48-L50
This code really confuses me, and I'm not able to explain it. I read the following code at line 78 in rend_util.py: https://github.com/lioryariv/volsdf/blob/a974c883eb70af666d8b4374e771d76930c806f3/code/utils/rend_util.py#L73-L78
It seems that you use `pose` as a `cameraToWorld` matrix. I did an experiment in advance; the following code is from Stack Overflow:

```python
import cv2
import numpy as np

# k (3x3 intrinsics), r (3x3 rotation), and t (3-vector) are assumed to be defined beforehand.
C = np.eye(4)
C[:3, :3] = k @ r
C[:3, 3] = k @ r @ t
out = cv2.decomposeProjectionMatrix(C[:3, :])
```
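To see what camera position this particular construction implies, here is a minimal sketch that uses only NumPy's SVD instead of `cv2.decomposeProjectionMatrix`. The values of `k`, `r`, and `t` are made up for illustration. It relies on the standard fact that the camera center is the right null vector of the 3x4 projection matrix.

```python
import numpy as np

# Example intrinsics, rotation, and offset (made-up values for illustration).
k = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(30.0)
r = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.5, -1.0, 2.0])

# Same construction as in the experiment: P = [k r | k r t].
P = np.hstack([k @ r, (k @ r @ t)[:, None]])

# The camera center is the right null vector of P (the point with P @ center_h = 0).
_, _, vt = np.linalg.svd(P)
center_h = vt[-1]                      # homogeneous 4-vector
center = center_h[:3] / center_h[3]    # dehomogenize to a 3D point
```

With `P` built this way, `P @ [c, 1] = k r (c + t)`, so the center works out to `-t`. A projection matrix built as `k [r | t]` would instead have its center at `-r.T @ t`, so the sign and rotation of the recovered position depend on which construction is used.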
Problem 2: `lift` at line 96 in rend_util.py: https://github.com/lioryariv/volsdf/blob/a974c883eb70af666d8b4374e771d76930c806f3/code/utils/rend_util.py#L96-L109
I don't know why `x_lift` takes `y` and `fy` into consideration. It seems that `sk` should be 0, but when I tested it at runtime, `sk` was not 0. So the transformation becomes:
$$ \begin{bmatrix} x'\\y'\\z \end{bmatrix}= \begin{bmatrix} f_x&sk&c_x&0\\ 0&f_y&c_y&0\\ 0&0&1&0 \end{bmatrix} \begin{bmatrix} x\_lift\\y\_lift\\z\\1 \end{bmatrix} $$
Here `[x_lift, y_lift, z, 1]` is the point in camera coordinates. I find that:
$$ x'=f_x \cdot x\_lift + sk \cdot y\_lift + c_x \cdot z $$
Solving for `x_lift` gives:
$$ x\_lift = \cfrac{x'-c_x \cdot z - sk \cdot y\_lift}{f_x} $$
But in rend_util.py, `x_lift` appears to be:
$$ x\_lift = \cfrac{(x'-c_x)\cdot z - sk \cdot y\_lift}{f_x} $$
So when `z=1`, the code is correct. Would it be better to simply change it (adding `/ z` to the `x`)?
The first question matters more to me than the second. Would you please explain the logic of the `pose` matrix to me?
I hope this issue will help other people as well.
I have tried my best to express my questions as clearly as possible. If anything is unclear or I am wrong about something, please let me know.
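For the `lift` question, a small numerical round trip may help separate the two readings of `x'`. This is a sketch under an assumption that is not confirmed in this thread: that the `x` and `y` passed to the function are pixel coordinates after the perspective division (so `u = x'/z` in the notation above). The names `project` and `lift_pixel` and the intrinsics values are hypothetical.

```python
import numpy as np

def project(K, p_cam):
    """Project a camera-space point to pixel coordinates (u, v)."""
    x_h, y_h, z = K @ p_cam
    return x_h / z, y_h / z   # perspective division

def lift_pixel(u, v, z, K):
    """Hypothetical unprojection of pixel (u, v) at depth z, including skew."""
    fx, sk, cx = K[0, 0], K[0, 1], K[0, 2]
    fy, cy = K[1, 1], K[1, 2]
    y_lift = (v - cy) * z / fy
    x_lift = ((u - cx) * z - sk * y_lift) / fx
    return np.array([x_lift, y_lift, z])

# Made-up intrinsics with a non-zero skew term.
K = np.array([[500.0, 2.0, 320.0],
              [0.0, 480.0, 240.0],
              [0.0, 0.0, 1.0]])
p_cam = np.array([0.3, -0.2, 2.5])   # depth z = 2.5, deliberately not 1

u, v = project(K, p_cam)
p_back = lift_pixel(u, v, p_cam[2], K)
```

Under that assumption, the `(x - c_x) * z` form recovers the original point at any depth. If the inputs were instead the undivided homogeneous coordinates `x'`, the `x' - c_x * z` form would be the matching inverse.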