kxhit / EscherNet

[CVPR2024 Oral] EscherNet: A Generative Model for Scalable View Synthesis
https://kxhit.github.io/EscherNet

Camera pose in NeRF dataset #21

Open ChenYutongTHU opened 1 week ago

ChenYutongTHU commented 1 week ago

Dear authors,

Thanks for this great work! I have a question regarding inference on the NeRF-Synthetic dataset.

To transform NeRF-Synthetic's camera poses to 6DoF, I see that the pose is converted from the Blender convention to the COLMAP convention.

https://github.com/kxhit/EscherNet/blob/10b650492ba97b5104a3136de07d1a67f4ada458/eval_eschernet.py#L451-L456

However, transforming the C2W matrix from the Blender convention to the COLMAP convention would be:

pose[:,1:3] *= -1 # right-multiply the C2W matrix by [[1,0,0,0],[0,-1,0,0],[0,0,-1,0],[0,0,0,1]]

This is also implemented in nerfstudio.
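For reference, here is a minimal NumPy sketch of that right-multiplication. The helper name `blender_to_opencv_c2w` and the example pose are made up for illustration and are not taken from EscherNet or nerfstudio. It assumes a 4x4 camera-to-world matrix whose camera axes follow the Blender/OpenGL convention (x right, y up, z backward).

```python
import numpy as np

# Hypothetical helper for illustration; not part of EscherNet or nerfstudio.
def blender_to_opencv_c2w(c2w: np.ndarray) -> np.ndarray:
    """Right-multiply a 4x4 C2W matrix by diag(1, -1, -1, 1).

    This negates the camera's y and z axes (columns 1 and 2) and leaves
    the camera position (column 3) untouched.
    """
    flip = np.diag([1.0, -1.0, -1.0, 1.0])
    return c2w @ flip

# Arbitrary example pose: rotation about the world z axis plus a translation.
theta = np.pi / 6
c2w = np.eye(4)
c2w[:3, :3] = [
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
]
c2w[:3, 3] = [1.0, 2.0, 3.0]

converted = blender_to_opencv_c2w(c2w)

# Same result as the in-place column flip quoted above.
shortcut = c2w.copy()
shortcut[:, 1:3] *= -1
assert np.allclose(converted, shortcut)

# The camera centre in world coordinates does not move.
assert np.allclose(converted[:3, 3], c2w[:3, 3])
```

The point of the sketch is that only the camera's own axes change, while its world-space position stays fixed.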

In another place in your code, you apply the left-multiplication (flipping the y-z rows) to the W2C matrix (I think pose_out contains W2C). This makes sense to me, but it is inconsistent with how NeRF is handled in this code.
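To make the equivalence concrete, the following NumPy sketch checks that flipping the y-z rows of a W2C matrix matches right-multiplying the corresponding C2W matrix by diag(1, -1, -1, 1). All variable names and the random test pose are assumptions for illustration. Nothing here is copied from eval_eschernet.py.

```python
import numpy as np

flip = np.diag([1.0, -1.0, -1.0, 1.0])  # this matrix is its own inverse

# Build an arbitrary rigid C2W pose (illustrative values only).
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1  # make it a proper rotation (det = +1)
c2w_blender = np.eye(4)
c2w_blender[:3, :3] = q
c2w_blender[:3, 3] = rng.normal(size=3)

w2c_blender = np.linalg.inv(c2w_blender)

# Route A: right-multiply C2W by the flip, then invert to get W2C.
w2c_a = np.linalg.inv(c2w_blender @ flip)

# Route B: left-multiply W2C by the flip, i.e. negate rows 1 and 2.
w2c_b = flip @ w2c_blender
rows_flipped = w2c_blender.copy()
rows_flipped[1:3, :] *= -1

assert np.allclose(w2c_b, rows_flipped)
assert np.allclose(w2c_a, w2c_b)  # inv(C @ F) == F @ inv(C)

# Applying the flip on the other side of C2W is a different operation:
# it also negates the y and z components of the camera position.
left_on_c2w = flip @ c2w_blender
sides_differ = not np.allclose(left_on_c2w, c2w_blender @ flip)
assert sides_differ
```

So the two routes agree only when the flip is applied on the right of C2W or on the left of W2C. For a generic pose, mixing the side and the matrix type gives a different camera.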

https://github.com/kxhit/EscherNet/blob/10b650492ba97b5104a3136de07d1a67f4ada458/eval_eschernet.py#L491-L494

Can you help look into this issue? Thanks!

ChenYutongTHU commented 1 week ago

I found that both pose_in and pose_out should be in the OpenGL/Blender convention, so I am not sure why NeRF-Synthetic is 'converted' to 'opencv'.

kxhit commented 1 week ago

Hi, thanks for your interest in our work.

There are indeed many different 3D coordinate conventions used in the evaluation scripts to handle different datasets. Are the results or visualisations not good? If so, I could take a deeper look. Thanks!