Coordinate systems - Githubissues

jakubcerveny commented 10 months ago

Hi, thanks for making this amazing work available!

I'm trying to convert my own data to the NeRF format so I can process it with train.py, but I'm having some trouble with the coordinate systems.

Do the NeRF import functions follow the conventions described here: https://docs.nerf.studio/en/latest/quickstart/data_conventions.html ?

I.e., is the transform_matrix expected to contain columns with camera X (right), Y (up), Z (back) axes, and the camera position? Inside your code I see it is inverted to become the world-to-camera matrix, but what is the meaning of the transpose and the sign flips in readCamerasFromTransforms:191-193?

I have several cameras that share the same position, i.e., they have the same last column in transform_matrix, but in the SIBR viewer they no longer keep the same position :-/
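One way to check this symptom numerically is to recover each camera's world-space center from its world-to-camera pose and compare the results. The sketch below assumes `transform_matrix` is a 4x4 camera-to-world matrix as in the linked conventions. `camera_center` is a hypothetical helper, not a function from the repository.

```python
import numpy as np

def camera_center(w2c: np.ndarray) -> np.ndarray:
    """Return the camera position in world space from a 4x4 world-to-camera matrix."""
    R = w2c[:3, :3]
    t = w2c[:3, 3]
    # x_cam = R @ x_world + t, so the center (where x_cam = 0) is -R^T t.
    return -R.T @ t

# Two made-up cameras at the same position with different orientations.
position = np.array([1.0, 2.0, 3.0])
c2w_a = np.eye(4)
c2w_a[:3, 3] = position
c2w_b = np.eye(4)
c2w_b[:3, :3] = np.array([[0.0, 0.0, 1.0],
                          [0.0, 1.0, 0.0],
                          [-1.0, 0.0, 0.0]])  # 90 degree rotation about Y
c2w_b[:3, 3] = position

centers = [camera_center(np.linalg.inv(c2w)) for c2w in (c2w_a, c2w_b)]
same_center = np.allclose(centers[0], centers[1])
```

If the centers computed from the poses that a loader actually produces differ for cameras that share a last column in `transform_matrix`, the loader is changing the translation in a way that is not a pure change of camera axes.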

Also, in what coordinate system is the initial point cloud expected? I suspect internally the COLMAP system is used for the world coordinates (Y down, Z forward), is that correct?

Thank you, Jakub

BennetLeff commented 10 months ago

Some clarity on this would be great!

grgkopanas commented 10 months ago

Hi,

Let me try to clarify as much as I can the camera systems we use:

In the Python code and the CUDA Gaussian rasterizer we use the COLMAP camera system: Y-down, Z-forward. This can be seen here, where we read the COLMAP files and do no transformations, while here you can see how we transform the "nerf-synthetic" camera model to the COLMAP one.
The SIBR viewer has a different camera model, and we transform the cameras from the SIBR to the COLMAP camera model here: https://github.com/graphdeco-inria/gaussian-splatting/blob/main/gaussian_renderer/network_gui.py#L75C1-L76C67
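As a rough illustration of the NeRF-to-COLMAP camera change described above, the sketch below flips the camera Y and Z axes of a camera-to-world matrix and inverts it. It assumes the input uses the NeRF convention (X right, Y up, Z back) and the target is the COLMAP convention (X right, Y down, Z forward). `nerf_c2w_to_colmap_w2c` is a hypothetical name, and this is not a copy of the repository's reader.

```python
import numpy as np

def nerf_c2w_to_colmap_w2c(c2w) -> np.ndarray:
    """Convert a NeRF-style camera-to-world matrix to a COLMAP-style world-to-camera matrix."""
    c2w = np.array(c2w, dtype=np.float64)  # work on a copy
    # Columns 1 and 2 hold the camera Y and Z axes in world space; negate both
    # to go from (Y up, Z back) to (Y down, Z forward). The position column is untouched.
    c2w[:3, 1:3] *= -1.0
    return np.linalg.inv(c2w)

# Made-up camera at (1, 2, 3) with identity rotation: in the NeRF convention it looks along world -Z.
c2w = np.eye(4)
c2w[:3, 3] = [1.0, 2.0, 3.0]
w2c = nerf_c2w_to_colmap_w2c(c2w)

point_in_front = np.array([1.0, 2.0, 2.0, 1.0])  # one unit along world -Z from the camera
point_above = np.array([1.0, 3.0, 3.0, 1.0])     # one unit along world +Y from the camera
front_cam = w2c @ point_in_front
above_cam = w2c @ point_above
```

In the converted camera frame, a point in front of the camera should have positive Z and a point above it should have negative Y, which is a quick sanity check for custom data.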

Also, in what coordinate system is the initial point cloud expected? I suspect internally the COLMAP system is used for the world coordinates (Y down, Z forward), is that correct?

In this case I am not sure what a "point-cloud coordinate system" means; I assume the above probably clarifies what you need.

Feel free to ask more questions in case things don't work out for you. I know that some parts of our code regarding camera systems are very obscure and do a lot of unnecessary operations; it's on my TODO list to clean it up.

BennetLeff commented 10 months ago

Great, this is what I ascertained after digging into it for a long time but want confirmation because my renders with custom data aren't turning out right.

grgkopanas commented 10 months ago

Just a heads-up, though you have probably already realized it: for legacy reasons we feed the world2camera and full_transformation matrices transposed to the CUDA rasterizer. This has to do with row-major and column-major assumptions between our CUDA code and Python code.
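The effect of such a transpose can be illustrated with NumPy alone. The sketch below assumes the consumer of the flat buffer reads it in column-major order. The matrix `M` is a made-up stand-in, not data from the repository.

```python
import numpy as np

M = np.arange(16, dtype=np.float32).reshape(4, 4)  # stand-in for a world-to-camera matrix
p = np.array([1.0, 2.0, 3.0, 1.0], dtype=np.float32)

# Flat buffer of the transposed matrix, stored row-major (the NumPy default).
buffer_row_major = np.ascontiguousarray(M.T).ravel()

# A reader that interprets the same 16 floats as column-major sees M again.
seen_by_column_major_reader = buffer_row_major.reshape(4, 4, order="F")

# Equivalently, a row vector times M^T matches the usual column-vector product M @ p.
row_vector_result = p @ M.T
column_vector_result = M @ p
```

So passing the transposed matrix from row-major code to a column-major reader leaves the reader with the original matrix, with no data reshuffling needed on either side.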

jakubcerveny commented 10 months ago

Thanks for the clarifications @grgkopanas, it's all now much more clear.

I'm still suspecting the import of NeRF cameras might contain a bug, though. What is strange is the line T = -matrix[:3, 3]. It looks like the resulting camera is actually a reflection of the intended camera, but it works out OK because the Z axis is also reflected. Since the cloud is random in the NeRF tests, it works, but I want to use an actual cloud, which however seems projected backwards in my output (the cameras look "back" and the image is XY flipped).

grgkopanas commented 10 months ago

Hmm, the intention here would be to negate the Y and Z axes. In line 191 we negate all 3 axes, but then we negate the X axis back.
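The two-step negation is equivalent to flipping only Y and Z, as the small check below shows. It assumes the axes are stored as the columns of `R`. The matrix here is random and only serves to demonstrate the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((3, 3))  # any 3x3 matrix works for this identity

# Step 1: negate all three axes. Step 2: negate the X axis back.
two_step = -R.copy()
two_step[:, 0] = -two_step[:, 0]

# Direct version: flip only the Y and Z columns.
one_step = R @ np.diag([1.0, -1.0, -1.0])

equivalent = np.allclose(two_step, one_step)
```

Note that this only covers the rotation part; whether the translation is handled consistently with it is a separate question.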

This could possibly have some hidden errors. If you find one and can actually fix it, feel free to make a pull request; as long as the NeRF synthetic datasets still work, we will merge it.

-gk

kllgjc commented 10 months ago

Any way you could use the initial transforms.json in nerfstudio format (using iphone's arkit in something like nerfcapture or kiri engine) to initialize camera poses in colmap?

https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses

been trying to make this work but to no avail.

grgkopanas commented 10 months ago

I am not very familiar with the details of all the software that you mentioned so I don't feel very comfortable advising on that.

-gk

On Mon, Aug 21, 2023, 20:54 cgallik @.***> wrote:

Any way you could use the initial transforms.json in nerfstudio format (using iphone's arkit in something like nerfcapture or kiri engine) to initialize camera poses in colmap?

https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses

been trying to make this work but to no avail.


BennetLeff commented 10 months ago

Any way you could use the initial transforms.json in nerfstudio format (using iphone's arkit in something like nerfcapture or kiri engine) to initialize camera poses in colmap?

https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses

been trying to make this work but to no avail.

Have you read the nerfstudio documentation on coordinate systems? It fills in some of the info to get started digging into this. The short answer is that yes, the transforms.json file contains the necessary information but actually accomplishing that has proven non-trivial.

coreqode commented 10 months ago

Just a small doubt. Why are we transposing the rotation matrix here? Is it because we are converting to the column-major convention?

grgkopanas commented 10 months ago

Just a small doubt. Why are we transposing the rotation matrix here? Is it because we are converting to the column-major convention?

I have to clean this up when I find time. No, this transpose is to cancel out another transpose that never should have been there.

To accommodate column-major convention we transpose here

Bin-ze commented 10 months ago

Any way you could use the initial transforms.json in nerfstudio format (using iphone's arkit in something like nerfcapture or kiri engine) to initialize camera poses in colmap?

https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses

been trying to make this work but to no avail.

I have implemented it; the details can be viewed here: https://github.com/Bin-ze/3d_gaussian_magic_change/blob/master/scene/dataset_readers.py

kllgjc commented 10 months ago

Any way you could use the initial transforms.json in nerfstudio format (using iphone's arkit in something like nerfcapture or kiri engine) to initialize camera poses in colmap? https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses been trying to make this work but to no avail.

I have implemented it; the details can be viewed here: https://github.com/Bin-ze/3d_gaussian_magic_change/blob/master/scene/dataset_readers.py

First off, thank you! Secondly, I think you are only halfway there. You read the data from the transforms.json file for GS, but my idea was to use it to help initialize COLMAP's SfM pipeline. The link I provided explains how to do this.

My reasoning is this: I have the initial camera positions from my iPhone via ARKit-enabled apps. It is my understanding that these could help COLMAP in the sparse cloud reconstruction and be refined in the bundle adjustment stage. Then, instead of initializing GS from random points and using camera positions from the transforms.json file, you would have the sparse cloud from the standard pipeline, and you'll be a little more certain that COLMAP will work!
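As a starting point for that idea, the sketch below formats one pose as a line in COLMAP's text `images.txt` layout (`IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME`, world-to-camera). It assumes the input is a NeRF-convention camera-to-world matrix. The function names and the file name are hypothetical, and the rest of the known-poses workflow from the linked FAQ is not covered.

```python
import numpy as np

def rotmat_to_qvec(R: np.ndarray) -> np.ndarray:
    """Convert a 3x3 rotation matrix to a unit quaternion ordered (w, x, y, z)."""
    trace = R[0, 0] + R[1, 1] + R[2, 2]
    if trace > 0.0:
        s = 2.0 * np.sqrt(trace + 1.0)
        q = [0.25 * s, (R[2, 1] - R[1, 2]) / s, (R[0, 2] - R[2, 0]) / s, (R[1, 0] - R[0, 1]) / s]
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2])
        q = [(R[2, 1] - R[1, 2]) / s, 0.25 * s, (R[0, 1] + R[1, 0]) / s, (R[0, 2] + R[2, 0]) / s]
    elif R[1, 1] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2])
        q = [(R[0, 2] - R[2, 0]) / s, (R[0, 1] + R[1, 0]) / s, 0.25 * s, (R[1, 2] + R[2, 1]) / s]
    else:
        s = 2.0 * np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1])
        q = [(R[1, 0] - R[0, 1]) / s, (R[0, 2] + R[2, 0]) / s, (R[1, 2] + R[2, 1]) / s, 0.25 * s]
    return np.array(q)

def colmap_image_line(image_id: int, c2w_nerf, camera_id: int, name: str) -> str:
    """Format one pose line for images.txt from a NeRF-style camera-to-world matrix."""
    c2w = np.array(c2w_nerf, dtype=np.float64)  # work on a copy
    c2w[:3, 1:3] *= -1.0                        # (Y up, Z back) -> (Y down, Z forward)
    w2c = np.linalg.inv(c2w)                    # COLMAP stores world-to-camera poses
    q = rotmat_to_qvec(w2c[:3, :3])
    t = w2c[:3, 3]
    values = " ".join(f"{v:.9f}" for v in (*q, *t))
    return f"{image_id} {values} {camera_id} {name}"

# Made-up example: identity rotation, camera at (1, 2, 3).
example_c2w = np.eye(4)
example_c2w[:3, 3] = [1.0, 2.0, 3.0]
line = colmap_image_line(1, example_c2w, 1, "frame_0001.png")
```

In the text format each pose line is followed by a second line listing 2D points, which is left empty when only the poses are known.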

Ideally, I want to set up a sort of "rig" with my iphone and a more professional camera side-by-side.

I also see you are working on bringing in pointcloud from polycam, this would be another interesting way of getting points and camera positions!

Snosixtyboo commented 10 months ago

Should be resolved with the acceptance of @jakubcerveny's pull request.

hughkhu commented 5 months ago

Just a small doubt. Why are we transposing the rotation matrix here? Is it because we are converting to the column-major convention?

I have to clean this up when I find time. No, this transpose is to cancel out another transpose that never should have been there.

To accommodate column-major convention we transpose here

I found dataset_readers.py#L197. Is that the reason we transpose R here?

graphdeco-inria / gaussian-splatting

Coordinate systems #100