tobias-kirschstein / nersemble

[Siggraph '23] NeRSemble: Neural Radiance Field Reconstruction of Human Heads
https://tobias-kirschstein.github.io/nersemble/

About camera coordinate system #12

Open jeb0813 opened 2 months ago

jeb0813 commented 2 months ago

Hi @tobias-kirschstein, I'm confused about the camera extrinsics and intrinsics. The paper says:

We estimate an individual extrinsic and a shared intrinsic camera matrix by employing a fine checkerboard in combination with a bundle adjustment optimization procedure.

I used COLMAP to estimate the extrinsics and intrinsics so I can reproduce the reconstruction on another dataset, but I cannot get correct parameters. So I ran an experiment on the NeRSemble data using COLMAP: the numerical values are roughly the same, but the signs differ a lot. I visualized the estimated results and they look the same as the camera array in the paper, so I suspect it might be a coordinate system problem.

COLMAP's output should be in the OpenCV convention. What coordinate system does NeRSemble use?
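For context, one conversion that is easy to miss when comparing against COLMAP: COLMAP's `images.txt` stores world-to-camera poses (QW QX QY QZ, TX TY TZ), so if the NeRSemble metadata is cam2world, the COLMAP poses need to be inverted before comparing. A minimal sketch of that inversion (the helper name is illustrative):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def colmap_to_cam2world(qw, qx, qy, qz, tx, ty, tz):
    """Convert a COLMAP images.txt entry (world2cam) into a 4x4 cam2world matrix."""
    # scipy expects quaternions in (x, y, z, w) order.
    R_w2c = Rotation.from_quat([qx, qy, qz, qw]).as_matrix()
    t_w2c = np.array([tx, ty, tz])

    cam2world = np.eye(4)
    cam2world[:3, :3] = R_w2c.T            # R_c2w = R_w2c^T
    cam2world[:3, 3] = -R_w2c.T @ t_w2c    # camera center in world coordinates
    return cam2world
```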

Thank you.

jeb0813 commented 2 months ago

I checked issue #9, but that doesn't explain why the COLMAP output doesn't match the metadata in NeRSemble. :sob:

tobias-kirschstein commented 2 months ago

Hi, can you share a visualization of your COLMAP poses? You can use the dreifus library for quickly visualizing camera poses. It is expected that you won't get the same world space, since you are not using a calibration checkerboard with metric distances. But that shouldn't matter for the method, as long as the scale of the world space is roughly metric.
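For reference, a minimal sketch of such a visualization. It assumes dreifus exposes `Pose`, `Intrinsics`, the `PoseType`/`CameraCoordinateConvention` enums, and a pyvista helper `add_camera_frustum` as suggested by its README; the exact module paths and keyword names are assumptions, so check them against the installed version:

```python
# Sketch only: dreifus module paths and signatures are assumed,
# not verified against a specific release.
import pyvista as pv
from dreifus.matrix import Pose, Intrinsics
from dreifus.camera import CameraCoordinateConvention, PoseType
from dreifus.pyvista import add_camera_frustum

def show_cameras(cam2world_matrices, intrinsics_matrix):
    # cam2world_matrices: list of 4x4 numpy arrays (e.g. inverted COLMAP poses)
    # intrinsics_matrix: 3x3 numpy array with the shared pinhole intrinsics
    intrinsics = Intrinsics(intrinsics_matrix)
    p = pv.Plotter()
    for c2w in cam2world_matrices:
        pose = Pose(c2w,
                    pose_type=PoseType.CAM_2_WORLD,
                    camera_coordinate_convention=CameraCoordinateConvention.OPEN_CV)
        add_camera_frustum(p, pose, intrinsics)
    p.show()
```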

KaneLindsay commented 2 months ago

I successfully loaded custom data from my own multi-view camera setup, with the coordinate system plotted here:

coordinates

- X increases to the right
- Y increases downwards
- Z increases forward
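That matches the OpenCV/COLMAP camera convention (x right, y down, z forward). If your poses come in an OpenGL-style convention instead (y up, z backward), the usual fix is to flip the camera-local y and z axes; a small numpy sketch, assuming 4x4 cam2world matrices (the helper name is mine):

```python
import numpy as np

# Flips the camera-local y and z axes of a cam2world pose, converting between
# OpenGL-style (y up, z backward) and OpenCV-style (y down, z forward) conventions.
GL_TO_CV = np.diag([1.0, -1.0, -1.0, 1.0])

def opengl_to_opencv_cam2world(cam2world_gl: np.ndarray) -> np.ndarray:
    return cam2world_gl @ GL_TO_CV
```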

jeb0813 commented 2 months ago

Hi @tobias-kirschstein and @KaneLindsay, this is the COLMAP output visualized in COLMAP. colmap_visual

This is the COLMAP output visualized in dreifus. The cameras are really small and sparse. colmap_visual_dreifus

This is the NeRSemble metadata visualized in dreifus. meta_visual_dreifus

The COLMAP visualization in COLMAP looks the same as the NeRSemble metadata in dreifus, but the COLMAP visualization in dreifus looks weird.

The COLMAP output is in the OpenCV convention, so there must be some conversion I missed. What could be the reason for this?

Thank you for the reply!

tobias-kirschstein commented 2 months ago

Hi @jeb0813, thanks for the visualizations. To me, that just looks like COLMAP gives you a different world-space scale. Note that the camera positions from NeRSemble are all within 1 unit of the origin, while the output of COLMAP seems to give you camera positions 5 or more units from the origin. This is somewhat expected, since COLMAP cannot know what 1 unit should mean in your scene.

You could just multiply the translation part of the cam2world matrices that you get from COLMAP with a small factor (say 0.2, effectively shrinking the world space by 5x), and that should bring the cameras closer to the origin, where the NeRSemble cameras are.

In terms of 3D reconstruction, it does not really matter what scale the world space has. The only point where it becomes relevant is if you have a 3D bounding box where everything outside is assumed to be empty. In the case of NeRSemble, there is a tight 3D bounding box around the reconstructed head to avoid having too many floaters. Therefore, if your world space has a vastly different scale (as seems to be the case for your COLMAP poses), the 3D reconstruction will fail, since most of the reconstruction would be outside the bounding box. Instead of ensuring your world space has roughly metric scale, you can of course also adjust the size of the bounding box in the train script.
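A minimal numpy sketch of the rescaling described above (the factor and helper name are illustrative):

```python
import numpy as np

def rescale_cam2world(cam2world: np.ndarray, factor: float = 0.2) -> np.ndarray:
    """Scale the translation part of a 4x4 cam2world matrix, shrinking the world space."""
    scaled = cam2world.copy()
    scaled[:3, 3] *= factor  # rotation is unchanged; camera positions move closer to the origin
    return scaled

# e.g. apply to every COLMAP pose before training:
# cam2worlds = [rescale_cam2world(c2w, 0.2) for c2w in cam2worlds]
```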