nianticlabs / ace

[CVPR 2023 - Highlight] Accelerated Coordinate Encoding (ACE): Learning to Relocalize in Minutes using RGB and Poses
https://nianticlabs.github.io/ace

Does ACE not support large dataset? #24

Closed monamourvert closed 1 year ago

monamourvert commented 1 year ago

After reproducing ACE on 7Scenes, Cambridge, and my own dataset, I found the translation and rotation errors to be quite good. The problem is that when I visualize the result (I set --render_visualization to True) on my own dataset, which is quite a bit larger than the Cambridge dataset, the result looks quite strange.

My own dataset (a big building): [image]

The test result: [image]

The mapping result of ACE: [image]

Is it safe to say that ACE doesn't support large datasets? The mappings for the 7Scenes and Cambridge datasets are pretty good and I can see the details. I'm just curious. Hope to hear from you soon!

tcavallari commented 1 year ago

Considering the size of that building, a median error of 6.1 deg / 12.1 cm seems reasonable using the single-network approach (which is the default in the training scripts). We had similar results on the Cambridge Landmarks dataset.

You can try using the "ensemble" approach we presented in the paper, which splits the input dataset into multiple disjoint clusters and maps them independently. Localization is performed against all maps, and we choose the pose that has the highest inlier count. A starting point could be to use 4 clusters and see if you can obtain better error metrics than the ones you had previously.

To use the clustering parameters in the various scripts, see here for training (use the --num_clusters and --cluster_idx params) and here for the evaluation scripts and how to merge predictions.

You can refer to the Cambridge landmarks ensemble training script for further details on how to call the various executables: https://github.com/nianticlabs/ace/blob/main/scripts/train_cambridge_ensemble.sh
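The per-cluster workflow above can be sketched as a small shell loop. This is a hedged dry run, not the authoritative recipe: the script names (`train_ace.py`, `test_ace.py`), the dataset path, and the output naming are assumptions for illustration; only `--num_clusters` and `--cluster_idx` are the flags named in this thread, and the linked `train_cambridge_ensemble.sh` remains the reference.

```shell
# Sketch of the ensemble workflow: train one map per cluster, then
# evaluate each independently. The "echo" keeps this a dry run; drop
# it to actually execute. Paths and script names are assumptions.
NUM_CLUSTERS=4
DATASET=datasets/my_building   # hypothetical dataset path

for i in $(seq 0 $((NUM_CLUSTERS - 1))); do
  # Train cluster i of NUM_CLUSTERS into its own network head.
  echo ./train_ace.py "$DATASET" "my_building_${i}.pt" \
      --num_clusters "$NUM_CLUSTERS" --cluster_idx "$i"
  # Evaluate against this cluster's map; predictions are merged afterwards.
  echo ./test_ace.py "$DATASET" "my_building_${i}.pt"
done
```

At test time every query image is localized against all four maps, and the pose with the highest inlier count wins, so no cluster assignment is needed for query frames.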

monamourvert commented 1 year ago

Thanks! I did as you said and it works well. The error decreased, but about the visualization... I wonder why it doesn't resemble the geometric shape of the dataset? As I mentioned previously, the Cambridge dataset's visualization is pretty good; I can see details such as the windows, doors, etc. (Cambridge: ShopFacade), but for my dataset I can barely see anything. That's why I wonder whether there is a problem with my dataset or something else is going on. Any idea about this?

ebrach commented 1 year ago

Did the visualisation improve with the ensemble variant? (The public code has no option to fuse the visualisations, so you get one video per model, i.e. one video per chunk of the map. But those individual visualisations might be sharper than the one you posted above.)

ACE predicts dense correspondences, one for each 8x8 pixel block of the input.
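With one scene-coordinate prediction per 8x8 block, the number of predicted 3D points per frame is simply (W/8) * (H/8). A quick sanity check for a 640x480 input (the resolution here is chosen purely for illustration, not taken from ACE's pipeline):

```shell
# One scene coordinate per 8x8 pixel block: a 640x480 frame yields
# 80 * 60 = 4800 predicted 3D points.
W=640; H=480; BLOCK=8
POINTS=$(( (W / BLOCK) * (H / BLOCK) ))
echo "$POINTS"   # prints 4800
```

This is why even a noisy-looking point cloud can still localize well: RANSAC only needs a sufficient subset of those thousands of per-frame predictions to be consistent.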

In summary: Many factors might affect the quality of the visualisation, but a clean visualisation is not required for good pose estimates. As long as the network can predict enough reasonable 3D points, RANSAC can figure it out.

monamourvert commented 1 year ago

Thanks! For now, I'll close this issue