nianticlabs / ace

[CVPR 2023 - Highlight] Accelerated Coordinate Encoding (ACE): Learning to Relocalize in Minutes using RGB and Poses
https://nianticlabs.github.io/ace

Does ACE not support large dataset? #24

Closed monamourvert closed 1 year ago

monamourvert commented 1 year ago

After reproducing ACE on 7Scenes, Cambridge, and my own dataset, I found the translation and rotation errors to be quite good. The problem is that when I visualize the result (I set --render_visualization to True) on my own dataset, which is quite a bit larger than the Cambridge dataset, the result looks quite strange.

My own dataset (a big building): [image]

The test result: [image]

The mapping result of ACE: [image]

Is it safe to say that ACE doesn't support large datasets? The mappings for the 7Scenes and Cambridge datasets are pretty good and I can see the details. I'm just curious. Hope to hear from you soon!

tcavallari commented 1 year ago

Considering the size of that building, a median error of 6.1 deg / 12.1 cm seems reasonable using the single-network approach (which is the default in the training scripts). We had similar results on the Cambridge Landmarks dataset.

You can try using the "ensemble" approach we presented in the paper, which splits the input dataset into multiple disjoint clusters and maps them independently. Localization is performed against all maps, and we choose the pose that has the highest inlier count. A starting point could be to use 4 clusters and see if you can obtain better error metrics than the ones you had previously.

To use the clustering parameters in the various scripts, see here for training (use the --num_clusters and --cluster_idx params) and here for the evaluation scripts and how to merge predictions.

You can refer to the Cambridge landmarks ensemble training script for further details on how to call the various executables: https://github.com/nianticlabs/ace/blob/main/scripts/train_cambridge_ensemble.sh
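The per-cluster workflow above can be sketched as a small shell loop. This is a hedged dry run, not the authoritative recipe: the script names (`train_ace.py`, `test_ace.py`), the dataset path, and the output naming are assumptions for illustration; only `--num_clusters` and `--cluster_idx` are the flags named in this thread, and the linked `train_cambridge_ensemble.sh` remains the reference.

```shell
# Sketch of the ensemble workflow: train one map per cluster, then
# evaluate each independently. The "echo" keeps this a dry run; drop
# it to actually execute. Paths and script names are assumptions.
NUM_CLUSTERS=4
DATASET=datasets/my_building   # hypothetical dataset path

for i in $(seq 0 $((NUM_CLUSTERS - 1))); do
  # Train cluster i of NUM_CLUSTERS into its own network head.
  echo ./train_ace.py "$DATASET" "my_building_${i}.pt" \
      --num_clusters "$NUM_CLUSTERS" --cluster_idx "$i"
  # Evaluate against this cluster's map; predictions are merged afterwards.
  echo ./test_ace.py "$DATASET" "my_building_${i}.pt"
done
```

At test time every query image is localized against all four maps, and the pose with the highest inlier count wins, so no cluster assignment is needed for query frames.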

monamourvert commented 1 year ago

Thanks! I did as you said and it works well. The error decreased, but about the visualization... I wonder why it doesn't resemble the geometric shape of the dataset? As I mentioned previously, the Cambridge dataset's visualization is pretty good; I can see details such as the windows, doors, etc. (Cambridge: ShopFacade), but for my dataset I can barely see anything. That's why I wonder whether there is a problem with my dataset or something else is going on. Any idea about this?

ebrach commented 1 year ago

Did the visualisation improve with the ensemble variant? (The public code has no option to fuse the visualisations, so you get one video per model, i.e. one video per chunk of the map. But those individual visualisations might be sharper than the one you posted above.)

ACE predicts dense correspondences, one for each 8x8 pixel block of the input.
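With one scene-coordinate prediction per 8x8 block, the number of predicted 3D points per frame is simply (W/8) * (H/8). A quick sanity check for a 640x480 input (the resolution here is chosen purely for illustration, not taken from ACE's pipeline):

```shell
# One scene coordinate per 8x8 pixel block: a 640x480 frame yields
# 80 * 60 = 4800 predicted 3D points.
W=640; H=480; BLOCK=8
POINTS=$(( (W / BLOCK) * (H / BLOCK) ))
echo "$POINTS"   # prints 4800
```

This is why even a noisy-looking point cloud can still localize well: RANSAC only needs a sufficient subset of those thousands of per-frame predictions to be consistent.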

In summary: Many factors might affect the quality of the visualisation, but a clean visualisation is not required for good pose estimates. As long as the network can predict enough reasonable 3D points, RANSAC can figure it out.

monamourvert commented 1 year ago

Thanks! For now, I'll close this issue