nianticlabs / ace

[CVPR 2023 - Highlight] Accelerated Coordinate Encoding (ACE): Learning to Relocalize in Minutes using RGB and Poses
https://nianticlabs.github.io/ace

Inference code #4

Closed Khoa-NT closed 1 year ago

Khoa-NT commented 1 year ago

Hello, I would like to run the model to estimate camera poses only, after training it on my custom dataset. This means I don't have the GT camera poses. To do that, I think I have to modify CamLocDataset to disable reading the pose_dir at https://github.com/nianticlabs/ace/blob/main/dataset.py#L123, modify it to return the pose as None at https://github.com/nianticlabs/ace/blob/main/dataset.py#L496, and also turn off the evaluation here: https://github.com/nianticlabs/ace/blob/main/test_ace.py#L236. Would you mind checking this for me?

ebrach commented 1 year ago

Your proposal sounds reasonable. Instead of returning None in the dataset, you could also return an identity matrix, such that you can leave the evaluation code as is.
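
A minimal sketch of that idea, assuming a hypothetical `load_pose` helper inside `CamLocDataset` (the real dataset code reads the pose file inline, so adapt as needed):

```python
import numpy as np
import torch

def load_pose(pose_file=None):
    """Hypothetical helper: return the GT camera-to-world pose, or a placeholder.

    When no ground-truth pose exists (pure inference), hand back a 4x4 identity
    so the evaluation code in test_ace.py keeps running unchanged; the error
    numbers it reports are then meaningless and should be ignored.
    """
    if pose_file is None:
        return torch.eye(4, dtype=torch.float32)
    # Normal path: load the 4x4 pose matrix from disk, as the dataset does.
    return torch.from_numpy(np.loadtxt(pose_file)).float()
```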

Your pose estimates will be logged in the output pose file. Naturally, the error estimates in the same file are nonsense and should be ignored.

Khoa-NT commented 1 year ago

Hello @ebrach, thank you for your feedback. I made a new deploy_ace.py based on text_ace.py and I removed all the evaluation codes to obtain only the predicted camera poses. It runs smoothly with my test dataset. However, when I convert the predicted camera poses to colmap format (images.bin), some predicted camera poses are beneath the ground. image

My training images are captured within a small radius, while the test images cover a wider radius. Should I train the head network for longer? Do you have any suggestions?

tcavallari commented 1 year ago

Hello.

That's not too surprising, considering the scale difference between training and testing images that you mention. It's similar to what we have seen in some of the hardest scans of the Wayspots dataset (like "Winter sign", where the difference is really extreme).

Still, from your screenshot, it looks like some of the poses are in the right place and the others look like outliers, which is encouraging.

Something to keep in mind is that, at the moment, the code outputs a pose for every frame, regardless of how good that pose actually is: the evaluation metrics for the 7/12-Scenes and Cambridge datasets have no concept of inliers/outliers, so to speak.

One thing you could do is check how many RANSAC inliers each pose predicted for your scene has, and apply a threshold to discard the outliers. That should reduce the clutter. The inlier count for each frame is returned by the C++ forward_rgb function, here: https://github.com/nianticlabs/ace/blob/main/test_ace.py#L222, so you can adapt the script to filter the outputs.
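
For illustration, a hedged sketch of such a filter; the `results` list and its layout are assumptions for this example, not actual variables in test_ace.py:

```python
# Hypothetical: suppose the test loop collects (frame_name, pose_4x4, inlier_count)
# tuples into `results` instead of writing every pose straight to the log.
INLIER_THRESHOLD = 100  # scene-dependent; see the guidance in the next comment

def filter_poses(results, threshold=INLIER_THRESHOLD):
    """Keep only frames whose RANSAC inlier count clears the threshold."""
    return [(name, pose) for name, pose, inliers in results if inliers >= threshold]
```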

ebrach commented 1 year ago

The inlier count is also written to our pose files: https://github.com/nianticlabs/ace/blob/6515d1df5bb0c2c803137baa52a901fe2052089a/test_ace.py#L288-L291

So you can also apply the filter in your conversion script for COLMAP.

As a starting point: In our experience, estimates with inlier counts below 100 are rarely trustworthy. Estimates with inlier counts above 400 are usually pretty good. The optimal inlier threshold for your data is likely different, and you will have to play around to find it.
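
For example, a small sketch of such a filter; the file names here are placeholders, and the assumed line layout (name, quaternion, translation, errors, inlier count last) should be checked against the linked lines of test_ace.py for your version:

```python
INLIER_THRESHOLD = 100  # tune for your scene

# Copy only the confident estimates into a new file before COLMAP conversion.
with open("poses_estimated.txt") as f_in, open("poses_filtered.txt", "w") as f_out:
    for line in f_in:
        # Assumed layout per line: name qw qx qy qz tx ty tz r_err t_err inliers
        inlier_count = int(float(line.split()[-1]))
        if inlier_count >= INLIER_THRESHOLD:
            f_out.write(line)
```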

tcavallari commented 1 year ago

Closing for now. Happy to continue the discussion if more questions come up.