Closed Khoa-NT closed 1 year ago
Your proposal sounds reasonable. Instead of returning None in the dataset, you could also return an identity matrix, so that you can leave the evaluation code as is.
Your pose estimates will be logged in the output pose file. Naturally, the error estimates in the same file are nonsense and should be ignored.
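A minimal sketch of the identity-matrix fallback, written in plain Python so it stands alone. The helper names (identity_pose, load_pose_or_identity) are hypothetical and not part of the ACE code base, and the assumption that a pose file holds a 4x4 matrix as four whitespace-separated rows should be checked against your own data before adapting this into CamLocDataset.

```python
from pathlib import Path
from typing import List, Optional

Matrix = List[List[float]]


def identity_pose() -> Matrix:
    """Return a 4x4 identity matrix to use as a placeholder pose."""
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def load_pose_or_identity(pose_file: Optional[Path]) -> Matrix:
    """Load a 4x4 pose from a text file, or fall back to identity.

    Assumption: the file stores the matrix as four rows of four
    whitespace-separated numbers. If no file is given or it does not
    exist, the identity is returned so downstream evaluation code
    still receives a valid matrix.
    """
    if pose_file is None or not pose_file.is_file():
        return identity_pose()

    rows = [
        [float(value) for value in line.split()]
        for line in pose_file.read_text().splitlines()
        if line.strip()
    ]
    if len(rows) != 4 or any(len(row) != 4 for row in rows):
        raise ValueError(f"Expected a 4x4 matrix in {pose_file}")
    return rows
```

With a fallback like this the evaluation loop runs unchanged, but the rotation and translation errors it reports are measured against the placeholder and carry no meaning.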
Hello @ebrach, thank you for your feedback.
I made a new deploy_ace.py based on test_ace.py, and I removed all the evaluation code so that it outputs only the predicted camera poses. It runs smoothly with my test dataset.
However, when I convert the predicted camera poses to COLMAP format (images.bin), some of the predicted camera poses end up beneath the ground.
My training images were captured within a small radius, while the test images cover a wider radius. Should I train the head network longer? Do you have any suggestions?
Hello.
That's not too surprising, considering the scale difference between training and testing images, as you mention. This is similar to what we have seen in some of the hardest scans of the Wayspot datasets (like "Winter sign", where the difference is really extreme).
Still, from your screenshot, it looks like some of the poses are in the right place and the others look like outliers, which is encouraging.
Something to keep in mind is that the code at the moment outputs a pose for every frame, regardless of how good that pose is in practice: the evaluation metrics for the 7/12 Scenes and Cambridge datasets don't have a concept of inlier/outlier, so to speak.
One thing you could look at is how many RANSAC inliers each of the poses predicted for your scene has, and apply a threshold to discard outliers. That should reduce the clutter. The inlier count for each frame is returned by the C++ forward_rgb function, here: https://github.com/nianticlabs/ace/blob/main/test_ace.py#L222, so you can adapt the script to filter the outputs.
The inlier count is also written to our pose files: https://github.com/nianticlabs/ace/blob/6515d1df5bb0c2c803137baa52a901fe2052089a/test_ace.py#L288-L291
So you can also apply the filter in your conversion script for COLMAP.
As a starting point: In our experience, estimates with inlier counts below 100 are rarely trustworthy. Estimates with inlier counts above 400 are usually pretty good. The optimal inlier threshold for your data is likely different, and you will have to play around to find it.
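A rough sketch of such a filter for the conversion script, in plain Python. It assumes each line of the pose file is whitespace-separated and that the inlier count is the last field on the line; please verify this against the lines of test_ace.py linked above before relying on it. The function names and file names are made up for illustration, and the default threshold of 100 is only the starting point mentioned above.

```python
from typing import Iterable, List


def filter_by_inliers(lines: Iterable[str], min_inliers: int = 100) -> List[str]:
    """Keep only pose lines whose inlier count reaches min_inliers.

    Assumption: the inlier count is the last whitespace-separated
    field of each line. Blank lines are skipped.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # Parse via float first in case the count was written as e.g. "412.0".
        inlier_count = int(float(fields[-1]))
        if inlier_count >= min_inliers:
            kept.append(line.rstrip("\n"))
    return kept


def filter_pose_file(in_path: str, out_path: str, min_inliers: int = 100) -> int:
    """Write the filtered poses to out_path and return how many were kept."""
    with open(in_path, "r") as f:
        kept = filter_by_inliers(f, min_inliers)
    with open(out_path, "w") as f:
        for line in kept:
            f.write(line + "\n")
    return len(kept)
```

Calling, for example, filter_pose_file("poses.txt", "poses_filtered.txt", min_inliers=100) and converting only the filtered file to COLMAP should remove most of the obvious outliers. Trying a few thresholds between 100 and 400 and looking at the result is probably the quickest way to find a good value for your scene.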
Closing for now. Happy to continue the discussion if more questions come up.
Hello, I would like to run the model to estimate camera poses only, after training it on my custom dataset, which means I don't have the GT camera poses. To do that, I think I have to modify CamLocDataset to disable reading the pose_dir at https://github.com/nianticlabs/ace/blob/main/dataset.py#L123 and modify it to return the pose as None at https://github.com/nianticlabs/ace/blob/main/dataset.py#L496. I would also turn off the evaluation from here: https://github.com/nianticlabs/ace/blob/main/test_ace.py#L236. Would you mind checking this for me?