rust-cv / cv

Rust CV mono-repo. Contains pure-Rust dependencies which attempt to encapsulate the capability of OpenCV, OpenMVG, and vSLAM frameworks in a cohesive set of APIs.

Akaze's implementation seems to have issues with rotational invariance #63

Closed stephanemagnenat closed 1 year ago

stephanemagnenat commented 1 year ago

I am trying to use Akaze for an application that searches correspondences between a template and an image seen by a camera.

As a start, I have written the common pipeline building on the excellent tutorial of Rust CV. I am using Akaze with threshold 0.001, looking for the two nearest neighbors, and using a distance ratio criterion of 0.8 between the best match and the second-best one (best_dist < 0.8 * next_best_dist), as suggested in the Akaze OpenCV tutorial. Besides that, I am using the functions matching and symmetric_matching from the tutorial as-is.
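For reference, the ratio criterion can be sketched with toy binary descriptors and Hamming distance (AKAZE's MLDB descriptors are binary, though much longer than the 64 bits used here); `match_ratio` is an illustrative name, not the tutorial's API:

```rust
/// Hamming distance between two toy 64-bit binary descriptors.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Brute-force 2-NN search with the ratio test: returns the index of the
/// best match in `train` only if it is clearly better than the runner-up.
fn match_ratio(query: u64, train: &[u64], ratio: f32) -> Option<usize> {
    let mut best = (u32::MAX, 0usize); // (distance, index)
    let mut next_best = u32::MAX;
    for (i, &t) in train.iter().enumerate() {
        let d = hamming(query, t);
        if d < best.0 {
            next_best = best.0;
            best = (d, i);
        } else if d < next_best {
            next_best = d;
        }
    }
    if (best.0 as f32) < ratio * (next_best as f32) {
        Some(best.1)
    } else {
        None
    }
}

fn main() {
    let train = [0b1111_0000u64, 0b0000_1111, 0b1010_1010];
    // query is 1 bit away from train[0] and far from everything else,
    // so the ratio test accepts the match.
    let query = 0b1111_0001u64;
    assert_eq!(match_ratio(query, &train, 0.8), Some(0));
}
```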

For comparison, I created a similar pipeline in Python using OpenCV (4.7.0.72). When only scaling is involved, both Rust CV's Akaze and Open CV Akaze perform well:

Open CV, before matching: left: 2514 keypoints, right: 820 keypoints: correspondences_opencv_scale

Rust CV, before matching: left: 5353 keypoints, right: 1480 keypoints: correspondences_rustcv_scale

However, when I rotate the small image, Rust CV completely fails while Open CV works like a charm:

Open CV, before matching: left: 2514 keypoints, right: 1207 keypoints: correspondences_opencv_rot

Rust CV, before matching: left: 5353 keypoints, right: 2490 keypoints: correspondences_rustcv_rot

So it looks to me like something is seriously broken in the rotational invariance of Rust CV's Akaze implementation. Was this implementation ever validated against the OpenCV one?

I am also surprised by the different numbers of matches, as I would expect more similar ones between the two implementations.

stephanemagnenat commented 1 year ago

I believe I have found some fishy code: angs[idx] = res_y[idx].atan2(res_y[idx]). I believe it should be something like res_y[idx].atan2(res_x[idx]) instead, especially when compared with the other versions. However, this correction is not enough to solve the problem.
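A minimal sketch of why the spotted line matters, assuming the usual convention that res_x/res_y are horizontal/vertical derivative responses: atan2(dy, dy) collapses every gradient direction onto π/4 or -3π/4, while atan2(dy, dx) recovers the actual orientation:

```rust
use std::f32::consts::FRAC_PI_2;

/// Gradient orientation from derivative responses, as in the suspected fix.
fn orientation(res_x: f32, res_y: f32) -> f32 {
    res_y.atan2(res_x)
}

fn main() {
    // A purely vertical gradient should give an angle of PI/2 ...
    assert!((orientation(0.0, 1.0) - FRAC_PI_2).abs() < 1e-6);
    // ... while the buggy atan2(dy, dy) yields PI/4 for any positive dy.
    let buggy = 1.0f32.atan2(1.0);
    assert!((buggy - FRAC_PI_2 / 2.0).abs() < 1e-6);
}
```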

xd009642 commented 1 year ago

So the implementation was based on an existing Rust implementation and, I think, another independent implementation rather than OpenCV. I'm not sure if it was compared to OpenCV's implementation - but for more complicated algorithms I personally avoid that sometimes, as OpenCV applies a lot of heuristics that may not be in the paper (from experience, the OpenCV HOG adds image pyramids and other things that other implementations and the paper don't use).

The fishy line you spotted does seem to be a problem so a PR for that would be good. Does it improve the output at all?

I'm also not super familiar with this area - only mildly familiar as a casual user - so I'm likely not the best person to comment, but I can try to help out. One question would be: how does the matcher impact this?

I assume the visualisation is just showing the points that match between both images and not the points where no match was found. Is there some worth in showing all the points identified in the rotated and non-rotated versions, including unmatched ones, to see if something's up? Although there are a lot of points, so I'm not sure if that's useful at all :thinking:

stephanemagnenat commented 1 year ago

Yes, showing all points is likely to be a lot. I think that showing only the matching lines, as now, is enough so far. The matcher is the brute-force one, and the selection criterion is the same as in the OpenCV example code I'm using as a comparison.

I found other bugs in the translation from the original C++ implementation that the previous Rust implementation is based on. You can follow my progress in this PR. I'm comparing the Rust code with that original implementation.

The one-line correction above doesn't improve the performance so far. But correcting two more bugs (the missing reset of sum_x and sum_y to 0, and the too-early increment of ang1) improves performance a tiny bit. We are still far from OpenCV though.

stephanemagnenat commented 1 year ago

Additional note: akaze_match from the original C++ implementation also works well with my test above, so it is likely the Rust translation that has mistakes.

stephanemagnenat commented 1 year ago

After comparing the evolutions of the same image, I can confirm that they are very different, so there is probably already a bug at that level in the Rust version.

Here are the levels 0 and 1, on the first image above: evolutions_comparison.zip

stephanemagnenat commented 1 year ago

Already rust_evolution_0_Lt.exr looks suspicious. It seems that the Gaussian filter didn't work properly.

stephanemagnenat commented 1 year ago

I found that the gaussian_blur function gives incorrect results! Switching it for imageproc's gaussian_blur_f32 corrects the file rust_evolution_0_Lt.exr and makes it look like the reference implementation.

xd009642 commented 1 year ago

I guess it's an implementation detail of the gaussian blur regarding padding: the imageproc one applies some form of padding whereas cv doesn't.

stephanemagnenat commented 1 year ago

No, it is worse than that: it happens in the center of the image, not at the borders. The Gaussian blur is seriously degraded. I have the impression that there is an incorrect optimization somewhere. But there are so many other bugs that for the moment I'll live with gaussian_blur_f32, which gives the same result as OpenCV now that I made the switch.
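For context, a correct separable Gaussian blur can be sketched in plain Rust as follows (a 1D pass with a normalized kernel and clamp-to-edge borders; this is illustrative only, not the cv or imageproc implementation). A useful sanity check is that a constant signal must come out unchanged, including in the middle of the buffer, which is exactly what the broken blur violated:

```rust
/// Normalized 1D Gaussian kernel with a 3-sigma radius.
fn gaussian_kernel(sigma: f32) -> Vec<f32> {
    let radius = (3.0 * sigma).ceil() as i32;
    let mut k: Vec<f32> = (-radius..=radius)
        .map(|i| (-((i * i) as f32) / (2.0 * sigma * sigma)).exp())
        .collect();
    let sum: f32 = k.iter().sum();
    for v in &mut k {
        *v /= sum; // normalize so the blur preserves total intensity
    }
    k
}

/// One 1D convolution pass with clamp-to-edge border handling.
/// `stride` lets the same code run over rows (1) or columns (row width).
fn blur_1d(src: &[f32], len: usize, stride: usize, kernel: &[f32], dst: &mut [f32]) {
    let radius = (kernel.len() / 2) as i32;
    for i in 0..len as i32 {
        let mut acc = 0.0;
        for (j, &w) in kernel.iter().enumerate() {
            let s = (i + j as i32 - radius).clamp(0, len as i32 - 1);
            acc += w * src[s as usize * stride];
        }
        dst[i as usize * stride] = acc;
    }
}

fn main() {
    let kernel = gaussian_kernel(1.5);
    let src = vec![1.0f32; 16];
    let mut dst = vec![0.0f32; 16];
    blur_1d(&src, 16, 1, &kernel, &mut dst);
    // A correct blur leaves a constant signal unchanged (up to rounding).
    assert!(dst.iter().all(|&v| (v - 1.0).abs() < 1e-5));
}
```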

FYI, my approach to correcting Rust CV's Akaze implementation is to dump the intermediate generated images (e.g. for the different maps of the evolutions) and compare them with the ones from the original C++ implementation, which I have instrumented to generate the corresponding images at the same positions. That shows that, for example, the Scharr filters are also incorrect, leading to an error of a factor of 40 in the estimation of the initial contrast.

xd009642 commented 1 year ago

Okay, I know this is used as a key part of the VSLAM toolbox for reconstructing 3D point clouds from things like KITTI. Once the PR is ready it would be good to rerun that and make sure it still works as expected (oh god I might have to get vulkan working on my linux machine). I may also prod @vadixidav to take a look at it as he's more familiar with the code/domain than I am.

I'm aware that code like this can be tricky to test, but it would be good to see some tests that catch the issues discovered and help prevent regressions. I'll try to allocate some time later this week or next week to look into the gaussian blur some more.

stephanemagnenat commented 1 year ago

Ok, good! I fully agree that it is worth checking that there is no regression.

Also, there is quite some re-implementation of features present in imageproc (e.g. the Gaussian filter, the Scharr filters, etc.). Personally I would prefer using the versions there, or contributing better implementations there if needed (but the current implementations there might be faster now).

Also, in general, I would try to have similar functions performing similarly to their OpenCV counterparts. Of course, simplifications and differing details such as border management might be acceptable.

And I'm also all in for having tests ;-)

xd009642 commented 1 year ago

When this crate was started, imageproc/image didn't support having floats as a pixel type, and iirc there had been an open issue about it for quite a while; it needed a big refactor to be possible. It may be that the reimplementations are no longer necessary.

stephanemagnenat commented 1 year ago

The diffusion function nonlinear_diffusion::calculate_step is not consistent with the one in the C++ reference implementation: somehow it seems that it doesn't diffuse enough. The C++ code is very hard to read here, so relating the two is not trivial. It looks like the Rust code attempts to decompose the equation into two passes compared to the C++ code, which seems to make sense if the function is separable, but the results differ, so something is lost.

Another possibility is that the Rust code was based on an earlier version of the reference implementation.

In any case, adding a missing 0.5 and changing a multiplication to an addition solves that problem! The nonlinear scale space is now created correctly compared to the reference implementation!
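A hedged 1D sketch of the kind of explicit diffusion step being compared here, in the shape used by the KAZE family's nld_step_scalar: each flux uses the sum (not product) of the conductivities at neighbouring samples, and the whole update is scaled by 0.5 * step_size (the 0.5 mentioned above). This illustrates the scheme, not the actual cv code:

```rust
/// One explicit nonlinear-diffusion step on a 1D signal `lt` with
/// per-sample conductivity `c`. Border samples are left untouched.
fn diffusion_step_1d(lt: &mut [f32], c: &[f32], step_size: f32) {
    let n = lt.len();
    let prev = lt.to_vec(); // read from the previous state
    for i in 1..n - 1 {
        // Fluxes use the *sum* of neighbouring conductivities ...
        let flux_right = (c[i] + c[i + 1]) * (prev[i + 1] - prev[i]);
        let flux_left = (c[i - 1] + c[i]) * (prev[i] - prev[i - 1]);
        // ... and the update carries the 0.5 factor.
        lt[i] = prev[i] + 0.5 * step_size * (flux_right - flux_left);
    }
}

fn main() {
    // With uniform conductivity, one step must move a spike toward its
    // neighbours (i.e. actually diffuse).
    let mut lt = vec![0.0, 0.0, 1.0, 0.0, 0.0];
    let c = vec![1.0; 5];
    diffusion_step_1d(&mut lt, &c, 0.25);
    assert!(lt[2] < 1.0 && lt[1] > 0.0 && lt[3] > 0.0);
}
```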

stephanemagnenat commented 1 year ago

@xd009642 one question: to debug, I have instrumented the source code of both the C++ and the Rust versions to dump quite a few EXR images of the internal space along the way. Do we want to keep that in the future? It is quite intrusive, but if we want to ensure no regressions, we might want to keep it and compare against reference images. Is there an established best practice for such things in Rust CV?

xd009642 commented 1 year ago

There's not currently an established best practice, and I know one of the goals was working on things like embedded Linux boards, which makes the intrusiveness even more painful. I think for the current PRs, don't submit it.

My thought is that we may want to add the ability to compile with something like RUSTFLAGS="--cfg cv_debug" and then feature-gate the code with #[cfg(cv_debug)] so it's not compiled into the library unless the user explicitly wants it. However, before we make such a decision we'd need to discuss with other maintainers/users; some may have better ideas or issues with that approach.
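A sketch of what that gating could look like (`dump_evolution` is a hypothetical name, not an existing cv API): the debug body only exists when building with RUSTFLAGS="--cfg cv_debug", and the no-op fallback compiles to nothing in normal builds:

```rust
/// Debug build: actually dump the buffer (an EXR writer would go here).
/// Returns true so callers can tell whether a dump happened.
#[cfg(cv_debug)]
fn dump_evolution(name: &str, data: &[f32]) -> bool {
    eprintln!("dumping {} ({} samples)", name, data.len());
    true
}

/// Normal build: a no-op the optimizer removes entirely.
#[cfg(not(cv_debug))]
fn dump_evolution(_name: &str, _data: &[f32]) -> bool {
    false
}

fn main() {
    let lt = vec![0.0f32; 4]; // stand-in for an evolution image
    // Free unless the cv_debug cfg is set at compile time.
    assert!(!dump_evolution("evolution_0_Lt", &lt));
}
```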

stephanemagnenat commented 1 year ago

We could also gate some introspection features behind #[cfg(test)]. For example, for Akaze, we could have some sort of AkazeInspector trait that provides callbacks to be called at different points. It would not be available in release builds. What do you think?
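A sketch of what such a trait could look like (all names here are hypothetical, not existing cv APIs): default no-op methods keep production callers free, while a test can plug in a recording implementation:

```rust
/// Hypothetical inspector trait: hooks invoked at interesting points of
/// the pipeline, with no-op defaults.
trait AkazeInspector {
    fn on_evolution_computed(&mut self, level: usize, lt: &[f32]) {
        let _ = (level, lt); // default: do nothing
    }
}

/// Production: pays nothing, overrides nothing.
struct NullInspector;
impl AkazeInspector for NullInspector {}

/// Test/debug: records how often the hook fired.
struct CountingInspector {
    levels_seen: usize,
}
impl AkazeInspector for CountingInspector {
    fn on_evolution_computed(&mut self, _level: usize, _lt: &[f32]) {
        self.levels_seen += 1;
    }
}

/// A pipeline stage would take any inspector and call the hooks.
fn fake_scale_space(inspector: &mut impl AkazeInspector, levels: usize) {
    for level in 0..levels {
        let lt = vec![0.0f32; 4]; // stand-in for the real evolution image
        inspector.on_evolution_computed(level, &lt);
    }
}

fn main() {
    let mut counter = CountingInspector { levels_seen: 0 };
    fake_scale_space(&mut counter, 4);
    assert_eq!(counter.levels_seen, 4);
    fake_scale_space(&mut NullInspector, 4); // zero-cost path
}
```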

xd009642 commented 1 year ago

hmm - my initial thought is that such introspection is also useful for engineers working on computer vision applications so I'd rather it was configurable by users as well. But I may be biased as someone who's had to patch opencv with that kind of debug dumping before.

Either way it's a question for a future PR imo

stephanemagnenat commented 1 year ago

Sure. I have them all around, but I'll refactor that into two commits before submitting the PR for inclusion.

mpizenberg commented 1 year ago

Hi, I'm following rerun.io and I think they are building things in the right direction for inspecting CV pipelines. I haven't had the time to test it past the tutorial yet, though, but I think it could help in your debugging. They have a logging philosophy: you log stuff (images, point clouds, ...), tag it, and it's sent to another program (rerun) used for the visualizations. If that's useful, we could maybe add a rerun-gated feature, useful for debugging.

vadixidav commented 1 year ago

That sounds like a good solution. Personally I also think exposing enough APIs to dig into all the intermediaries so that we can dump debug data from an external tool would be acceptable as well. For instance, running a step of diffusion manually should be possible using a lower level API. We can add this at a later time. I would be okay with having a cfg for enabling debug information for the time being.

vadixidav commented 1 year ago

So, it has been a while, but I recalled something in particular that I was working on to make the algorithm behave more like the original; it had some issues, though. I believe I reverted it, and now it uses the nearest-neighbor algorithm, see here: https://github.com/rust-cv/akaze/commit/c8de15507e8d59485bea47d40b25fc796ceb3253. Here is the original commit that adds the potentially correct behavior: https://github.com/rust-cv/akaze/commit/3252ad52adafcdfbc97e726d7ae12bc69708f265. It appeared to reduce the quality of the output, but that may have been because we don't have tests that include orientation invariance. That is the original commit on the archived repository; see here for the current code: https://github.com/rust-cv/cv/blob/main/akaze/src/image.rs#L118. Please give those changes that I had written (and then promptly reverted) a try and see if they improve the compatibility.

vadixidav commented 1 year ago

I incorporated the change I recommended above along with @stephanemagnenat 's changes in #65.

stephanemagnenat commented 1 year ago

Thanks @vadixidav, I also had the same function for resizing (albeit implemented differently) in my not-yet-updated changes branch on my local computer (I wanted to clean up the verbose debugging code before updating the PR branch, but maybe I should keep it updated for information, so you can freely extract pieces from it). I have other changes, such as correcting the diffusion step. I will test your changes, update my branch, and report here.

Regarding the check you added in mldb_fill_values, I didn't add it because the original code doesn't do an out-of-bounds access where the Rust code does (I added a check in the original code to be sure), so I suspect that there are bugs elsewhere that lead to this erroneous access, and that once they are solved, this check can be removed.

stephanemagnenat commented 1 year ago

One thing regarding debugging: I believe that to ensure non-regression of this algorithm, ideally we should have hooks to dump intermediate images of the internal space and compare them to those of the original algorithm. That is how I managed to debug so far. I do not know if rerun can be used for that.
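Such a regression check could be as simple as comparing flattened f32 buffers against dumped reference images up to an epsilon; a minimal sketch:

```rust
/// Maximum absolute per-pixel difference between two flattened images.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "images must have the same size");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0_f32, f32::max)
}

/// Two intermediate images "match" if no pixel deviates beyond epsilon.
fn images_match(a: &[f32], b: &[f32], epsilon: f32) -> bool {
    max_abs_diff(a, b) <= epsilon
}

fn main() {
    let reference = vec![0.1, 0.2, 0.3];
    let rust_out = vec![0.1, 0.200001, 0.3];
    assert!(images_match(&reference, &rust_out, 1e-4));
    assert!(!images_match(&reference, &[0.1, 0.9, 0.3], 1e-4));
}
```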

stephanemagnenat commented 1 year ago

My further changes leading to correct internal images after create_nonlinear_scale_space are now in #66.

There are still bugs ahead; I suspect that at least the Scharr filter implementation is broken. I am continuing my debugging and will update #66 with further corrections, or directly #65 if you want me to push directly there.

stephanemagnenat commented 1 year ago

The general Scharr filters were inverted! I fixed that and simplified the code using imageproc's separable_filter function. Again, I didn't delete the unused functions, but I think that we should; do you agree @xd009642 and @vadixidav?
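For context, this sketch shows why Scharr filters are separable and how swapping the two 1D factors silently transposes ("inverts") the filter; the kernel taps are the standard Scharr values, and the helper is illustrative rather than the cv code:

```rust
/// Outer product of two 1D kernels, yielding a 3x3 separable filter.
fn outer(col: [f32; 3], row: [f32; 3]) -> [[f32; 3]; 3] {
    let mut k = [[0.0; 3]; 3];
    for (i, c) in col.iter().enumerate() {
        for (j, r) in row.iter().enumerate() {
            k[i][j] = c * r;
        }
    }
    k
}

fn main() {
    let smooth = [3.0, 10.0, 3.0]; // Scharr smoothing taps
    let deriv = [-1.0, 0.0, 1.0]; // central difference
    // d/dx: smooth vertically, differentiate horizontally.
    let scharr_x = outer(smooth, deriv);
    assert_eq!(scharr_x[1], [-10.0, 0.0, 10.0]);
    // Swapping the two vectors yields d/dy instead; doing so by accident
    // is exactly an "inverted" Scharr filter.
    let scharr_y = outer(deriv, smooth);
    assert_eq!(scharr_y[0], [-3.0, -10.0, -3.0]);
}
```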

xd009642 commented 1 year ago

I'm personally fine with deleting unused functions if they're not public. If they're public, it takes a bit more thought because of semver.

stephanemagnenat commented 1 year ago

They are not public! Ok, I'll fold their deletion into the respective commits; it is done for the Scharr filters.

@xd009642 for the Gaussian, they are public in the module but not re-exported at the level of the library. Clippy says they are unused. I guess it is also ok to delete them without that being a breaking change, as they are not in the public API, right?

stephanemagnenat commented 1 year ago

Ok, with the commits in #66 so far, the internal spaces are similar up to an epsilon! And the performance with rotation is much better than before, but still not on par with the C++ version, so there are more bugs to hunt:

correspondences

stephanemagnenat commented 1 year ago

The C++ implementation uses 8 comparison points in find_scale_space_extrema while the Rust one uses 5, with an additional bug in a bound. Also, some tests are missing. It could be that the Rust rewrite predates late corrections in the C++ one.

This is now fixed and the Rust implementation finds the same number of keypoints as the C++ one within 1%.
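To illustrate the kind of comparison involved (this is a generic single-level 8-neighbour check, not the actual AKAZE comparison set, which also spans scales): testing too few neighbours accepts spurious extrema, which inflates and shifts the keypoint set.

```rust
/// True if the detector response at interior point (x, y) strictly
/// dominates all 8 spatial neighbours. Callers must keep (x, y) at least
/// one pixel away from the borders.
fn is_local_extremum(resp: &[Vec<f32>], x: usize, y: usize) -> bool {
    let v = resp[y][x];
    for dy in -1i32..=1 {
        for dx in -1i32..=1 {
            if dx == 0 && dy == 0 {
                continue;
            }
            let (nx, ny) = ((x as i32 + dx) as usize, (y as i32 + dy) as usize);
            if resp[ny][nx] >= v {
                return false;
            }
        }
    }
    true
}

fn main() {
    let resp = vec![
        vec![0.0, 0.1, 0.0],
        vec![0.1, 0.9, 0.1],
        vec![0.0, 0.1, 0.0],
    ];
    // The centre dominates all 8 neighbours, so it is kept.
    assert!(is_local_extremum(&resp, 1, 1));
}
```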

stephanemagnenat commented 1 year ago

Both implementations now find keypoints at similar positions. However, matching is still not on par. The next step is to compare the keypoints themselves.

vadixidav commented 1 year ago

Please delete unused functions, git will remember them if needed. Don't worry about them.

vadixidav commented 1 year ago

This is all very great investigation. Considering how central akaze is to everything we are doing, this is really excellent for the project as a whole. Thanks so much again for your investigation. I will look once more over the PR before we merge.

stephanemagnenat commented 1 year ago

Thank you for your kind words! It's really great to have Akaze in Rust, and I hope that we can make it reliable and performant (see #67), and therefore promote the use of Rust in CV, to everyone's benefit!

stephanemagnenat commented 1 year ago

So, there was a nasty bug in computing angles: cv::fastAtan2 returns a value in [0, 360), while f32::atan2 returns one in (-π, π]. The degrees-to-radians ratio was corrected but not the range, so half of the points had the wrong orientation! With this bug fixed, quality is similar to the C++ implementation:

correspondences
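The fix can be sketched as remapping f32::atan2's (-π, π] output onto [0, 2π), which matches cv::fastAtan2's [0, 360) convention once the degrees-to-radians scaling is applied:

```rust
use std::f32::consts::PI;

/// atan2 with its negative half-range shifted up by 2*PI, so the result
/// lies in [0, 2*PI) like a radian version of cv::fastAtan2.
fn angle_0_2pi(y: f32, x: f32) -> f32 {
    let a = y.atan2(x); // in (-PI, PI]
    if a < 0.0 { a + 2.0 * PI } else { a }
}

fn main() {
    // atan2(-1, 0) is -PI/2; remapped it becomes 3*PI/2, i.e. 270 degrees
    // in the [0, 360) convention.
    assert!((angle_0_2pi(-1.0, 0.0) - 1.5 * PI).abs() < 1e-6);
    assert!((angle_0_2pi(1.0, 0.0) - 0.5 * PI).abs() < 1e-6);
}
```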

@vadixidav also, at least on my test image, there are no out-of-bounds image accesses using the original code, which would make ce8f7cef88d946d6487d886736b12366f9630626 unnecessary. The current Rust version adds 0.5 to the variables l and k before sampling the descriptor in mldb_fill_values, which leads to an out-of-bounds access. I reached the conclusion that these 0.5 are incorrect in this place, as l and k are used to sample a small oriented window. So I removed them (01f6e625beff008d481b573aae934a155a0a4938).

However, I think that the idea of adding an offset is not wrong. I'm not fully sure, but I believe that one should add 0.5 to xf and yf to make the sampling of the descriptor unbiased. This addition doesn't lead to any out-of-bounds access with either my test image or the unit test. I have thus added it to the PR (786d88af1d2d45118d0f50251deda367b714ca10).

So, two questions:

vadixidav commented 1 year ago

Thank you so much for the finding. I have a very large dataset locally that I can use to check whether there are any out-of-bounds accesses. I did get one before when I tested, which prompted me to add the check. Maybe there is a more elegant way to solve this problem. I imagine that forcing keypoints to be 0.5 further from the edges might be sufficient. I will perform another test and see.

stephanemagnenat commented 1 year ago

Great! I also wanted to do that, but got lost in exploring some optimization...

stephanemagnenat commented 1 year ago

So, using #69 I have been running this version of Akaze on the 2017 COCO validation dataset (5k images), converted to grayscale. The bounds check is necessary if we have the +0.5 on xf and yf. Without the 0.5, the check is not necessary.

I convinced myself that the original code is correct and removed the +0.5. I also looked at the OpenCV source code, but their implementation differs in the sense that the evolution data layers and the window used are different.

I believe that the check is no longer necessary, but I think that it is also fine to keep it. The branch predictor should make it essentially free.

stephanemagnenat commented 1 year ago

I think that PR #69 is ready to be merged. @vadixidav I'll let you choose whether you wish to keep or remove the boundary check you introduced in ce8f7cef88d946d6487d886736b12366f9630626.

vadixidav commented 1 year ago

If you have confirmed the bounds check is no longer necessary, then I will run a test against my large dataset just in case, and if it works fine I will remove the check.

stephanemagnenat commented 1 year ago

Perfect! Can you do so and merge?

stephanemagnenat commented 1 year ago

@vadixidav, @xd009642 can we merge this please? It is blocking further changes.

@vadixidav if you do not have time to run this PR on your large dataset, may I suggest keeping the check for now; we can always remove it later once you have run it.

xd009642 commented 1 year ago

So I'm personally happy to merge this with the bounds check, without the run on the large dataset - but vadixidav is the boss so I'll defer to his decision.

xd009642 commented 1 year ago

@stephanemagnenat Checked with vadixidav on the rust-cv discord and we're happy to merge, so I've just merged it in :tada: