image-rs / imageproc

Image processing operations
MIT License
736 stars 145 forks source link

Implement BRIEF descriptor #490

Closed bmgxyz closed 6 months ago

bmgxyz commented 2 years ago

Adds a new module binary_descriptors and an implementation of BRIEF (Calonder, et. al., 2010). Also adds an example (brief) that demonstrates descriptor computation and matching between two images.

I've marked this as a draft because my first-pass implementation is slow. On my machine each descriptor takes about 45 µs to compute, and matching takes about 350 µs per pair. In the original paper these values are closer to 17 µs and 9 µs, respectively. There are plenty of relatively easy improvements to be made on my naive implementation—such as matching with multi-index hashing and patch smoothing with integral images—so I think a more performant implementation is achievable.

Fixes #108

bmgxyz commented 2 years ago

Having done some profiling of the example code—lightly modified to repeat critical operations many times—it looks like a good first target for performance improvement is the Gaussian blur that gets applied during patch description. The original paper points to this as a slow operation and suggests that integral images may be a good tool for approximate smoothing. It looks like this crate already provides some useful functions in this area, so I'll look into writing something that builds on those.

I was surprised that my profiling didn't show much time spent on matching, but I'm suspicious that the low number of matched pairs for my test images makes the matching algorithm appear fast in the profile output. I think implementing MIH is going to turn out to be at least as important as integral smoothing.

theotherphil commented 2 years ago

Sounds great! https://github.com/image-rs/imageproc/issues/93 might be relevant here.

bmgxyz commented 2 years ago

I didn't realize that recursively repeating any positive filter eventually results in a Gaussian. That's a neat result which is apparently due to the Central Limit Theorem. We already have filter::box_filter, which seems to use efficient one-dimensional filters internally, so I've replaced the true Gaussian with three recursive box filters for patch smoothing. That should get us within 3% of a true Gaussian, which I think is fair in exchange for linear time complexity with respect to image size and constant time complexity with respect to kernel size.

Anecdotally, I've noticed a slight increase in false matches. It's unclear to me whether or not that's really the case. If it is, I'm not sure how it could happen with such a close approximation.

This change reduced descriptor computation time from about 45 µs to about 23 µs on my machine. That's close to the 17 µs I'm aiming for to match the original BRIEF paper. According to perf, the only obvious performance improvement would be to somehow avoid calling .to_image() on the patch SubImage before sending it to the box filters. It looks like that method iterates over the pixels and puts them in a new ImageBuffer, which isn't too efficient. Is there a simple way to accomplish this?

Next is probably MIH. Also, my original measurement for matching speed was probably wrong. I've corrected the measurement code and intend to push it later. The true value seems to be around 20 µs on my machine at the moment.

bmgxyz commented 2 years ago

As for modifying filter:gaussian_blur_f32, I gave it a shot but quickly realized that box_filter expects a single-channel image, but the Gaussian blur function takes a general Image. Changing that here and now didn't seem too fun.

bmgxyz commented 2 years ago

I switched the underlying type for BinaryDescriptor from Vec<bool> to Vec<u128>. This improved performance from about 85 ns per candidate to 15 ns. I guess finding all those bools on the heap was slow. Notably, this restricts descriptor length to a multiple of 128 bits, which I think is fine because nothing I've read seems to use anything else.

This linear scan approach is linear in the number of candidates, which itself is the product of the number of descriptors in each input image. For 100 descriptors in each image, that's a total runtime of only about 150 µs, but for 1000 it's 15 ms, and it gets bad quickly after that. Multi-index hashing should help. Currently reading Norouzi et. al. (2014) and hoping that the claimed speed improvements apply here. If I'm reading it right, at the 1000-descriptor level the speedup could be almost two orders of magnitude and sub-linear.

bmgxyz commented 2 years ago

I think this is getting close to done. Each descriptor computation takes about 10.9 µs on my machine, and the matching with LSH is down to about 1.4 µs per descriptor. The first value is pretty stable, but the second one is a bit more variable because it depends somewhat on the difference between the numbers of descriptors in each image, as well as their total. Still, these are both pretty performant, and the system seems to work on my test images—I'm seeing lots of parallel lines, and the keypoints seem to match in nearly all cases. I'm not aware of a good quality metric for this sort of thing, so it's hard to quantify success here.

Two questions for you, @theotherphil, or whoever else has an opinion:

  1. I ended up using locality-sensitive hashing for descriptor matching. I eyeballed the number of tables (l = 3) and set the hash code length (k) to the log of the number of descriptors. These values seem to work, but they're just guesses. Do you have any insight / preference for these values? Note that matching performance and quality depend heavily on these choices.
  2. The descriptor computation spends around 20% of its time converting the patches from SubImage to Image so they can get passed to integral_image(). Do you know of an easy way to avoid this conversion? The method (.to_image()) seems to just go over each pixel and copy it into a new Image, which seems unnecessary.

In any case, I think I'll add some tests and then mark this as ready for review.

bmgxyz commented 2 years ago

Benchmark output on my machine:

$ cargo bench --package imageproc --lib -- binary_descriptors::tests --nocapture
...
test binary_descriptors::tests::bench_brief_fixed_test_pairs_1000_keypoints    ... bench:   7,990,448 ns/iter (+/- 749,063)
test binary_descriptors::tests::bench_brief_random_test_pairs_1000_keypoints   ... bench:   7,905,290 ns/iter (+/- 630,646)
test binary_descriptors::tests::bench_matcher_1000_keypoints_each              ... bench:   2,622,534 ns/iter (+/- 342,305)

Divide each figure by 1000 to estimate the average runtime per descriptor or matched pair.

bmgxyz commented 1 year ago

Backporting performance and API improvements from my ORB implementation, which depends on these changes.

New benchmark output:

$ cargo bench --package imageproc --lib -- binary_descriptors --include-ignored
...
test binary_descriptors::brief::tests::bench_brief_fixed_test_pairs_1000_keypoints  ... bench:   4,519,587 ns/iter (+/- 609,421)
test binary_descriptors::brief::tests::bench_brief_random_test_pairs_1000_keypoints ... bench:   4,536,939 ns/iter (+/- 676,464)
test binary_descriptors::tests::bench_matcher_1000_keypoints_each                   ... bench:   2,605,814 ns/iter (+/- 1,529,375)
theotherphil commented 1 year ago

Sorry, I’ve had very little time over the last year. Thanks for the PR. Is this ready for review now?

bmgxyz commented 1 year ago

Not a problem. When I implemented ORB on that other branch, I ended up making a lot of changes that wouldn't have been included if this had been merged sooner. I think the API is much better now, and of course the whole thing is faster as well.

Anyway, yes, this is ready for review. If/when this merges, I'll probably post my ORB implementation shortly after.

kalwalt commented 1 year ago

Hi, this is an awesome feature! Any news about when it will be merged?

theotherphil commented 1 year ago

This PR spent a long time blocked on me reviewing it, but it’s now good to merge after a handful of review comments have been addressed.

bmgxyz commented 1 year ago

@theotherphil I've been thinking about whether or not feature descriptors really belong in imageproc. Maybe it would be better to put them in rust-cv/cv?

If this gets merged (along with the ORB implementation that depends on it), then rust-cv/cv#15 could be satisfied by passing through to the implementation in this crate. cv already depends on imageproc, so it probably wouldn't be that hard. But I think that introduces confusion on the boundary between imageproc and cv. Since cv is specifically about computer vision, shouldn't it own BRIEF and ORB?

Ultimately I think this comes down to where you want to set the boundary around imageproc. In my view, imageproc should contain features that you might expect to find in GIMP, not OpenCV. On the other hand, the goal in the README at the moment is:

A performant, well-tested, well-documented library with a consistent API, suitable for use as the basis of computer vision applications or graphics editors.

which makes including BRIEF and ORB seem more reasonable. rust-cv/cv is also quite large and complicated, and maybe some crate users would prefer the focus and simplicity of imageproc. (I do, for example.)

What do you think?

theotherphil commented 1 year ago

My original purpose for starting this crate was to use it for computer vision, but I never ended up actually doing any!

Features are definitely in scope for this crate so this would make a good addition, but I’m also totally happy if you decide it makes more sense for this to live somewhere else.

kalwalt commented 1 year ago

I think @vadixidav may say his opinion if this feature may be hosted here or inside rust-cv.

vadixidav commented 1 year ago

I think @vadixidav may say his opinion if this feature may be hosted here or inside rust-cv.

I haven't had as much time to maintain things recently in Rust CV, so it might make more sense for them to live here. If we create traits for feature detectors and descriptors we can just wrap the ones in imageproc. The cv crates haven't had updates in some time to crates.io.

I am not opposed to it though if you would rather it go there.

kalwalt commented 1 year ago

Any news on this side?

bmgxyz commented 9 months ago

Ready for another pass @theotherphil.

theotherphil commented 6 months ago

Thanks a lot! I’ll release a new version of imageproc this week including this feature.

theotherphil commented 6 months ago

Doh, looks like we need to fix the build first!

kalwalt commented 6 months ago

Thanks a lot! I’ll release a new version of imageproc this week including this feature.

That's great! I will test as soon it will be released! :smile: