sdd / kiddo

Kiddo
Apache License 2.0
91 stars 14 forks source link

`nearest_n_within` does not limit the number of items when not sorted #168

Open Nahor opened 5 months ago

Nahor commented 5 months ago

In kiddo v4.2.0, nearest_n_within will only limit the number of items if sorting is enabled. If it's disabled, all the items within the radius are returned.

E.g. in the cities example, replace nearest_n with nearest_n_within and a radius of 10km.

-let nearest_5_idx = kdtree.nearest_n::<SquaredEuclidean>(&query, 5);
+let nearest_5_idx = kdtree.nearest_n_within::<SquaredEuclidean>(
+     &query,
+     kilometres_to_unit_sphere_squared_euclidean(10.0),
+    5,
+    false,
+);

This outputs 13 cities (for my data set) instead of the expected 5. If the list is sorted, then it works as expected.


Also note that the example in the documentation for nearest_n_within creates a tree of 2 items, asks for 2 items, and then is happy because it got 2 items in return!

sdd commented 3 months ago

Whoops! Thanks for spotting this. Will fix in an upcoming release as soon as I can.

ezrasingh commented 2 months ago

Do you have any information on the release date for the next version?

I encountered the same bug in my project and was wondering when the fix from #175 will be available.

Nahor commented 2 months ago

@ezrasingh , you can always use the git repo instead of a released crate:

kiddo = { git = "https://github.com/sdd/kiddo.git", rev = "cc62c6d76accdeba75c0f55db80c28eab8a21ae7" }
ezrasingh commented 1 month ago

Hey @Nahor,

I pinned Kiddo to that revision in my crate, but my unit tests are still failing. Initially, I thought it might be due to stale code on my machine. However, the tests also failed in CI.

Could this possibly be a regression?

ezrasingh commented 1 month ago

I just submitted a PR (#180) to validate the consistency of the behavior when max_qty = 0. During testing, I found that the function handles this case differently depending on whether the results are sorted or not. Specifically, the issue arises when sorted = false, where the function does not return an empty vec as expected, unlike when sorted = true.

This difference explains why pinning Kiddo did not work for my case. While I can work around this with a conditional branch, I don't believe this inconsistent behavior is intended. If it works with sorted results, it should also work with unsorted results, ensuring consistent handling of max_qty = 0.

sdd commented 1 month ago

Hey all - just released v4.2.1: https://crates.io/crates/kiddo/4.2.1

ezrasingh commented 4 weeks ago

Hi @sdd, so far v4.2.1 has been great however I notice this release still fails one of the edge case I added in PR #180:

Calling nearest_n_within when max_qty = 0:

sdd commented 4 weeks ago

Aah hey Ezra, I'll take a look over this weekend and see if I can get a fix out.