Closed mooreniemi closed 2 years ago
I'm actually hitting issues even if I convert to &[f32]
which is what I see in the examples, so I think it's not ndarray per se but me misunderstanding how to write the types.
I added an example to hgg
that covers the usage of ndarray
here: https://github.com/rust-cv/hgg/blob/main/examples/ndarray.rs
You should be able to use the same process to create a metric for hnsw
as well.
Keep in mind that cosine similarity is not a metric, and even cosine distance is not a metric (see https://towardsdatascience.com/what-is-metric-74b0bf6e862). You may encounter issues when using nearest neighbor search graph algorithms with distances that do not fulfill the triangle inequality (and thus aren't true metrics). If you truly want to perform a search in angular space, it is recommended to use angular distance: https://en.wikipedia.org/wiki/Cosine_similarity#Angular_distance_and_similarity. You may find that cosine distance works fine if you insert enough points into the data structure, since the relationship between angle and cosine distance become roughly proportional at small angles, but these data structures may have trouble being built out initially before eventually settling out. Additionally, since hnsw
and hgg
are hierarchical, the higher layers may have some broken triangle inequalities which would lead to bad search behavior. I highly recommend switching to using a proper distance metric before proceeding.
Feel free to reach out on Discord if you have any additional questions. I will respond when available.
Let me know if this resolves your issue.
Also, just in case, make sure you are using the same version of space
mentioned in the dependencies: https://crates.io/crates/hnsw/0.11.0/dependencies. For 0.11.0
, that version is space 0.17.0
.
Also, make sure to use to_bits()
rather than as
here:
a.dot(b) as u64
Here is another version using euclidean metric and to_bits
. to_bits
allows the positive non-NaN floats to be compared as integers for higher performance during search. It also ensures full ordering, which is not guaranteed by floats. Using as
may cause a loss of precision and dynamic range, and it might not work for all data sets.
let distance = (a - b).mapv(|v| v*v).sum().sqrt();
debug_assert!(!distance.is_nan());
distance.to_bits()
Thanks for this! I realized last night but didn't comment that I was accidentally on space 18 rather than 17 which is why the trait wouldn't satisfy even for &[f32]
.
After switching, I was able to do what you showed above to insert into hnsw. I appreciate the rest of your advice, as well!
I wanted to implement
distance
as ndarray'sdot
but I'm finding the type errors inscrutable so far. Are there any examples anywhere of using this crate and ndarray together?I implemented:
But when I try to
insert
like so:I hit:
It's not jumping out to me what's missing given I see no bounds on
P
and the bounds of theSelf::Unit
are met.