rust-cv / space

Spatial library for Rust
MIT License
39 stars 4 forks source link

Add 2nd type param to Metric trait #36

Closed YuhanLiin closed 3 years ago

YuhanLiin commented 3 years ago

This change makes the Metric trait more flexible by following a similar design as the Add trait in std. The 2nd type param defaults to the 1st type param, so back-compat is preserved.

This probably won't help integrate Metric with Linfa's Distance trait, since the latter requires method-level generics. That integration probably requires GATs.

vadixidav commented 3 years ago

I would rather keep the trait as simple as possible. Is there any situation in which a metric space would be defined over two different kinds of points? If we don't know of such a situation at this time, I think we should just keep it at one parameter.

The change to the doc is good tho, and we should make that change.

YuhanLiin commented 3 years ago

It's mainly for computing distances between two ArrayViews, since they can have different lifetimes. This means that they're separate types, so two type params are needed. If there's only one type param then the two ArrayViews would need to have the same lifetime, which is an unnecessary restriction for computing distances.

vadixidav commented 3 years ago

It's mainly for computing distances between two ArrayViews, since they can have different lifetimes. This means that they're separate types, so two type params are needed. If there's only one type param then the two ArrayViews would need to have the same lifetime, which is an unnecessary restriction for computing distances.

I think this makes sense, but this will require the other traits to change as well, as they assume that both are the same. We may need to generally restructure these traits to handle ArrayView point usage. I will look over this a bit and see how we might be able to change things here to support that use case, although I am a bit busy with some other work currently. If you can come up with a model that supports both the use case of adding points on-the-fly (which can be compared with the metric) and also in a batch at the beginning (where the batch can be partitioned into several points (like ArrayView), each comparable by the metric), then that would help solve the problem. We might need an additional trait or two altogether to abstract over the batch creation of the underlying data structure.

Alternatively, another way we might be able to structure this is if the distance metric actually looks up the ArrayView to compare internally, but if we do that it wont be able to access the ArrayView of a new point, so it would make sense to have this dual-type bound in that case. I think we need to experiment with what that looks like in practice.

YuhanLiin commented 3 years ago

Adding points to the model, either on the fly or via batching, does not even require an extra type param on Metric. This is because every point added to a model need to have the same lifetime bound, so they are all the "same" type. The main usecase I envisioned for the 2nd type param is KNN queries, where the query point has a separate lifetime from the points in the model. However, after further experimentation, it turns out that KNN queries with different lifetimes require method-level generics, which is not compatible with multiple metric type params. A 2nd type param is still nice to have, but it's not actually useful for models like hnsw and I'll be replacing it with a GAT once that's stabilized.

YuhanLiin commented 3 years ago

By the way, why isn't there a trait for inserting into types that don't implement KnnMap?

vadixidav commented 3 years ago

By the way, why isn't there a trait for inserting into types that don't implement KnnMap?

The reason is that the KnnMap trait adds the associated type Value, which is then used when inserting. There could be a KnnMapInsert and then a separate KnnPointInsert trait, which would allow insertion for data structures which do not support the map functionality, but map functionality is more versatile in general and should be relatively easy to add to any KNN search datastructure.

YuhanLiin commented 3 years ago

Since having a 2nd type param without GATs isn't actually useful for its intended purpose, I'll close this PR.