Closed YuhanLiin closed 3 years ago
I would rather keep the trait as simple as possible. Is there any situation in which a metric space would be defined over two different kinds of points? If we don't know of such a situation at this time, I think we should just keep it at one parameter.
The change to the doc is good tho, and we should make that change.
It's mainly for computing distances between two ArrayView
s, since they can have different lifetimes. This means that they're separate types, so two type params are needed. If there's only one type param then the two ArrayView
s would need to have the same lifetime, which is an unnecessary restriction for computing distances.
It's mainly for computing distances between two
ArrayView
s, since they can have different lifetimes. This means that they're separate types, so two type params are needed. If there's only one type param then the twoArrayView
s would need to have the same lifetime, which is an unnecessary restriction for computing distances.
I think this makes sense, but this will require the other traits to change as well, as they assume that both are the same. We may need to generally restructure these traits to handle ArrayView
point usage. I will look over this a bit and see how we might be able to change things here to support that use case, although I am a bit busy with some other work currently. If you can come up with a model that supports both the use case of adding points on-the-fly (which can be compared with the metric) and also in a batch at the beginning (where the batch can be partitioned into several points (like ArrayView
), each comparable by the metric), then that would help solve the problem. We might need an additional trait or two altogether to abstract over the batch creation of the underlying data structure.
Alternatively, another way we might be able to structure this is if the distance metric actually looks up the ArrayView
to compare internally, but if we do that it wont be able to access the ArrayView
of a new point, so it would make sense to have this dual-type bound in that case. I think we need to experiment with what that looks like in practice.
Adding points to the model, either on the fly or via batching, does not even require an extra type param on Metric
. This is because every point added to a model need to have the same lifetime bound, so they are all the "same" type. The main usecase I envisioned for the 2nd type param is KNN queries, where the query point has a separate lifetime from the points in the model. However, after further experimentation, it turns out that KNN queries with different lifetimes require method-level generics, which is not compatible with multiple metric type params. A 2nd type param is still nice to have, but it's not actually useful for models like hnsw
and I'll be replacing it with a GAT once that's stabilized.
By the way, why isn't there a trait for inserting into types that don't implement KnnMap
?
By the way, why isn't there a trait for inserting into types that don't implement
KnnMap
?
The reason is that the KnnMap
trait adds the associated type Value
, which is then used when inserting. There could be a KnnMapInsert
and then a separate KnnPointInsert
trait, which would allow insertion for data structures which do not support the map functionality, but map functionality is more versatile in general and should be relatively easy to add to any KNN search datastructure.
Since having a 2nd type param without GATs isn't actually useful for its intended purpose, I'll close this PR.
This change makes the
Metric
trait more flexible by following a similar design as theAdd
trait in std. The 2nd type param defaults to the 1st type param, so back-compat is preserved.This probably won't help integrate
Metric
with Linfa'sDistance
trait, since the latter requires method-level generics. That integration probably requires GATs.