"Distance Metrics"

jovo commented 1 year ago

"Distance Metrics" is redundant. Distances are Metrics.
also, the two functions under that are not distances or metrics.

similarity - not a distance
NNMetaEstimator - what is a meta-estimator? i've never heard that term. afaict, it is just computing nearest-neighbors, using the forest-based distance? not estimating anything technically? maybe estimating geodesic distance/neighbors?
why don't we expose actually computing the distance, the way cencheng says to?

adam2392 commented 1 year ago

> similarity - not a distance

Yeah, this can be reworded. We currently expose a similarity (or, I guess, kernel) computation among samples, not distances. But we convert this to a distance by just doing:

dists = 1.0 - similarity_matrix_normalized
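
For illustration, here is a minimal sketch of that conversion. It assumes the similarity is the usual forest proximity (the fraction of trees in which two samples land in the same leaf), which the thread does not spell out. The helper name `proximity_from_leaves` and the leaf assignments are made up for the example.

```python
import numpy as np


def proximity_from_leaves(leaves: np.ndarray) -> np.ndarray:
    """Fraction of trees in which each pair of samples shares a leaf.

    `leaves` has shape (n_samples, n_trees); entry [i, t] is the index of
    the leaf that sample i reaches in tree t (for example, the output of a
    scikit-learn forest's `apply` method).
    """
    # Compare every pair of samples tree by tree, then average over trees.
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


# Hypothetical leaf assignments for 3 samples in a 4-tree forest.
leaves = np.array([
    [0, 2, 1, 3],
    [0, 2, 4, 3],
    [5, 1, 4, 0],
])

similarity = proximity_from_leaves(leaves)  # values in [0, 1], diagonal is 1
dists = 1.0 - similarity                    # the conversion quoted above
```

Because each sample always shares every leaf with itself, the diagonal of `similarity` is 1 and the diagonal of `dists` is 0.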

> NNMetaEstimator - what is a meta-estimator? i've never heard that term. afaict, it is just computing nearest-neighbors, using the forest-based distance? not estimating anything technically? maybe estimating geodesic distance/neighbors?

A meta-estimator is scikit-learn terminology for a class that takes another Estimator as a parameter. E.g. this one is a WIP for me to implement an API for estimating nearest-neighbors using any arbitrary tree/forest Estimator as the "base estimator".
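
To make the term concrete, here is a rough sketch of what such a meta-estimator could look like. It is not the `NNMetaEstimator` in this repo: the class name `ForestNeighbors`, its parameters, and the leaf-sharing similarity are all assumptions made for the example. Only `BaseEstimator`, `MetaEstimatorMixin`, `clone`, and the forest's `apply` method come from scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, MetaEstimatorMixin, clone
from sklearn.ensemble import RandomForestClassifier


class ForestNeighbors(MetaEstimatorMixin, BaseEstimator):
    """Hypothetical meta-estimator: nearest neighbors under a forest distance."""

    def __init__(self, estimator, n_neighbors=3):
        self.estimator = estimator      # the wrapped "base estimator"
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # Fit a fresh copy so the estimator passed in is left untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        # Remember which leaf each training sample reaches in each tree.
        self.train_leaves_ = self.estimator_.apply(X)
        return self

    def kneighbors(self, X):
        # Similarity: fraction of trees where a query and a training
        # sample fall into the same leaf (an assumed definition).
        leaves = self.estimator_.apply(X)
        same_leaf = leaves[:, None, :] == self.train_leaves_[None, :, :]
        dists = 1.0 - same_leaf.mean(axis=2)
        # Keep the n_neighbors closest training samples for each query.
        idx = np.argsort(dists, axis=1, kind="stable")[:, : self.n_neighbors]
        return np.take_along_axis(dists, idx, axis=1), idx


# Synthetic data, only to exercise the class.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = (X[:, 0] > 0).astype(int)

nn = ForestNeighbors(RandomForestClassifier(n_estimators=20, random_state=0))
distances, indices = nn.fit(X, y).kneighbors(X[:5])
```

The wrapper never looks at how the base estimator builds its trees. It only needs `fit` and `apply`, which is what lets any tree/forest Estimator be swapped in.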

> why don't we expose actually computing the distance, the way cencheng says to?

Sure we can do that. Do you have a specific reference to what you are talking about?

sampan501 commented 1 year ago

I believe the way it is written is the Cencheng method for transforming between distances and kernels, i.e.

dists = 1.0 - similarity_matrix_normalized

In this case, max(K_ij) is 1 since similarities are in [0, 1]
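
As a sketch of why the two agree: if the transformation subtracts each kernel entry from the largest entry (an assumption read off the `max(K_ij)` remark, not something the thread states in full), then a kernel whose maximum is 1 reduces to `1.0 - K`. The function name `kernel_to_distance` and the matrix values are made up for the example.

```python
import numpy as np


def kernel_to_distance(K: np.ndarray) -> np.ndarray:
    # Assumed form of the transformation: subtract each entry from the
    # largest entry of the kernel matrix.
    return K.max() - K


# Hypothetical normalized similarity matrix with entries in [0, 1].
K = np.array([
    [1.0, 0.75, 0.0],
    [0.75, 1.0, 0.25],
    [0.0, 0.25, 1.0],
])

dists = kernel_to_distance(K)  # identical to 1.0 - K here, since K.max() == 1
```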

adam2392 commented 8 months ago

All forest estimators have distance_matrix = 1 - compute_similarity_matrix(X)

neurodata / treeple

"Distance Metrics" #119