neurodata / treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn
https://treeple.ai

"Distance Metrics" #119

Closed (jovo closed this issue 8 months ago)

jovo commented 1 year ago

"Distance Metrics" is redundant. Distances are Metrics.
Also, the two functions listed under it are neither distances nor metrics.

adam2392 commented 1 year ago
  • similarity - not a distance

Yeah, this can be reworded. We currently expose a similarity (or, I guess, a kernel computed among samples), not a distance. We convert it to a distance simply by doing:

dists = 1.0 - similarity_matrix_normalized
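As a rough sketch of where `similarity_matrix_normalized` can come from, assuming the usual forest proximity (the fraction of trees in which two samples land in the same leaf), the conversion can be written with NumPy alone. The toy leaf-index matrix and the helper name `proximity_from_leaves` are hypothetical; with a fitted scikit-learn forest, `forest.apply(X)` returns a leaf-index array of shape (n_samples, n_estimators) that could be fed in instead.

```python
import numpy as np


def proximity_from_leaves(leaves: np.ndarray) -> np.ndarray:
    """Fraction of trees in which each pair of samples shares a leaf.

    leaves: integer array of shape (n_samples, n_trees), where
    leaves[i, t] is the leaf index of sample i in tree t (hypothetical input).
    """
    # Compare every pair of samples tree by tree: shape (n, n, n_trees).
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    # Averaging over trees gives a similarity in [0, 1] with ones on the diagonal.
    return same_leaf.mean(axis=2)


# Toy leaf assignments: 3 samples, 2 trees.
leaves = np.array([[0, 1],
                   [0, 2],
                   [3, 2]])

similarity_matrix_normalized = proximity_from_leaves(leaves)
dists = 1.0 - similarity_matrix_normalized
print(dists)
```

Because every sample shares a leaf with itself in every tree, the diagonal of `dists` is zero.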
  • NNMetaEstimator: what is a meta-estimator? I've never heard that term. AFAICT, it just computes nearest neighbors using the forest-based distance, so it isn't technically estimating anything? Or maybe it estimates geodesic distances/neighbors?

A meta-estimator is scikit-learn terminology for an estimator that takes another estimator as a parameter. Here, it's a WIP of mine: an API for estimating nearest neighbors that uses any tree/forest estimator as the "base estimator".
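To make the meta-estimator idea concrete, here is a minimal sketch of a nearest-neighbors wrapper that fits a base estimator itself and ranks training samples by the forest distance 1 - proximity. It only assumes the base estimator exposes `fit(X, y)` and a scikit-learn-style `apply(X)` returning leaf indices per tree. The class names `ForestNeighborsSketch` and `ThresholdForest` are hypothetical illustrations, not treeple's `NNMetaEstimator` API.

```python
import numpy as np


class ForestNeighborsSketch:
    """Hypothetical nearest-neighbors meta-estimator over a tree ensemble.

    The wrapped base_estimator only needs fit(X, y) and apply(X), where
    apply returns leaf indices of shape (n_samples, n_trees).
    """

    def __init__(self, base_estimator):
        self.base_estimator = base_estimator

    def fit(self, X, y=None):
        # The meta-estimator delegates learning to the base estimator...
        self.base_estimator.fit(X, y)
        # ...and caches the training samples' leaf assignments.
        self.train_leaves_ = np.asarray(self.base_estimator.apply(X))
        return self

    def kneighbors(self, X, n_neighbors=1):
        query_leaves = np.asarray(self.base_estimator.apply(X))
        # Proximity: fraction of trees where query i and training sample j share a leaf.
        same_leaf = query_leaves[:, None, :] == self.train_leaves_[None, :, :]
        dists = 1.0 - same_leaf.mean(axis=2)
        # Stable sort so ties keep training-set order.
        order = np.argsort(dists, axis=1, kind="stable")[:, :n_neighbors]
        return np.take_along_axis(dists, order, axis=1), order


class ThresholdForest:
    """Toy stand-in for a forest: two 'trees' that split feature 0 at 0 and 5."""

    def fit(self, X, y=None):
        return self

    def apply(self, X):
        X = np.asarray(X)
        return np.stack([(X[:, 0] > 0).astype(int),
                         (X[:, 0] > 5).astype(int)], axis=1)


X_train = np.array([[-1.0], [1.0], [10.0]])
nn = ForestNeighborsSketch(ThresholdForest()).fit(X_train)
distances, indices = nn.kneighbors(np.array([[2.0]]), n_neighbors=2)
```

A fuller implementation would typically clone the base estimator (e.g., with `sklearn.base.clone`) instead of fitting the passed-in object directly, following scikit-learn's meta-estimator conventions.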

  • Why don't we expose actually computing the distance, the way Cencheng says to?

Sure, we can do that. Do you have a specific reference to what you are talking about?

sampan501 commented 1 year ago

I believe the way it's written already follows Cencheng's method for kernel-to-distance transformations, i.e.

dists = 1.0 - similarity_matrix_normalized

In this case, max(K_ij) is 1, since similarities lie in [0, 1] and every sample has similarity 1 with itself.
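Reading the comment's max(K_ij) as the general rule, the transformation subtracts each similarity from the largest kernel entry, d_ij = max(K) - K_ij. A minimal NumPy sketch (the function name `kernel_to_distance` is hypothetical) shows that it reduces to `1.0 - K` when the maximum similarity is 1:

```python
import numpy as np


def kernel_to_distance(K: np.ndarray) -> np.ndarray:
    """Turn a similarity (kernel) matrix into a distance-like matrix: max(K) - K."""
    K = np.asarray(K, dtype=float)
    return K.max() - K


# Forest-style proximity: similarities in [0, 1] with ones on the diagonal.
K_forest = np.array([[1.0, 0.25],
                     [0.25, 1.0]])
# Here max(K) == 1, so the result matches 1.0 - K_forest.
D_forest = kernel_to_distance(K_forest)

# A kernel whose maximum (2.0) sits on its diagonal still yields a zero diagonal.
K_other = np.array([[2.0, 0.5],
                    [0.5, 2.0]])
D_other = kernel_to_distance(K_other)
```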

adam2392 commented 8 months ago

For all forest estimators, the distance matrix is distance_matrix = 1 - compute_similarity_matrix(X).