Dan-Treacher opened this issue 2 years ago
Perhaps a similar type of request:
Would it be possible to get a `sklearn.neighbors.KDTree`-like structure with Gower's distance for quick nearest-neighbour queries?
The default sklearn `KDTree` does not accept callable functions as metrics.
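A minimal sketch of one workaround, assuming the categorical columns have already been mapped to integer codes. Unlike `KDTree`, sklearn's `BallTree` accepts a Python callable as its metric, so `NearestNeighbors(algorithm="ball_tree")` can answer Gower queries. The helper `make_gower` and the toy data are hypothetical. Tree pruning is only valid for a true metric, and this Gower variant (range-normalised L1 on numeric columns plus simple matching on codes) does satisfy the triangle inequality. The callable runs in Python for every distance evaluation, so expect it to be much slower than a built-in metric.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def make_gower(num_idx, cat_idx, ranges):
    """Build a Gower distance callable (hypothetical helper).

    Assumes categorical columns hold integer codes, so equal codes mean
    "same category" and no one-hot encoding is needed.
    """
    num_idx = np.asarray(num_idx, dtype=int)
    cat_idx = np.asarray(cat_idx, dtype=int)
    ranges = np.asarray(ranges, dtype=float).copy()
    ranges[ranges == 0] = 1.0  # avoid dividing by zero for constant columns
    n_features = len(num_idx) + len(cat_idx)

    def gower(x, y):
        # Range-normalised absolute difference for each numeric feature
        num = np.abs(x[num_idx] - y[num_idx]) / ranges
        # Simple matching: 1 where category codes differ, 0 where they agree
        cat = x[cat_idx] != y[cat_idx]
        return float((num.sum() + cat.sum()) / n_features)

    return gower


# Toy data: columns 0-1 are numeric, column 2 is a categorical code
X = np.array([
    [1.0, 10.0, 0],
    [2.0, 10.0, 0],
    [1.0, 30.0, 1],
    [5.0, 50.0, 2],
])
num_idx, cat_idx = [0, 1], [2]
ranges = np.ptp(X[:, num_idx], axis=0)  # per-column max - min

nn = NearestNeighbors(
    n_neighbors=2,
    algorithm="ball_tree",  # KDTree would reject a callable metric
    metric=make_gower(num_idx, cat_idx, ranges),
)
nn.fit(X)
dist, ind = nn.kneighbors(X[:1])  # neighbours of the first row
print(ind, dist)
```

The query row is returned as its own nearest neighbour (distance 0), followed by row 1, which differs only by a quarter of the first column's range.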
Hi there
I was wondering whether anyone had any insights into methods that could be used to parallelise the Gower distance calculation for data with mixed numerical and nominal categorical features.
I've seen some people suggest using `sklearn.metrics.pairwise_distances` with a custom function for the `metric=` argument, but I think `pairwise_distances` can only take numerical inputs, which wouldn't work with high-cardinality nominal categorical data (one-hot encoding would result in thousands of columns).
This paper might start some discussion: https://hal.archives-ouvertes.fr/hal-02047514/document
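A minimal sketch of one vectorised alternative, assuming each categorical column is first mapped to integer codes (here with `np.unique(..., return_inverse=True)`, so a high-cardinality column stays a single numeric column instead of thousands of one-hot columns). Gower's distance then splits into a Manhattan distance on range-normalised numeric columns plus a count of mismatched codes, and both pieces are built-in `pairwise_distances` metrics that accept `n_jobs`. The helpers `encode_categories` and `gower_matrix` and the toy data are hypothetical; the sketch assumes at least one numeric and one categorical column and no missing values.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def encode_categories(X_cat_raw):
    """Map each categorical column (e.g. strings) to integer codes, one column each."""
    X_cat_raw = np.asarray(X_cat_raw)
    return np.column_stack(
        [np.unique(col, return_inverse=True)[1] for col in X_cat_raw.T]
    )


def gower_matrix(X_num, X_cat, n_jobs=None):
    """Gower distance matrix from two vectorised sklearn metrics (hypothetical helper).

    X_num: (n, p_num) numeric features; X_cat: (n, p_cat) integer category codes.
    """
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat)
    p_num, p_cat = X_num.shape[1], X_cat.shape[1]

    ranges = np.ptp(X_num, axis=0)
    ranges[ranges == 0] = 1.0  # constant columns contribute 0 instead of NaN
    # Sum over numeric features of |x_i - y_i| / range_i
    d_num = pairwise_distances(X_num / ranges, metric="manhattan", n_jobs=n_jobs)
    # 'hamming' gives the fraction of mismatched codes; multiply to get a count
    d_cat = pairwise_distances(X_cat, metric="hamming", n_jobs=n_jobs) * p_cat
    return (d_num + d_cat) / (p_num + p_cat)


X_num = [[1.0, 10.0], [2.0, 10.0], [1.0, 30.0], [5.0, 50.0]]
X_cat = encode_categories([["red"], ["red"], ["blue"], ["green"]])
D = gower_matrix(X_num, X_cat, n_jobs=-1)  # n_jobs=-1: split rows over all cores
print(np.round(D[0], 4))
```

Because both terms use built-in metrics, the per-pair Python callback is avoided entirely, which is likely to matter more than the `n_jobs` setting. For large n the full n-by-n matrix may not fit in memory; `sklearn.metrics.pairwise_distances_chunked` is one option for processing it in row blocks.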
Thanks