wwwjk366 / gower

Python package for Gower distance
MIT License

Parallel implementation [question] #5

Open Dan-Treacher opened 2 years ago

Dan-Treacher commented 2 years ago

Hi there

I was wondering whether anyone has any insights into methods that could be used to parallelise the Gower distance calculation for data with a mix of numerical and nominal categorical features.

I've seen some people suggest using sklearn.metrics.pairwise_distances with a custom function for the metric= argument, but I think pairwise_distances only accepts numerical inputs, which wouldn't work with high-cardinality nominal categorical data (one-hot encoding would result in thousands of columns).
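One way to sidestep one-hot encoding is sketched below. It is a minimal illustration, not the gower package's own implementation. It assumes each categorical column has already been converted to integer codes (for example with np.unique(..., return_inverse=True)), that features are equally weighted, and that there are no missing values. The names gower_block and gower_matrix_parallel are hypothetical. Row blocks are handed to a thread pool, on the assumption that the heavy NumPy operations release the GIL; a process pool or joblib could be swapped in.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def gower_block(num_rows, cat_rows, num_all, cat_all, ranges):
    """Gower distances from a block of rows to every row in the data set."""
    n_features = num_all.shape[1] + cat_all.shape[1]
    # Numeric part: range-normalised absolute differences, summed over features.
    num_part = (np.abs(num_rows[:, None, :] - num_all[None, :, :]) / ranges).sum(axis=2)
    # Categorical part: 1 for a mismatch, 0 for a match, compared on integer codes.
    cat_part = (cat_rows[:, None, :] != cat_all[None, :, :]).sum(axis=2)
    return (num_part + cat_part) / n_features


def gower_matrix_parallel(num, cat, n_workers=4, chunk_size=256):
    """Full pairwise Gower matrix, computed one row block per task (hypothetical helper)."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns then contribute zero distance
    starts = range(0, num.shape[0], chunk_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        blocks = pool.map(
            lambda s: gower_block(num[s:s + chunk_size], cat[s:s + chunk_size], num, cat, ranges),
            starts,
        )
        return np.vstack(list(blocks))


# Toy data: two numeric columns and one categorical column stored as integer codes.
num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
cat = np.array([[0], [0], [1]])
D = gower_matrix_parallel(num, cat, n_workers=2, chunk_size=2)
```

Each task allocates a chunk_size x n_rows x n_features intermediate array, so chunk_size trades memory for per-task overhead.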

This paper might start some discussion: https://hal.archives-ouvertes.fr/hal-02047514/document

Thanks

adinathauti commented 2 years ago

Perhaps a similar type of request

Would it be possible to get a sklearn.neighbors.KDTree-like structure that uses Gower distances for quick nearest-neighbour queries?

The default sklearn KDTree does not accept callable functions as metrics.
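Until a tree structure exists, a brute-force query is one possible fallback. The sketch below is not a KDTree replacement: it scans every row, so each query costs O(n). It assumes integer-coded categorical columns, equal feature weights, and no missing values. The names gower_to_query and gower_knn are hypothetical. It uses np.argpartition so that only the k closest rows are sorted.

```python
import numpy as np


def gower_to_query(query_num, query_cat, num, cat, ranges):
    """Gower distance from one query row to every row of the data set."""
    # Numeric part: range-normalised absolute differences.
    num_part = (np.abs(num - query_num) / ranges).sum(axis=1)
    # Categorical part: count of mismatching integer codes.
    cat_part = (cat != query_cat).sum(axis=1)
    return (num_part + cat_part) / (num.shape[1] + cat.shape[1])


def gower_knn(query_num, query_cat, num, cat, k=5):
    """Indices and distances of the k nearest rows, closest first (brute force)."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns then contribute zero distance
    dist = gower_to_query(
        np.asarray(query_num, dtype=float), np.asarray(query_cat), num, cat, ranges
    )
    k = min(k, dist.shape[0])
    # argpartition selects the k smallest distances without a full sort.
    idx = np.argpartition(dist, k - 1)[:k]
    idx = idx[np.argsort(dist[idx])]
    return idx, dist[idx]


# Toy data: query with the first row, ask for its two nearest neighbours.
num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
cat = np.array([[0], [0], [1]])
idx, dist = gower_knn(num[0], cat[0], num, cat, k=2)
```

The ranges are computed from the reference data only, so a query that falls outside the observed range can produce a per-feature contribution above 1.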