Closed scottgigante closed 6 years ago
Thanks for the suggestion! Sure, I'd be happy to look at a PR for creating histograms automatically.
However, I'm reluctant to add dependencies like sklearn or scipy, as these are quite heavy. If you feel that creating the distance matrix automatically is crucial, it may make sense to create a separate package that implements this instead, with pyemd as a dependency.
Also, in your function, you would probably want to add a metric='euclidean' keyword argument that gets passed to pairwise_distances, in case the user wants a different metric.
Both good points. I would happily drop those dependencies, as a pairwise distance matrix is pretty easy to implement. However, without scipy I would probably only include Euclidean distance. Perhaps the best option would be to accept a custom metric function (e.g. partial(scipy.spatial.distance.pdist, metric='cosine')), with the default being an internal implementation of Euclidean distance?
That sounds great. For the Euclidean distance default, you can just use numpy.linalg.norm(x - y).
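As a rough illustration of the idea discussed above (an internal Euclidean default plus an optional user-supplied metric), here is a minimal sketch using only numpy. The names euclidean_pairwise and distance_matrix are hypothetical and not part of pyemd; the convention that a custom metric receives the full array of points and returns a square matrix is also an assumption made for this example.

```python
import numpy as np


def euclidean_pairwise(points):
    """Return the (n, n) matrix of Euclidean distances between n points."""
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        # Treat a flat array as n one-dimensional points.
        points = points[:, np.newaxis]
    # Broadcast to an (n, n, d) array of coordinate differences.
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.linalg.norm(diff, axis=-1)


def distance_matrix(points, metric=None):
    """Hypothetical helper: Euclidean by default, custom metric if given.

    Assumption: `metric` is a callable that takes the points and returns
    an (n, n) array of pairwise distances.
    """
    if metric is None:
        return euclidean_pairwise(points)
    return np.asarray(metric(points), dtype=float)
```

Note that a function returning a condensed distance vector, such as scipy.spatial.distance.pdist, would need to be wrapped (for example with scipy.spatial.distance.squareform) to fit the square-matrix convention assumed here.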
Also, in your function, you would probably want to add a metric='euclidean' keyword argument that gets passed to pairwise_distances, in case the user wants a different metric.
So, I have a question about the distance matrix: does it have to contain the distances between the midpoints of the bins? Can the distance instead be defined as the distance between the centroids of the corresponding bins? In my case, I am using a colored point cloud, which means I create the histogram from the intensity (color) of the points, and I would like to use the spatial information as well. Can I use the Euclidean distance between the centroid points of the corresponding bins, sqrt((x1-y1)^2 + (x2-y2)^2 + (x3-y3)^2)?
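One possible reading of this question, sketched with numpy only: bin the points by intensity, take the spatial centroid of the points in each bin, and use the Euclidean distances between those centroids as the distance matrix. This is an illustration of the computation, not a statement about what pyemd recommends. The function name bin_centroid_distances is hypothetical, and the sketch assumes the centroids are computed from a single set of points (for two clouds, presumably the pooled points, so that both histograms share one distance matrix).

```python
import numpy as np


def bin_centroid_distances(xyz, intensity, edges):
    """Euclidean distances between per-bin spatial centroids.

    xyz:       (n, 3) point coordinates
    intensity: (n,) values used to assign points to histogram bins
    edges:     increasing bin edges, length n_bins + 1
    """
    xyz = np.asarray(xyz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = len(edges) - 1
    # Map each intensity to a bin index; out-of-range values and the
    # right-most edge are clipped into the first or last bin.
    idx = np.clip(np.digitize(intensity, edges) - 1, 0, n_bins - 1)
    # Empty bins keep a NaN centroid, so their distances are NaN too.
    centroids = np.full((n_bins, xyz.shape[1]), np.nan)
    for b in range(n_bins):
        members = xyz[idx == b]
        if len(members):
            centroids[b] = members.mean(axis=0)
    diff = centroids[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Empty bins would have to be handled (for example by falling back to some default position) before the matrix could be used as a ground distance.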
Hello,
I believe users would find it useful to have a built-in method for calculating the EMD statistic on two arrays without having to build the histograms and distance matrix themselves.
I've written a simple function to do this myself - I'm happy to write it up properly and submit a pull request if you're happy to incorporate it into the API.
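For context, the manual steps such a helper would wrap might look roughly like the sketch below: histogram both arrays over shared bins, then build a distance matrix from the bin midpoints. The name histograms_and_distances and the bins=10 default are hypothetical choices for illustration, and this is not the function referred to above. The sketch assumes one-dimensional, non-empty samples.

```python
import numpy as np


def histograms_and_distances(first, second, bins=10):
    """Build two normalized histograms on shared bins plus a distance matrix."""
    first = np.asarray(first, dtype=float).ravel()
    second = np.asarray(second, dtype=float).ravel()
    # Shared bin edges so that bin i means the same range in both histograms.
    edges = np.histogram_bin_edges(np.concatenate([first, second]), bins=bins)
    first_hist, _ = np.histogram(first, bins=edges)
    second_hist, _ = np.histogram(second, bins=edges)
    # Normalize to unit mass (assumes neither array is empty).
    first_hist = first_hist / first_hist.sum()
    second_hist = second_hist / second_hist.sum()
    # Ground distance: absolute difference between bin midpoints.
    centers = (edges[:-1] + edges[1:]) / 2
    distance = np.abs(centers[:, np.newaxis] - centers[np.newaxis, :])
    return first_hist, second_hist, distance
```

The three float64 arrays returned here are the kind of inputs that pyemd's emd(first_histogram, second_histogram, distance_matrix) expects.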