wmayner / pyemd

Fast EMD for Python: a wrapper for Pele and Werman's C++ implementation of the Earth Mover's Distance metric
MIT License

EMD not leading to any results, without any error raised #57

Open ChristelDG opened 2 years ago

ChristelDG commented 2 years ago

Hello,

My question may be a very easy one, but I am losing my nerves trying to solve it. Here are my parameters:

first_histogram = [1. 1. 1.]
second_histogram = [2. 0. 0.]
distance_matrix = array([[0.        , 0.        , 0.        , 0.60058105, 1.        ],
                         [0.        , 0.        , 0.        , 0.60058105, 1.        ],
                         [0.        , 0.        , 0.        , 0.60058105, 1.        ],
                         [0.60058105, 0.60058105, 0.60058105, 0.        , 0.98793931],
                         [1.        , 1.        , 1.        , 0.98793931, 0.        ]])

(My distance matrix is the result of sklearn.metrics.pairwise.cosine_distances(), so it truly is a distance matrix.) Now, if I try to run:

D_EMD = emd(first_histogram, second_histogram, distance_matrix)

The code runs forever without returning a result and without raising any error.

Does anyone have any idea what I'm doing wrong?

Thanks a lot!

Christel
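
The posted matrix comes from cosine distances, and cosine distance is not guaranteed to be a strict metric: distinct points can sit at distance zero, and the triangle inequality can fail in general. The pyemd README notes that the distance matrix should represent a metric. As a diagnostic only, here is a minimal sketch of a check for that property. The helper name `metric_violations` and the tolerance are hypothetical, not part of pyemd, and nothing in this thread establishes that a non-metric matrix is what causes the hang.

```python
from itertools import product
from math import isclose


def metric_violations(d, tol=1e-9):
    """List the ways a square matrix `d` fails to be a strict metric.

    Hypothetical helper for illustration; works on nested lists or on
    NumPy arrays, since it only uses d[i][j] indexing.
    """
    n = len(d)
    problems = []
    # The diagonal must be zero.
    for i in range(n):
        if not isclose(d[i][i], 0.0, abs_tol=tol):
            problems.append(f"d[{i}][{i}] is not zero")
    # Off-diagonal entries must be symmetric and strictly positive.
    for i, j in product(range(n), repeat=2):
        if i < j:
            if not isclose(d[i][j], d[j][i], abs_tol=tol):
                problems.append(f"not symmetric at ({i}, {j})")
            if isclose(d[i][j], 0.0, abs_tol=tol):
                problems.append(f"distinct bins {i} and {j} are at distance zero")
    # Triangle inequality: d[i][k] <= d[i][j] + d[j][k].
    for i, j, k in product(range(n), repeat=3):
        if d[i][k] > d[i][j] + d[j][k] + tol:
            problems.append(f"triangle inequality fails for ({i}, {j}, {k})")
    return problems


# The distance matrix exactly as posted above.
a, b = 0.60058105, 0.98793931
D = [
    [0.0, 0.0, 0.0, a, 1.0],
    [0.0, 0.0, 0.0, a, 1.0],
    [0.0, 0.0, 0.0, a, 1.0],
    [a, a, a, 0.0, b],
    [1.0, 1.0, 1.0, b, 0.0],
]
problems = metric_violations(D)
for line in problems:
    print(line)
```

For the posted matrix, the only findings are the zero distances among the first three bins, so it is a pseudometric rather than a strict metric.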

wmayner commented 2 years ago

Thanks. I reproduced this on both macOS and Debian. It depends on having two zeros in the second histogram; the value of the 2.0 doesn't matter, and changing the extra mass penalty doesn't seem to help.

This is almost certainly a problem with how the underlying algorithm in C++ handles some edge cases; unfortunately I don't have the bandwidth right now to look into it further. Likely related to #54. If you discover the issue then a PR would be most welcome.
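
Because the call never returns and raises nothing, one way to probe which inputs trigger the hang is to run each attempt in a child interpreter with a timeout. The following is a minimal sketch using only the standard library; `run_with_timeout` and `PROBE` are hypothetical names, and the probe assumes numpy and pyemd are installed for the child process. It is a way to detect the hang, not a fix for it.

```python
import subprocess
import sys

# Script for the child process: the inputs exactly as reported above.
# Assumes numpy and pyemd are importable by the child interpreter.
PROBE = """
import numpy as np
from pyemd import emd

first = np.array([1.0, 1.0, 1.0])
second = np.array([2.0, 0.0, 0.0])
a, b = 0.60058105, 0.98793931
D = np.array([
    [0.0, 0.0, 0.0, a, 1.0],
    [0.0, 0.0, 0.0, a, 1.0],
    [0.0, 0.0, 0.0, a, 1.0],
    [a, a, a, 0.0, b],
    [1.0, 1.0, 1.0, b, 0.0],
])
print(emd(first, second, D))
"""


def run_with_timeout(snippet, seconds=5.0):
    """Run Python source in a child interpreter, giving up after `seconds`.

    Returns ("ok", stdout), ("error", stderr) or ("timeout", None).
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=seconds,  # the child is killed if it runs longer
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    if done.returncode != 0:
        return ("error", done.stderr.strip())
    return ("ok", done.stdout.strip())
```

Calling `run_with_timeout(PROBE)` reports `"timeout"` if the call hangs as described in this thread. Editing the histograms inside `PROBE` (for example, the number of zeros in `second`) gives a quick way to narrow down the triggering inputs without freezing the parent session.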