wmayner / pyemd

Fast EMD for Python: a wrapper for Pele and Werman's C++ implementation of the Earth Mover's Distance metric
MIT License
479 stars 62 forks source link

Zero EMD for equal Histograms #58

Closed njanakiev closed 2 years ago

njanakiev commented 2 years ago

Thanks for the project!

I've been comparing the results between pyemd and PythonOT/POT and I found inconsistencies in the results. Not sure if I am applying it in a wrong way.

numpy 1.20.3
scipy 1.8.0
ot 0.8.2
pyemd 0.5.1

Here is the code to reproduce it:

import numpy as np
import ot
import pyemd
from scipy.spatial import distance_matrix

np.random.seed(0)

n = 3
X = np.random.random((n, 2))
Y = np.random.random((n, 2))
D = distance_matrix(X, Y)

d0 = np.ones(n) / n
d1 = np.ones(n) / n

e, flow = emd_with_flow(d0, d1, D)
print(e, flow)

Output:

(0.0, [[0.333333, 0.0, 0.0], [0.0, 0.333333, 0.0], [0.0, 0.0, 0.333333]])

In fact, the earth mover distance is usually 0 for histograms that are uniform.

Here with POT:

e = ot.emd2(d0, d1, D)
flow = ot.emd(d0, d1, D)

print(e, flow.tolist())

Output:

0.31591990155991395 [[0.0, 0.0, 0.3333333333333333], [0.0, 0.3333333333333333, 0.0], [0.3333333333333333, 0.0, 0.0]]

The flow is also different in this case.

Am I missing something or is this some issue with the code in either repository?

wmayner commented 2 years ago

I am not familiar with the other library, but the result from PyEMD looks correct. If the two histograms are equal, then the earth mover's distance between them is zero, as no mass needs to be moved.

Probably the discrepancy can be explained by the fact that the distance matrix you're using does not represent a metric, as it doesn't satisfy any of the metric axioms. As described in the documentation, PyEMD's implementation assumes these properties and does not check them; results are not guaranteed to be correct if they are not satisfied. In particular, there is no identity of indiscernibles, since the diagonal of the matrix is not zero. My guess is that the implementation of the other library deals with this differently.