meder411 / PyTorch-EMDLoss

PyTorch 1.0 implementation of the approximate Earth Mover's Distance

Inconclusive values between EMDLoss and SciPy's Wasserstein distance #2

Open flxai opened 5 years ago

flxai commented 5 years ago

I've written a gist trying to compare `scipy.stats.wasserstein_distance` with the `EMDLoss` implemented here. Unfortunately, it seems I'm using the implementation the wrong way, because the values differ for the following test signals:

*[plot of the test signals a, b, and c]*

| (x, y) | EMDLoss | SciPy Wasserstein distance |
|--------|------------|----------------------------|
| (a, b) | 0.04349525 | 0.0 |
| (a, c) | 0.46960036 | 0.12115367548619291 |
| (b, c) | 0.89565078 | 0.12115367548619291 |

Shouldn't `EMDLoss` and SciPy's Wasserstein distance take on the same values? Am I wrong about converting the signals via `signal[None, :, None]`? I tried mimicking the shapes from `test_emd_loss.py`, but don't quite grasp how they transfer to 1D signals. Thanks!
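For reference, a minimal sketch of the kind of comparison I'm attempting (the `emd` import path and the tensor shapes are my assumptions based on `test_emd_loss.py`; adjust to however the package is actually built):

```python
import numpy as np
import torch
from scipy.stats import wasserstein_distance

# Import path assumed; adjust to match your build of PyTorch-EMDLoss.
from emd import EMDLoss

def compare(a: np.ndarray, b: np.ndarray):
    """Compare EMDLoss and SciPy on two 1D signals (a sketch)."""
    # EMDLoss appears to expect (batch, n_points, dim) CUDA double tensors,
    # hence signal[None, :, None] to get shape (1, N, 1).
    ta = torch.from_numpy(a)[None, :, None].double().cuda()
    tb = torch.from_numpy(b)[None, :, None].double().cuda()
    el = EMDLoss()(ta, tb).item()
    # SciPy treats the arrays as samples from each distribution.
    wd = wasserstein_distance(a, b)
    return el, wd
```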

meder411 commented 5 years ago

Hmmm, I don't think so. This code provides an efficient approximation to the Earth Mover's Distance; it does not solve the underlying linear program directly. I haven't looked at SciPy's Wasserstein distance function, but it may actually solve the linear program exactly. That would explain the difference in the numbers.

From what I understand, the code I adapted this from is built on the algorithm presented in D. P. Bertsekas, "A distributed asynchronous relaxation algorithm for the assignment problem," in Proc. 24th IEEE Conference on Decision and Control, pages 1703–1704, IEEE, 1985. The high-level idea is to use this relaxation method to match samples between the two distributions and then simply take the distances between matched samples. Hence, for shifted versions of the same distribution (which I assume a and b are), this algorithm will produce a nonzero distance. My guess is that SciPy's Wasserstein distance normalizes the distributions before computing the difference, which is why (a, c) and (b, c) end up with identical distances.
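To illustrate the idea (this is not this repo's code, just a sketch of the matching-based view using `scipy.optimize.linear_sum_assignment`, which solves exactly the assignment problem that the relaxation method approximates):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assignment_emd(a: np.ndarray, b: np.ndarray) -> float:
    """Exact matching-based EMD between two equal-size 1D point sets.

    Solves the assignment problem that the relaxation/auction method
    approximates, then reduces the matched distances (mean here; whether
    an implementation sums or averages is a detail).
    """
    cost = np.abs(a[:, None] - b[None, :])   # pairwise |a_i - b_j|
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return cost[rows, cols].mean()
```

Note that for a shifted copy of the same point set, this matching yields the shift magnitude rather than zero, which is consistent with the nonzero (a, b) value above.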

flxai commented 5 years ago

Thanks for your response. I've tried hard to understand the differences, but they still aren't clear to me.

SciPy seems to compute the Wasserstein distance (cf. `wasserstein_distance`) via a shared CDF-based routine (cf. `_cdf_distance`), which also underlies the Cramér distance. This relies on the formulation discussed in *The Cramer Distance as a Solution to Biased Wasserstein Gradients* (2017), arXiv:1705.10743.
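Written out, the quantity SciPy computes for p = 1 looks roughly like this (a sketch based on my reading of `_cdf_distance`):

```python
import numpy as np

def wasserstein_1d(u: np.ndarray, v: np.ndarray) -> float:
    """1D Wasserstein distance via empirical CDFs (my reading of
    scipy.stats._cdf_distance with p = 1).

    Integrates |F_u(x) - F_v(x)| over the pooled sample points.
    """
    all_values = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_values)
    # Empirical CDFs evaluated between consecutive pooled points.
    u_cdf = np.searchsorted(np.sort(u), all_values[:-1], side="right") / len(u)
    v_cdf = np.searchsorted(np.sort(v), all_values[:-1], side="right") / len(v)
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))
```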

I wonder whether it is possible to choose parameters for your implementation that match the values of the SciPy implementation. Unfortunately, I haven't found any yet.

> My guess is that SciPy's Wasserstein distance normalizes the distributions before computing the difference, which is why (a, c) and (b, c) end up with identical distances.

It doesn't look that way in the source. Besides, I'd expect all three distances to be identical if the distributions were normalized.

meder411 commented 5 years ago

Sorry, I misspoke. By normalization I meant just centering the distributions. I'm not sure how best to match their implementation, but I'd be interested to hear if you find a way. When I have some time, I'll revisit this and perhaps update the implementation.
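Something like this is what I have in mind (just a sketch; `centered_distance` is a hypothetical helper, and I haven't checked whether centering alone reconciles the two implementations):

```python
import numpy as np

def centered_distance(a, b, dist):
    """Hypothetical helper: compare two 1D point sets after centering.

    If a and b are shifted copies of the same set, removing the means
    makes them identical, so any sensible distance returns 0, and
    (a, c) and (b, c) collapse to the same value.
    """
    return dist(a - a.mean(), b - b.mean())
```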