jeanfeydy / geomloss

Geometric loss functions between point clouds, images and volumes

Very different results for Wasserstein distance compared to Gudhi #71

Closed · fbeilstein closed this 1 year ago

fbeilstein commented 1 year ago

When transferring from Gudhi (a library for Topological Data Analysis), I've spotted a significant difference in results:

from gudhi.wasserstein import wasserstein_distance
import numpy as np

# Two persistence diagrams: arrays of (birth, death) pairs.
dgm1 = np.array([[2.7, 4.11], [9.6, 14.], [34.2, 34.974]])
dgm2 = np.array([[2.8, 4.45], [9.5, 14.1]])

wasserstein_distance(dgm1, dgm2, order=1)

yields 0.8269999999999964, while

import torch
from geomloss import SamplesLoss
import numpy as np

# The same diagrams, now treated as plain point clouds in R^2.
I1 = torch.Tensor(dgm1)
I2 = torch.Tensor(dgm2)
I1.requires_grad_()

loss = SamplesLoss(loss='sinkhorn', debias=False, p=1, blur=1e-3, scaling=0.999, backend='auto')
L = loss(I1, I2)
torch.mean(L).item()

yields 12.854110717773438.

I'm not sure whether this is a bug or me doing something wrong, but I thought it worth reporting.

jeanfeydy commented 1 year ago

Hi @fbeilstein,

Thanks for your interest in this library! The mismatch here comes from the fact that Gudhi is not computing the usual Wasserstein distance, but a non-standard variation (documented here) that is tailored for persistence diagrams.

Going forward, I suggest the following: I remember a discussion with @tlacombe on this topic; he may already have some pointers and/or an implementation available somewhere?

Best regards, Jean

tlacombe commented 1 year ago

Hello,

TL;DR: I completely agree with @jeanfeydy on this topic: this is not a bug and the results are completely normal. What gudhi calls wasserstein is not the same as what geomloss calls Wasserstein (the original Wasserstein, to be honest).

Details: somewhat unfortunately, the name wasserstein used in Topological Data Analysis (and in particular in gudhi) overloads what people usually call Wasserstein in the optimal transport community.

As Jean said, gudhi.wasserstein should be called (if we want to stick to OT formalism) gudhi.unbalanced_optimal_transport_with_Dirichlet_boundary_condition, as it actually corresponds to the model proposed by Figalli and Gigli in this paper, as detailed here. Due to the clear similarities between these two models, the TDA community sticks to calling the metrics between persistence diagrams Wasserstein (hence the name chosen in gudhi), but I agree that this can be a source of confusion.
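For concreteness, here is a minimal sketch of this partial-matching model (the helper name diagram_wasserstein is mine, and I assume gudhi's defaults: order=1 and the L-infinity ground metric, internal_p=np.inf). Each point can either be matched to a point of the other diagram, or sent to its projection on the diagonal:

import numpy as np
from scipy.optimize import linear_sum_assignment

def diagram_wasserstein(dgm1, dgm2):
    # Order-1 Figalli-Gigli distance between two persistence diagrams.
    n, m = len(dgm1), len(dgm2)
    BIG = 1e9  # large finite cost to forbid meaningless matches
    # Sending a point (b, d) to the diagonal costs half its persistence
    # under the L-infinity norm.
    to_diag1 = (dgm1[:, 1] - dgm1[:, 0]) / 2
    to_diag2 = (dgm2[:, 1] - dgm2[:, 0]) / 2
    # (n+m) x (m+n) assignment problem: rows = dgm1 points + m diagonal
    # slots, columns = dgm2 points + n diagonal slots.
    C = np.zeros((n + m, m + n))
    C[:n, :m] = np.abs(dgm1[:, None, :] - dgm2[None, :, :]).max(axis=2)
    C[:n, m:] = BIG
    C[np.arange(n), m + np.arange(n)] = to_diag1  # dgm1 point -> diagonal
    C[n:, :m] = BIG
    C[n + np.arange(m), np.arange(m)] = to_diag2  # diagonal -> dgm2 point
    # The bottom-right block stays at 0: diagonal-to-diagonal is free.
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()

dgm1 = np.array([[2.7, 4.11], [9.6, 14.0], [34.2, 34.974]])
dgm2 = np.array([[2.8, 4.45], [9.5, 14.1]])
print(diagram_wasserstein(dgm1, dgm2))  # ~0.827, matching gudhi's output

Note how the third point of dgm1, which has no partner in dgm2, is simply sent to the diagonal at cost (34.974 - 34.2) / 2 = 0.387 instead of being matched at a large transport cost; this is exactly what the standard (balanced) Wasserstein distance cannot do.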

Unfortunately, I do not have a KeOps/geomloss-friendly implementation of "Wasserstein-for-persistence-diagrams" available. I agree with Jean that implementing this should be doable and could be quite useful (I was considering working on it at some point and then, you know, time flies...). I should warn, however, that adding an entropic term to (properly) regularize persistence diagram metrics is not as easy as I once expected, as it may yield inhomogeneity issues, something I have discussed there.

I am open to discussion if someone is willing to spend some time implementing gudhi.wasserstein using KeOps.

fbeilstein commented 1 year ago

Hi @jeanfeydy!

Thank you so much for the detailed explanation! Things are starting to make sense now; I was getting desperate trying to find the source of the problem. I'll review your thesis and, if I can wrap my head around it, try to implement some code.

Sincerely, Vitalii

pugantsov commented 1 year ago

Sorry to hijack the thread a little here, but I didn't want to create a new one for such a similar question. I was wondering if you could shed some light on the differences between GeomLoss's Wasserstein and Scipy's wasserstein_distance.

Using the same settings as above,

import torch
import numpy as np
from geomloss import SamplesLoss
from scipy.stats import wasserstein_distance

a = np.array([6, 4, 3, 0, 0, 1, 0, 8])
b = np.array([0, 8, 1, 0, 0, 1, 0, 2])

# Normalize to probability distributions.
a_dist = a / np.sum(a)
b_dist = b / np.sum(b)

geomloss_wasserstein = SamplesLoss(
    loss='sinkhorn',
    debias=False,
    p=1,
    blur=1e-3,
    scaling=0.999,
    backend='auto'
)(torch.tensor(a_dist.reshape(1, -1)), torch.tensor(b_dist.reshape(1, -1)))
scipy_wasserstein = wasserstein_distance(a_dist, b_dist)

This gives me 0.5937 and 0.0758 (for GeomLoss and Scipy, respectively).

My use case is such that I want to use some of the losses defined in GeomLoss as a simple distance metric between probability distributions or vector representations for machine learning. Thanks in advance.

mglisse commented 1 year ago

> )(torch.tensor(a_dist.reshape(1, -1)), torch.tensor(b_dist.reshape(1, -1)))

Did you mean reshape(-1,1)?

pugantsov commented 1 year ago

>> )(torch.tensor(a_dist.reshape(1, -1)), torch.tensor(b_dist.reshape(1, -1)))
>
> Did you mean reshape(-1,1)?

I do this simply when I am testing with a single sample, as the package expects (N, D), so I reshape it to (1, D).

mglisse commented 1 year ago

> Sorry to hijack the thread a little here, but I didn't want to create a new one for such a similar question.

Next time, you should still create a new issue for a new question.

> I do this simply when I am testing with a single sample, as the package expects (N, D), so I reshape it to (1, D).

Then the comparison makes no sense, because scipy only supports D=1.

pugantsov commented 1 year ago

>> Sorry to hijack the thread a little here, but I didn't want to create a new one for such a similar question.
>
> Next time, you should still create a new issue for a new question.
>
>> I do this simply when I am testing with a single sample, as the package expects (N, D), so I reshape it to (1, D).
>
> Then the comparison makes no sense, because scipy only supports D=1.

Apologies, will do. I only reshape it for GeomLoss; with Scipy's wasserstein_distance I compare the 1D distributions as they are, i.e. shape (1, 8) for GeomLoss vs. (8,) for Scipy.

jeanfeydy commented 1 year ago

Hi @pugantsov ,

Thanks to @mglisse: the problem was indeed the switch from .reshape(1, -1) to .reshape(-1, 1). The code below works as expected:

import torch
import numpy as np
from geomloss import SamplesLoss
from scipy.stats import wasserstein_distance

a = np.array([6, 4, 3, 0, 0, 1, 0, 8])
b = np.array([0, 8, 1, 0, 0, 1, 0, 2])

a_dist = a / np.sum(a)
b_dist = b / np.sum(b)

# reshape(-1, 1) encodes each distribution as 8 points in dimension 1;
# the smaller blur gives a tighter approximation of the exact distance.
geomloss_wasserstein = SamplesLoss(
    loss='sinkhorn',
    debias=False,
    p=1,
    blur=1e-4,
    scaling=0.999,
    backend='auto'
)(torch.tensor(a_dist.reshape(-1, 1)), torch.tensor(b_dist.reshape(-1, 1)))
scipy_wasserstein = wasserstein_distance(a_dist, b_dist)

print(geomloss_wasserstein)
print(scipy_wasserstein)

This gives 0.0759 and 0.07575: the two values now agree, up to the small bias induced by the finite entropic blur.

To be clear about what caused the confusion: SamplesLoss interprets (N, D) input arrays as collections of N points in dimension D. In this context,

SamplesLoss(...)(torch.tensor(a_dist.reshape(1, -1)), torch.tensor(b_dist.reshape(1, -1)))

is interpreted as a Wasserstein distance between single-point Dirac distributions in dimension len(a) = 8, which is clearly not what @pugantsov expected. To retrieve a comparison between 8 points in dimension 1, one should indeed call:

SamplesLoss(...)(torch.tensor(a_dist.reshape(-1, 1)), torch.tensor(b_dist.reshape(-1, 1)))
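For instance, a quick shape check (a small illustrative snippet) makes the difference explicit:

a_dist = np.array([6, 4, 3, 0, 0, 1, 0, 8]) / 22.0  # normalized weights from above
print(a_dist.reshape(1, -1).shape)  # (1, 8): one Dirac mass in R^8
print(a_dist.reshape(-1, 1).shape)  # (8, 1): eight points on the real line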

Best regards, Jean