Mismatch against another EMD library

zhangmozhe commented 4 years ago

Excuse my poor understanding of EMD. I tested geomloss with the following code:

points1 = torch.rand(2, 1024, 2, requires_grad=True).cuda()
points2 = torch.rand(2, 1024, 2).cuda()
emd_sinkhorn = geomloss.SamplesLoss(loss='sinkhorn', p=2, blur=0.01, backend='auto')(points1, points2)

I also tried another EMD library (https://github.com/Colin97/MSN-Point-Cloud-Completion/tree/master/emd), which uses an auction algorithm:

emd_auction, assignment = emd(points1, points2, eps=0.05, iters=2000)

However, the two experiments give different results. How can I change my geomloss call so that it matches the other library? Thank you very much for your help!

wzm2256 commented 4 years ago

Please try debias=False and p=1.

zhangmozhe commented 4 years ago

Thanks! I will have a try.

zhangmozhe commented 4 years ago

It works when I use the following code:

geomloss.SamplesLoss(loss='sinkhorn', debias=False, p=1, blur=1e-3, scaling=0.999, backend='auto')(points1, points2)

wzm2256 commented 4 years ago

:) You are welcome. This setting lets you use the entropy-regularized (debias=False) EMD (p=1) loss, which is in accordance with the description in the library you mentioned.
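For reference, when both clouds have the same number of points and uniform weights, the unregularized p=1 EMD is the smallest mean Euclidean distance over all one-to-one matchings. The sketch below computes that value by brute force for tiny inputs, so it can serve as a ground truth when comparing the two libraries on a toy case. The function name `exact_emd` and the sample points are illustrative assumptions and are not part of either library; how each library normalizes its output should still be checked separately.

```python
import itertools
import math


def exact_emd(points_a, points_b):
    """Exact p=1 EMD between two equally sized, uniformly weighted point sets.

    Brute force over all one-to-one matchings, so only usable for tiny inputs.
    """
    n = len(points_a)
    assert n == len(points_b), "this sketch assumes equal-size clouds"
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Mean Euclidean distance under this particular matching.
        cost = sum(math.dist(points_a[i], points_b[j]) for i, j in enumerate(perm)) / n
        best = min(best, cost)
    return best


a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
b = [(0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]  # same points as `a`, shuffled
c = [(3.0, 4.0), (4.0, 4.0), (3.0, 5.0)]  # `a` translated by (3, 4)

print("EMD(a, b) =", exact_emd(a, b))
print("EMD(a, c) =", exact_emd(a, c))
```

A shuffled copy of a cloud is at distance zero, and a translated copy is at a distance equal to the length of the translation, which gives two easy sanity checks for any approximate solver.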

gdwei commented 4 years ago

Doesn't this emd code (https://github.com/Colin97/MSN-Point-Cloud-Completion/tree/master/emd) only work for points in 3 dimensions?

ThibaultGROUEIX commented 4 years ago

Can you share your minimal example, @zhangmozhe? I am unable to reproduce your result. As you can see from the example below, the output of geomLoss seems to differ from that of @colin97's lib.

My minimal code is:

import time
import sys
import torch
from geomloss import SamplesLoss
import emd_module as dist_emd_3d
emd3d = dist_emd_3d.emdModule()
geomLoss = SamplesLoss(loss="sinkhorn", p=1, blur=0.05, scaling=0.999, debias=False, backend='auto')

x = torch.FloatTensor(1,1024, 3).cuda()
x.data.uniform_(0,1)
y = torch.randn(1,1024, 3).cuda()
y.data.uniform_(0,1)

start_time = time.time()
L1 = geomLoss(x, y)
print("GeomLoss : ",torch.mean(L1).item())
print("GeomLoss : %s seconds " % (time.time() - start_time))

start_time = time.time()
L1, idx1 = emd3d(x, y, 0.05, 20000)
L2, idx2 = emd3d(y, x, 0.05, 20000)
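# Note: the first line prints the sum of both directions, the second x->y only and the third y->x only.
# The labels are kept as in the original run so that they match the output shown below.
print("Colin97 x->y: ",torch.mean(torch.sqrt(L1)+torch.sqrt(L2)).item())
print("Colin97 y->x : ",torch.mean(torch.sqrt(L1)).item())
print("Colin97  x-y + y->x: ",torch.mean(torch.sqrt(L2)).item())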
print("Colin97  : %s seconds " % (time.time() - start_time))

which outputs:

GeomLoss :  0.30643346905708313
GeomLoss : 1.2830226421356201 seconds
Colin97 x->y:  0.01813390478491783
Colin97 y->x :  0.009060461074113846
Colin97  x-y + y->x:  0.009073445573449135
Colin97  : 0.5851273536682129 seconds

Thanks :) !

zhangmozhe commented 4 years ago

This is the EMD loss I use:

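import torch
import torch.nn as nn
from emd_module import emdModule  # auction-based EMD from the MSN repository

class EMDLoss(nn.Module):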
    def __init__(self, reduce="none", sinkhorn=True):
        super(EMDLoss, self).__init__()
        self.reduce = reduce
        self.sinkhorn = sinkhorn

    def forward(self, xyz1, xyz2):
        """
        NOTE: we only calculate gradient for xyz1
        Calculates the Earth Mover Distance (or Wasserstein metric) between two sets
        of points.
        :param xyz1:
            a point cloud of shape ``(b, n1, k)`` or ``(n1, k)``.
        :param xyz2:
            a point cloud of shape ``(b, n2, k)`` or ``(n2, k)``.
        :param reduce:
            ``'mean'``, ``'sum'`` or ``'none'`` (set in the constructor). Default: ``'none'``.
        :param sinkhorn:
            whether to use the Sinkhorn approximation of the Wasserstein distance.
            ``False`` will fall back to a CUDA implementation, which is only available
            if the CUDA-extended neuralnet-pytorch is installed.
            Default: ``True``.
        :return:
            the EMD between the inputs.
        """

        assert self.reduce in ("mean", "sum", "none"), "reduce should be 'mean', 'sum' or 'none'"
        if self.sinkhorn:  # BUG: problematic when using sinkhorn
            import geomloss

            # Sinkhorn approximation: one loss value per batch element.
            loss = geomloss.SamplesLoss(
                loss="sinkhorn", debias=True, p=1, blur=1e-3, scaling=0.6, backend="auto"
            )(xyz1, xyz2)
        else:
            # Auction-based EMD: take the square root of the returned per-point
            # values, then average over the points of each cloud.
            emd_dist, _ = emdModule()(xyz1, xyz2, eps=0.05, iters=2000)
            loss = torch.sqrt(emd_dist).mean(dim=-1)

        # Apply the requested reduction over the batch dimension.
        if self.reduce == "mean":
            return torch.mean(loss)
        if self.reduce == "sum":
            return torch.sum(loss)
        return loss

points1 = torch.rand(2, 2048, 3, requires_grad=True).cuda()
points2 = torch.rand(2, 2048, 3).cuda()

# `sinkhorn` is a constructor argument; forward() only takes the two point clouds.
emd_sinkhorn = EMDLoss(sinkhorn=True)(points1, points2)
print("emd_sinkhorn =", emd_sinkhorn)

emd_msn = EMDLoss(sinkhorn=False)(points1, points2)
print("emd_msn =", emd_msn)

ThibaultGROUEIX commented 4 years ago

Thanks @zhangmozhe! Your code made me realize that I had forgotten torch.sqrt when calling Colin97's emd, so I have updated my snippet above just in case.

However, I am not sure why blur=0.001 for geomLoss translates into eps=0.05 for Colin97's library. Aren't those two parameters supposed to be the same?

Your code outputs:

emd_sinkhorn = tensor([0.0546, 0.0551], device='cuda:0', grad_fn=<AddBackward0>)
emd_msn = tensor([0.0673, 0.0692], device='cuda:0', grad_fn=<MeanBackward1>)

Thanks,

zhangmozhe commented 4 years ago

Hi @ThibaultGROUEIX, since they are different methods, I don't think the parameters can be easily translated. I tried to refer to https://www.kernel-operations.io/geomloss/_auto_examples/sinkhorn_multiscale/plot_epsilon_scaling.html#sphx-glr-auto-examples-sinkhorn-multiscale-plot-epsilon-scaling-py, but I do not fully understand it yet. It seems that a small blur value, e.g. 0.001, is a safe choice.
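As far as the geomloss documentation describes it, blur is a length scale whose temperature is blur**p, and scaling is the ratio between successive length scales in the ε-scaling loop, which starts from the diameter of the data and shrinks down to blur. The auction algorithm's eps is a different kind of tolerance, so there is no reason to expect the two numbers to coincide. The sketch below only illustrates how blur and scaling together set the number of temperature steps; it is not geomloss's own code, and the function name and the choice of diameter are assumptions made for the example.

```python
import math


def epsilon_schedule(p, diameter, blur, scaling):
    """Temperatures visited by an epsilon-scaling Sinkhorn loop (illustrative sketch).

    Starts at diameter**p, shrinks the length scale by `scaling` at each step,
    and finishes at the target temperature blur**p.
    """
    schedule = [diameter ** p]
    sigma = diameter * scaling
    while sigma > blur:
        schedule.append(sigma ** p)
        sigma *= scaling
    schedule.append(blur ** p)
    return schedule


# Diagonal of the unit cube: an upper bound on distances between points in [0, 1]^3.
diameter = math.sqrt(3.0)
for scaling in (0.6, 0.9, 0.999):
    steps = len(epsilon_schedule(p=1, diameter=diameter, blur=1e-3, scaling=scaling))
    print(f"scaling={scaling}: {steps} temperature steps")
```

Under these assumptions, a scaling close to 1 multiplies the number of steps by orders of magnitude, which trades speed for accuracy.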

ThibaultGROUEIX commented 4 years ago

Thanks @zhangmozhe !

jeanfeydy / geomloss

Mismatch against another EMD library #29