jeanfeydy / geomloss

Geometric loss functions between point clouds, images and volumes
MIT License

Negative result with SamplesLoss #24

Closed (wazizian closed this issue 4 years ago)

wazizian commented 4 years ago

Hi, first of all, thank you for making this library publicly available.

As indicated in the title, I have encountered examples where the loss from SamplesLoss returns a negative value. I have saved one such example: it consists of two sets of 50 points living in R^160. The two sets are very close to each other, as the maximum distance between their points is about 0.25 (see notebook below).

The loss is the default Sinkhorn divergence, and I have tried both the default parameters and more accurate ones.

default_loss = geomloss.SamplesLoss()
accurate_loss = geomloss.SamplesLoss(blur=1e-3, scaling=0.999)

The results are the following:

Default Loss: -0.00108
More Accurate Loss: -0.000130

I have reproduced such behavior on my computer and on Google Colab. Here is the notebook.

It is possible that it is just me who misconfigured something. Please let me know if this is the case. Otherwise, if you think that it shouldn't happen with those parameters, I can look at the code more thoroughly to try to fix it.

Thank you, Bests, Waïss

jeanfeydy commented 4 years ago

Hi @wazizian ,

What a nice surprise to hear from you :-) Thanks a lot for your report: I think that the problem here is due to numerical instability more than anything else. Your two point clouds are very close to each other and concentrated in a tiny region of the ambient space, whereas the default GeomLoss settings are designed for normalized point clouds. In your case, instead of playing around with the parameters, I would simply suggest translating and rescaling your data before feeding it to the Sinkhorn loss:

import torch
import geomloss

x, y = ...  # the two point clouds, as (N, D) and (M, D) tensors
p = 2
default_loss = geomloss.SamplesLoss(p=p)
accurate_loss = geomloss.SamplesLoss(p=p, blur=1e-3, scaling=0.999)

# Center the data and bring it to a normalized scale.
z = torch.cat((x, y))
offset = z.mean(dim=0)
scale = 10 * (z - offset).abs().mean()

xx, yy = (x - offset) / scale, (y - offset) / scale
# Multiply by scale**p to express the loss in the original units.
print("Default Loss: ",       (scale**p) * default_loss(xx, yy).item())
print("More Accurate Loss: ", (scale**p) * accurate_loss(xx, yy).item())

Of course, in practice, you may want to use a fixed rescaling factor so as to normalize your blur parameter once and for all. What do you think? Best regards,

Jean
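A minimal sketch of the normalization recipe above, wrapped in a reusable helper. The names `normalized_loss`, `factor` and `mean_sq_dist` are hypothetical and introduced here for illustration only. `mean_sq_dist` is a toy stand-in so that the snippet runs without GeomLoss; any loss that is translation invariant and homogeneous of degree p could be passed instead.

```python
import torch


def normalized_loss(loss_fn, x, y, p=2, factor=10.0):
    """Hypothetical helper: evaluate loss_fn on centered, rescaled clouds,
    then map the value back to the original units."""
    z = torch.cat((x, y))
    offset = z.mean(dim=0)
    scale = factor * (z - offset).abs().mean()
    xx, yy = (x - offset) / scale, (y - offset) / scale
    return (scale ** p) * loss_fn(xx, yy)


def mean_sq_dist(a, b):
    # Toy stand-in (not part of GeomLoss): mean squared pairwise distance,
    # which is translation invariant and homogeneous of degree 2.
    diff = a[:, None, :] - b[None, :, :]
    return (diff ** 2).sum(dim=-1).mean()


torch.manual_seed(0)
# Two tiny, tightly concentrated clouds of 50 points in R^160.
x = 1e-3 * torch.randn(50, 160, dtype=torch.float64)
y = 1e-3 * torch.randn(50, 160, dtype=torch.float64)

raw = mean_sq_dist(x, y)
rescaled = normalized_loss(mean_sq_dist, x, y, p=2)
print(raw.item(), rescaled.item())
```

For this toy loss the two printed values should agree up to rounding, which is the property that the `scale**p` factor relies on.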

wazizian commented 4 years ago

Thanks a lot for your answer! It makes sense indeed (and solves the problem on this particular example). However, I don't really understand why one shouldn't rescale the data like this in practice. Indeed, if I also rescale the blur by scale, it should be fine, shouldn't it? (I realized that it would unfortunately break auto-diff...) Thank you, Bests, Waïss
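One way to see why rescaling the blur together with the data should be fine is the short derivation below. It is a sketch for the balanced setting that assumes the conventions documented for GeomLoss, namely a cost C(x, y) = |x - y|^p / p and a temperature ε = blur^p. The symbols s (the scale), o (the offset) and π (the transport plan) are notation introduced here.

```latex
\begin{aligned}
\mathrm{OT}_\varepsilon(\alpha,\beta)
  &= \min_{\pi} \sum_{i,j} \pi_{ij}\, C(x_i, y_j)
     + \varepsilon\, \mathrm{KL}(\pi \,\|\, \alpha\otimes\beta),
  \qquad C(x,y) = \tfrac{1}{p}\,\|x-y\|^p, \\
x_i' = \tfrac{x_i - o}{s},\quad y_j' = \tfrac{y_j - o}{s}
  \;&\Longrightarrow\; C(x_i', y_j') = s^{-p}\, C(x_i, y_j), \\
\varepsilon' = s^{-p}\,\varepsilon
  \quad (\text{i.e. } \mathrm{blur}' = \mathrm{blur}/s)
  \;&\Longrightarrow\;
  \mathrm{OT}_{\varepsilon'}(\alpha',\beta') = s^{-p}\, \mathrm{OT}_\varepsilon(\alpha,\beta), \\
\mathrm{S}_\varepsilon(\alpha,\beta)
  &= \mathrm{OT}_\varepsilon(\alpha,\beta)
     - \tfrac12\, \mathrm{OT}_\varepsilon(\alpha,\alpha)
     - \tfrac12\, \mathrm{OT}_\varepsilon(\beta,\beta)
   = s^{p}\, \mathrm{S}_{\varepsilon'}(\alpha',\beta').
\end{aligned}
```

With a fixed blur, by contrast, the rescaled problem uses a different effective temperature, so the value multiplied by s^p is only an approximation of the loss on the original data.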

jeanfeydy commented 4 years ago

Hi @wazizian ,

You're welcome! As for the rescaling of the data: indeed, this is definitely something that I should implement automatically... It just never occurred to me :-) And don't worry about the autodiff: PyTorch really is a flexible library. Code along the lines of:


x, y = ...
p = 2
blur = 1e-3

z = torch.cat((x, y))
offset = z.mean(dim=0)
scale = 10 * (z - offset).abs().mean().detach()

xx, yy = (x - offset) / scale, (y - offset) / scale

# scale.item() passes the rescaled blur as a plain Python float.
accurate_loss = geomloss.SamplesLoss(p=p, blur=blur / scale.item(), scaling=0.9)

print("Accurate Loss: ", (scale**p) * accurate_loss(xx, yy).item())

should be completely fine. Notice the .detach() which allows you to "cut" the unnecessary backprop through the computation of the scaling factor, acting as though scale was an external constant even though it is actually estimated on-the-fly.

Best regards, Jean
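The effect of `.detach()` described above can be checked on a small example. This is only a sketch: `toy_loss` is a hypothetical stand-in for SamplesLoss (the squared distance between the two cloud means), chosen so that the snippet runs with PyTorch alone, and `scale_tracked` is a name introduced here to hold the scaling factor before it is detached.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3, dtype=torch.float64, requires_grad=True)
y = torch.randn(5, 3, dtype=torch.float64)


def toy_loss(a, b):
    # Toy stand-in for SamplesLoss (an assumption of this sketch):
    # squared distance between the cloud means, homogeneous of degree 2.
    return ((a.mean(dim=0) - b.mean(dim=0)) ** 2).sum()


p = 2
z = torch.cat((x, y))
offset = z.mean(dim=0)
scale_tracked = 10 * (z - offset).abs().mean()  # still attached to the graph
scale = scale_tracked.detach()                  # same value, treated as a constant

xx, yy = (x - offset) / scale, (y - offset) / scale
loss = (scale ** p) * toy_loss(xx, yy)

# Gradients still reach x through xx, but not through the scaling factor.
(grad_normalized,) = torch.autograd.grad(loss, x)
(grad_direct,) = torch.autograd.grad(toy_loss(x, y), x)

print(scale_tracked.requires_grad, scale.requires_grad)
print(torch.allclose(grad_normalized, grad_direct))
```

The detached factor no longer requires a gradient, and for this toy loss the gradient with respect to x should match the one obtained without any normalization.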