microsoft / otdd

Optimal Transport Dataset Distance
MIT License

Questions about debiased_loss #10

Closed: toooooodo closed this issue 2 years ago

toooooodo commented 2 years ago

Hi, thanks for your work and code. I'm confused about the debiased_loss parameter in DatasetDistance, and I have two questions:

  1. In get_label_distances, why do we also need to compute class distances within the same dataset when debiased_loss is True?
  2. I ran the example.py provided in this repo and noticed that, in the batch_augmented_cost function, the size of W is [20, 20] rather than [10, 10]. I realise this is because the class distances of D1 and D2 are concatenated as [[DYY1, DYY12], [DYY21, DYY2]]. But I'm afraid that the following operation uses the wrong index to look up the class distance in W. For example, the index for class 0 in D1 and class 0 in D2 is 0 * 20 + 0 = 0, but W.flatten()[0] is the distance between class 0 in D1 and class 0 in D1.
    M = W.shape[1] * Y1[:, :, None] + Y2[:, None, :]
    C2 = W.flatten()[M.flatten(start_dim=1)].reshape(-1,Y1.shape[1], Y2.shape[1])

    I don’t know whether my understanding is correct.

toooooodo commented 2 years ago

Oh! The class indices in D2 run from 10 to 19 when debiased_loss is True. So the class distance index is correct and my previous understanding was wrong. But I'm still confused about why we should compute class distances within the same dataset when debiased_loss is True.

dmelis commented 2 years ago

Hi @toooooodo. When debiased_loss=True, we also need to compute label-to-label distances within each of the two datasets. To avoid carrying around 3 different tensors, we stack all of them together in a block-wise matrix of shape (k + k') x (k + k'), assuming the datasets have k and k' classes respectively. The diagonal blocks of this matrix are the within-domain label distances, and the off-diagonal blocks (the matrix is symmetric, so the two off-diagonal blocks are transposes of each other) are the usual across-domain label distances that you would get if you ran OTDD with debiased_loss=False. I hope that clarifies it!
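
For readers following along, here is a minimal NumPy sketch of the block layout and the flat-index lookup discussed above. It is an illustration under assumptions, not the repository's code: the class counts, the random distance values, and the label batches are made up, and NumPy's `W.ravel()[M]` stands in for the PyTorch `W.flatten()[...]` call quoted in the question. The names DYY1, DYY12, DYY2, W, Y1, Y2, M and C2 mirror the thread; it assumes, as noted earlier in the thread, that the D2 labels are shifted by the number of D1 classes.

```python
import numpy as np

k1, k2 = 3, 2  # hypothetical class counts for D1 and D2
rng = np.random.default_rng(0)

def random_label_distances(k):
    """Made-up symmetric within-dataset distances with a zero diagonal."""
    A = rng.random((k, k))
    D = (A + A.T) / 2
    np.fill_diagonal(D, 0.0)
    return D

DYY1 = random_label_distances(k1)   # within D1, shape (k1, k1)
DYY2 = random_label_distances(k2)   # within D2, shape (k2, k2)
DYY12 = rng.random((k1, k2))        # across D1 and D2, shape (k1, k2)

# Block-wise stacking: diagonal blocks are within-dataset distances,
# off-diagonal blocks are the cross-dataset distances and their transpose.
W = np.block([[DYY1, DYY12],
              [DYY12.T, DYY2]])     # shape (k1 + k2, k1 + k2)

# One batch of labels per dataset. D2 labels are shifted by k1 so that
# they point into the second block of columns of W.
Y1 = np.array([[0, 2, 1]])          # shape (B=1, n=3), values in 0..k1-1
Y2 = np.array([[0, 1]]) + k1        # shape (B=1, m=2), values in k1..k1+k2-1

# Row-major flat index of W[Y1[b, i], Y2[b, j]] for every pair (i, j).
M = W.shape[1] * Y1[:, :, None] + Y2[:, None, :]   # shape (B, n, m)
C2 = W.ravel()[M]                                  # shape (B, n, m)

# The lookup lands in the cross-dataset block, as intended.
expected = DYY12[Y1[0][:, None], (Y2[0] - k1)[None, :]]
print(np.allclose(C2[0], expected))
```

Without the shift on Y2, the same arithmetic would read from the DYY1 block, which is the situation described in the original question.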

toooooodo commented 2 years ago

Thanks for your quick reply! I understand that we have 3 tensors (label-to-label distances in D1, label-to-label distances in D2, and label-to-label distances across D1 and D2) and that we stack all of them together into a symmetric matrix of size (k + k')**2. But why should we compute label-to-label distances within each dataset when debiased_loss=True? I don't quite understand this parameter. Could you please clarify the effect of this parameter and the reason for computing distances within datasets?

dmelis commented 2 years ago

Ah, got it. So your question is about how the debiased parameter works in general. When debiased_loss=True we compute a debiased version of the Sinkhorn loss: d_debiased(a,b) = d(a,b) - 0.5 * (d(a,a) + d(b,b)). You can check out this paper for details: http://proceedings.mlr.press/v89/feydy19a/feydy19a.pdf, but basically this is done to guarantee that d_debiased(a,a) = 0, which in turn leads to unbiased gradients (note that d(a,a) = 0 does not hold in general for the vanilla Sinkhorn loss).
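
As a rough numerical illustration of the formula above, the sketch below uses a toy entropic OT cost as a stand-in for d. This is an assumption for illustration only: `entropic_cost` is a hypothetical helper written with plain Sinkhorn iterations on 1-D point clouds, not the solver OTDD actually calls, and the point values, `eps`, and `n_iter` are arbitrary.

```python
import numpy as np

def entropic_cost(x, y, eps=1.0, n_iter=500):
    """Toy entropic OT cost <P, C> between two uniform 1-D point clouds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.full(len(x), 1.0 / len(x))          # uniform weights on x
    b = np.full(len(y), 1.0 / len(y))          # uniform weights on y
    C = 0.5 * (x[:, None] - y[None, :]) ** 2   # squared-distance cost
    K = np.exp(-C / eps)                       # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):                    # plain Sinkhorn updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]            # entropic transport plan
    return float(np.sum(P * C))

def debiased(d, a, b):
    """d_debiased(a, b) = d(a, b) - 0.5 * (d(a, a) + d(b, b))."""
    return d(a, b) - 0.5 * (d(a, a) + d(b, b))

x = [0.0, 1.0, 2.0]
y = [0.5, 1.5, 3.0]

d_xx = entropic_cost(x, x)                 # strictly positive: the plan is blurred
deb_xx = debiased(entropic_cost, x, x)     # zero by construction
deb_xy = debiased(entropic_cost, x, y)

print(d_xx, deb_xx, deb_xy)
```

The self-terms d(a,a) and d(b,b) are what require the within-dataset label distances: each of them is an OT problem between a dataset and itself, so its cost matrix needs label-to-label distances inside that dataset.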

toooooodo commented 2 years ago

Thanks! I'll check out this paper.