Status: Closed (toooooodo closed this issue 2 years ago)
Oh! The class indices in D2 run from 10 to 19 if `debiased_loss` is `True`. So the class distance index is correct and my previous understanding was wrong. But I'm still confused about why we should compute class distances within the same dataset when `debiased_loss` is `True`.
Hi @toooooodo. When `debiased_loss=True`, we also need to compute label-to-label distances within each of the two datasets. To avoid carrying around three separate tensors, we stack them in a block-wise matrix of size (k + k') × (k + k'), assuming the datasets have k and k' classes respectively. The diagonal blocks of this matrix are the within-domain label distances, and the off-diagonal blocks (the matrix is symmetric, so one is the transpose of the other) are the usual across-domain label distances that you would get by running OTDD with `debiased_loss=False`. I hope that clarifies it!
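The stacking described above can be sketched in plain Python. This is only an illustration with nested lists and made-up toy distances, not the OTDD code itself; the function name `stack_label_distances` is hypothetical, and the block names `DYY1`, `DYY12`, `DYY2` are borrowed from the original question.

```python
def stack_label_distances(dyy1, dyy12, dyy2):
    """Stack within- and across-dataset label distances into one square matrix.

    dyy1:  k  x k  distances between classes of D1
    dyy12: k  x k' distances from classes of D1 to classes of D2
    dyy2:  k' x k' distances between classes of D2
    """
    k, k_prime = len(dyy1), len(dyy2)
    # The lower-left block is the transpose of the upper-right block.
    dyy21 = [[dyy12[i][j] for i in range(k)] for j in range(k_prime)]
    top = [dyy1[i] + dyy12[i] for i in range(k)]
    bottom = [dyy21[j] + dyy2[j] for j in range(k_prime)]
    return top + bottom


# Toy values (invented for illustration): k = 2 classes in D1, k' = 3 in D2.
DYY1 = [[0.0, 1.0], [1.0, 0.0]]
DYY12 = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
DYY2 = [[0.0, 2.0, 3.0], [2.0, 0.0, 4.0], [3.0, 4.0, 0.0]]

W = stack_label_distances(DYY1, DYY12, DYY2)
n = len(W)  # k + k'
```

Here `W[0][2]` is the across-dataset distance between class 0 of D1 and class 0 of D2, because the D2 classes occupy rows and columns `k` to `k + k' - 1`.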
Thanks for your immediate reply!
I understand that we have three tensors (label-to-label distances in D1, in D2, and across D1 and D2) and that we stack them into a symmetric matrix of size (k + k') × (k + k'). But why should we compute label-to-label distances within the two datasets when `debiased_loss=True`? I don't quite understand this parameter.
Could you please clarify the effect of this parameter and the reason for computing distances within datasets?
Ah, got it. So your question is about how the debiased parameter works in general. When `debiased_loss=True`, we compute an unbiased version of the Sinkhorn divergence: d_debiased(a,b) = d(a,b) - 0.5(d(a,a) + d(b,b)). You can check out this paper for details: http://proceedings.mlr.press/v89/feydy19a/feydy19a.pdf. Basically, this is done to guarantee that d_debiased(a,a) = 0, which in turn leads to unbiased gradients (note that d(a,a) = 0 does not hold in general for the vanilla Sinkhorn loss).
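To make the formula concrete, here is a small pure-Python sketch of an entropy-regularized OT cost and its debiased version on 1-D point clouds with uniform weights. It is a toy illustration under those assumptions, not the solver OTDD uses; the function names, `eps`, and `n_iters` are all hypothetical choices.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def entropic_ot(xs, ys, eps=1.0, n_iters=200):
    """Entropy-regularized OT cost between two uniform 1-D point clouds."""
    n, m = len(xs), len(ys)
    log_a, log_b = -math.log(n), -math.log(m)
    cost = [[(x - y) ** 2 for y in ys] for x in xs]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iters):
        # Alternating log-domain Sinkhorn updates of the dual potentials.
        f = [-eps * logsumexp([log_b + (g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    # Dual objective <a, f> + <b, g> with uniform weights a and b.
    return sum(f) / n + sum(g) / m


def debiased(xs, ys, eps=1.0):
    """d_debiased(a, b) = d(a, b) - 0.5 * (d(a, a) + d(b, b))."""
    return entropic_ot(xs, ys, eps) - 0.5 * (
        entropic_ot(xs, xs, eps) + entropic_ot(ys, ys, eps)
    )


x = [0.0, 1.0]
y = [3.0, 4.0]
vanilla_self = entropic_ot(x, x)   # strictly positive: the vanilla loss is biased
debiased_self = debiased(x, x)     # zero by construction
debiased_cross = debiased(x, y)
```

The two self-terms `d(a,a)` and `d(b,b)` are exactly the quantities that need within-dataset distances, which is why the within-dataset label distances are computed when `debiased_loss=True`.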
Thanks! I'll check out this paper.
Hi, thanks for your work and code. I'm confused about the `debiased_loss` parameter in `DatasetDistance`, and I have two questions:

1. [...] `debiased_loss` is `True`?
2. [...] `example.py` given by this repo, and I notice that in the `batch_augmented_cost` function, the size of `W` is [20, 20] rather than [10, 10]. I realise this is because we concatenate the class distances of D1 and D2 like `[[DYY1, DYY12], [DYY21, DYY2]]` here. But I'm afraid that the following operation uses the wrong index to look up the class distance in `W`. For example, the index for class `0` in `D1` and class `0` in `D2` is `0 * 20 + 0 = 0`, but `W.flatten()[0]` is the distance between class `0` in `D1` and class `0` in `D1`. I don't know whether my understanding is correct.
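The index arithmetic in this question can be checked with a small stand-alone sketch. It assumes 10 classes in each dataset (matching the [20, 20] shape) and that the D2 labels are shifted by the number of D1 classes before the lookup, as the follow-up comment in this thread concludes. The names `block_name` and `flat_index` are hypothetical, and each entry of `W` holds a tag naming its block instead of a real distance.

```python
k = 10         # classes in D1 (assumed)
k_prime = 10   # classes in D2 (assumed)
n = k + k_prime


def block_name(i, j):
    """Tag each entry of the stacked matrix with the block it belongs to."""
    rows = "D1" if i < k else "D2"
    cols = "D1" if j < k else "D2"
    return rows + "-" + cols


W = [[block_name(i, j) for j in range(n)] for i in range(n)]
W_flat = [entry for row in W for entry in row]  # row-major, like flatten()


def flat_index(row_label, col_label):
    """Row-major position of (row_label, col_label) in the flattened matrix."""
    return row_label * n + col_label


# Without shifting the D2 label, class 0 vs class 0 lands in the D1-D1 block.
unshifted = W_flat[flat_index(0, 0)]
# With the D2 label shifted by k, the same pair lands in the D1-D2 block.
shifted = W_flat[flat_index(0, 0 + k)]
```

With the shift, class 0 of D2 is addressed as label 10, so the flat index becomes `0 * 20 + 10 = 10` and points at the across-dataset block.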