Closed: needylove closed this issue 2 years ago.
Hey @needylove,
Thanks for your interest. That's a good question. It's been a while, but I think the following proof is correct.
Let me know if anything seems unclear. Best.
Dear @mboudiaf,
Thanks for your kind reply! Just one more question: why is sampling from the feature distribution equal to the sum over all the z_i with y_i = k? Since we do not know the concrete distribution of \hat{Z} | Y, we may not be able to sample \hat{Z} directly. Besides, \sum_{i : y_i = k} seems to run over all the data belonging to class k, rather than over a sampled subset.
Thanks. Best.
Hi @needylove,
We do not know the true density of \hat{Z} | Y = k, but we can sample from it: this simply corresponds to extracting features for images of class k. The sum over "i" should be read as "sum over all indices i in the current batch of samples such that y_i = k". The expectation is therefore replaced by this sum through a Monte-Carlo empirical estimate. Is it clearer? Best!
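In case it helps, here is a minimal sketch of what I mean, assuming PyTorch tensors and hypothetical names `features` (the batch of \hat{z}_i) and `labels` (the y_i); the conditional expectation is replaced by the within-class average over the batch:

```python
import torch

def class_conditional_mean(features, labels, k):
    """Monte-Carlo estimate of an expectation over \hat{Z} | Y = k.

    The expectation over the (unknown) conditional feature distribution
    is replaced by an average over the batch indices i such that y_i = k,
    assuming class k appears at least once in the batch.
    """
    mask = labels == k                 # selects the indices i with y_i = k
    return features[mask].mean(dim=0)  # empirical average of z_i over that subset

# Toy usage: a batch of 8 feature vectors of dimension 5, labels in {0, 1, 2}.
features = torch.randn(8, 5)
labels = torch.randint(0, 3, (8,))
mu_k = class_conditional_mean(features, labels, k=0)
```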
Dear @mboudiaf,
Much clearer with your kind help. Yet, I have lost the link between the distribution of the input X and that of \hat{Z} | Y = k. It seems a batch of samples is drawn from the distribution of X, and each sampled x corresponds to a \hat{z}; however, I do not understand why we can treat that corresponding \hat{z} as a sample from \hat{Z} | Y = k. In other words, suppose x' \in X has a higher probability of being sampled, so the corresponding \hat{z}' also has a higher probability of being sampled; yet \hat{z}' may not have a high probability under the distribution \hat{Z} | Y = k.
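To make my question concrete, here is a small sketch of the sampling process I have in mind (the `encoder` below is a hypothetical stand-in for the feature extractor that produces \hat{z} = f(x)):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the feature extractor \hat{z} = f(x).
encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))

def features_for_class_k(images_of_class_k):
    # Each x drawn from the data distribution of class k is mapped to a
    # feature \hat{z} = f(x); my question is whether these \hat{z} can be
    # treated as samples from \hat{Z} | Y = k.
    with torch.no_grad():
        return encoder(images_of_class_k)

# Toy usage: 4 inputs of class k, flattened to 32-dimensional vectors.
x_k = torch.randn(4, 32)
z_k = features_for_class_k(x_k)
```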
Thanks! Best.
Dear authors,
Thanks for your wonderful work. I really like it, yet I am confused about why the center loss can be interpreted as a conditional entropy between \hat{Z} and \bar{Z}.
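For reference, by "center loss" I mean the standard formulation (Wen et al., 2016), where each feature \hat{z}_i is pulled toward the centroid c_{y_i} of its class (which I take to be what \bar{Z} refers to):

```latex
% Standard center loss over a batch; c_{y_i} is the centroid of class y_i.
\mathcal{L}_{\text{center}} = \frac{1}{2} \sum_{i} \left\lVert \hat{z}_i - c_{y_i} \right\rVert_2^2
```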
Might I have your kind reply? Thanks.