bryanwong17 opened this issue 4 days ago
Hi, thank you for the excellent work! After reviewing the paper and the code, I think the loss function differs between them. In the code, it appears that the loss is calculated as the sum of all description scores within the same class, followed by cross-entropy with the ground-truth label:
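To make sure I am reading it correctly, here is a minimal PyTorch sketch of what I mean; the tensor names and shapes are my own assumptions, not the repository's actual code:

```python
import torch
import torch.nn.functional as F

def loss_as_i_read_the_code(description_scores, labels):
    # Assumed shape: description_scores is [batch, num_classes, num_descriptions],
    # holding the image-to-description similarity scores for each class's descriptions.
    # Sum the description scores within each class -> [batch, num_classes]
    class_logits = description_scores.sum(dim=-1)
    # Cross-entropy against the ground-truth class labels
    return F.cross_entropy(class_logits, labels)

# e.g. 8 images, 10 classes, 5 descriptions per class
scores = torch.randn(8, 10, 5)
labels = torch.randint(0, 10, (8,))
print(loss_as_i_read_the_code(scores, labels))
```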
However, in the paper, the loss function includes a learnable temperature parameter, γ. Could you clarify if these are indeed different? Thank you!

Hi, sorry for the confusion. In the paper, gamma is a "learned" temperature, which is not equivalent to "learnable". From our understanding, "learned" means the parameter has already been learned. In the implementation, gamma = 1 / self.logit_scale, where logit_scale is learned during CLIP training.
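For concreteness, here is a rough sketch of that relationship, assuming a standard CLIP-style logit_scale; the variable names are illustrative rather than the exact code:

```python
import torch
import torch.nn.functional as F

# Assumption: logit_scale here stands in for the model's (already exponentiated)
# learned scalar from CLIP pre-training; gamma is simply its inverse, so it is
# fixed at this point rather than optimized further.
logit_scale = torch.tensor(100.0)
gamma = 1.0 / logit_scale  # the "learned" (not "learnable") temperature

def loss_with_gamma(description_scores, labels):
    # Dividing the summed per-class scores by gamma is the same as multiplying
    # them by logit_scale, so the gamma-scaled cross-entropy in the paper and a
    # logit_scale-scaled implementation produce identical logits.
    class_logits = description_scores.sum(dim=-1) / gamma
    return F.cross_entropy(class_logits, labels)
```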