xiaofang007 / ViP

[MICCAI 2024 Early Accept, Oral] Aligning Medical Images with General Knowledge from Large Language Models

Different Loss Function #7

Open bryanwong17 opened 4 days ago

bryanwong17 commented 4 days ago

Hi, thank you for the excellent work! After reviewing the paper and the code, I think the loss function differs between the two. In the code, the loss appears to be computed as the sum of all description scores within the same class:

for i, (k, v) in enumerate(text_features.items()):
  logits[:, i:i+1] += logit_scale * score
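
For context, here is a minimal sketch of how I read that scoring step (only the two lines above are quoted from the repo; class_logits, image_features, and the shapes are my assumptions):

import torch

# Hypothetical helper, not the repo's code: image_features is (B, D) and
# text_features maps each class name to a (num_desc, D) tensor of
# description embeddings.
def class_logits(image_features, text_features, logit_scale):
    B = image_features.shape[0]
    logits = torch.zeros(B, len(text_features), device=image_features.device)
    for i, (k, v) in enumerate(text_features.items()):
        # similarity of every image with every description of class k,
        # summed over that class's descriptions: (B, num_desc) -> (B, 1)
        score = (image_features @ v.t()).sum(dim=1, keepdim=True)
        logits[:, i:i+1] += logit_scale * score
    return logits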

followed by cross-entropy with the ground-truth label:

output = self.model(image)
loss = F.cross_entropy(output, label)
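
Putting the two pieces together with dummy data (again purely illustrative; class_logits is the hypothetical helper sketched above):

import torch
import torch.nn.functional as F

# Illustrative sizes: 4 images, 3 classes with 5 descriptions each, D = 512
image_features = F.normalize(torch.randn(4, 512), dim=1)
text_features = {f"class_{c}": F.normalize(torch.randn(5, 512), dim=1) for c in range(3)}
logit_scale = torch.tensor(100.0)
label = torch.randint(0, 3, (4,))

logits = class_logits(image_features, text_features, logit_scale)
loss = F.cross_entropy(logits, label)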

However, in the paper, the loss function includes a learnable temperature parameter γ:

[screenshot of the paper's loss equation with the temperature γ]

Could you clarify whether these are indeed different? Thank you!

xiaofang007 commented 3 days ago

Hi, sorry for the confusion. In the paper, gamma is a "learned" temperature, which is not the same as "learnable": "learned" means the parameter has already been learned and is kept fixed during our training. In the implementation, gamma = 1 / self.logit_scale, where logit_scale was learned during CLIP pre-training.
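
To make the equivalence concrete: dividing the scores by gamma is the same as multiplying them by logit_scale, so the code's cross-entropy matches the paper's formulation. A quick numerical check (shapes and values are illustrative, not from the repo):

import torch
import torch.nn.functional as F

scores = torch.randn(4, 5, dtype=torch.float64)          # (batch, num_classes) summed description scores
logit_scale = torch.tensor(100.0, dtype=torch.float64)   # exp of CLIP's learned log-temperature
gamma = 1.0 / logit_scale                                 # the paper's "learned" temperature

# softmax(scores / gamma) == softmax(logit_scale * scores), so both
# formulations yield the same probabilities and the same cross-entropy loss
assert torch.allclose(F.softmax(scores / gamma, dim=1),
                      F.softmax(logit_scale * scores, dim=1))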