yixinL7 / SimCLS

Code for our paper "SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization", ACL 2021

The difference between loss function and loss code part #6

Closed Jexxie closed 3 years ago

Jexxie commented 3 years ago

Thanks for your excellent work. I have a question about the loss computation. Is there any difference between the loss function in the paper and the one in the code? The loss function in the paper: [images: loss function from the paper]

But the code part: [image: loss code]

It seems the code just computes + i * lambda instead of + (j - i) * lambda. Did I miss something?

yixinL7 commented 3 years ago

Thank you for your interest in our work! Below is the code snippet for the ranking loss:

[image: ranking loss code snippet]

This function does compute the loss described in our paper. To see this, focus on the score of a single candidate summary (e.g. the first one, which has the best ROUGE score) and work out which scores it is compared with (line 21) as the loop (line 14) runs. Once the loop has completed, the score of the first candidate has been compared with those of all the other candidates, and the margin in each comparison corresponds to j - i (in this special case, i = 0).
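The argument above can be checked numerically. The following is a minimal sketch in plain Python, not the repository's code: `pairwise_loss`, `shifted_loss`, `lam`, and the example scores are hypothetical names and values. It assumes the pairwise term has the form max(0, s_j - s_i + (j - i) * lambda) with j > i, as described in this thread, and that candidates are sorted from best to worst. The offset is called `k` here to avoid a clash with the equation's i; it plays the role of the loop variable in the code.

```python
import math


def pairwise_loss(scores, lam):
    """Double sum over all pairs j > i, with margin (j - i) * lam."""
    n = len(scores)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += max(0.0, scores[j] - scores[i] + (j - i) * lam)
    return total


def shifted_loss(scores, lam):
    """Single loop over offsets k, comparing scores[:-k] with scores[k:]."""
    n = len(scores)
    total = 0.0
    for k in range(1, n):
        pos = scores[:-k]  # candidates 0 .. n-k-1 (higher ranked)
        neg = scores[k:]   # candidates k .. n-1 (lower ranked)
        # Every pair at this offset satisfies j - i == k, so the margin k * lam
        # is the same quantity as (j - i) * lam in the double sum.
        for p, q in zip(pos, neg):
            total += max(0.0, q - p + k * lam)
    return total


scores = [0.9, 0.7, 0.8, 0.2]  # model scores, candidates sorted by ROUGE
lam = 0.1
print(math.isclose(pairwise_loss(scores, lam), shifted_loss(scores, lam)))
```

The sketch sums the hinge terms. A loss module that averages within each offset would weight the offsets differently, but it would still pair the same scores with the same margins.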

Jexxie commented 3 years ago

Thanks for your reply!! But I still have some questions.

  1. The value of the margin in line 20 is margin * i. Why can it represent margin * (j - i)?
  2. What happens if i equals 2? pos_score holds the top n-2 scores and neg_score holds the last n-2 scores, so some scores must be chosen twice. But the loss function has the condition j > i, which does not seem to match. How do you explain this?

yixinL7 commented 3 years ago

I'll answer the second question first, as I think it will help you understand the first one. Indeed, some scores are chosen twice, but they play different roles in the two cases: each is compared once with a score that should be higher and once with a score that should be lower. In line 21, each item of pos_score is paired with an item of neg_score from a lower-ranked candidate, so the pos_score item is always the one expected to be greater.

As for the first question, I hope this hint will help you come up with an answer yourself: in line 20, i serves the same purpose as j - i in the equation in the paper. Another hint may also help: the loop (line 14) in the code does not correspond to the outer summation in the equation. Instead, the summation step for a specific value of i in the equation is only finished once the whole loop in the code has completed.

Please let me know if you have more questions! :)
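Both hints can be made concrete by listing which index pairs each loop iteration compares. This is a small illustrative sketch, not code from the repository: `pairs_by_offset` and the choice of n = 5 are assumptions, and `k` stands for the loop variable in the code (written i in the comments above).

```python
def pairs_by_offset(n):
    """Map each offset k to the (i, j) index pairs compared at that offset."""
    indices = list(range(n))
    result = {}
    for k in range(1, n):
        pos_idx = indices[:-k]  # positions that feed pos_score
        neg_idx = indices[k:]   # positions that feed neg_score
        result[k] = list(zip(pos_idx, neg_idx))
    return result


n = 5
table = pairs_by_offset(n)
for k, pairs in table.items():
    print(k, pairs)

# Collect every comparison made over the whole loop.
all_pairs = sorted(p for pairs in table.values() for p in pairs)
```

With n = 5 and offset 2, index 2 appears in two pairs: as the lower-ranked item in (0, 2) and as the higher-ranked item in (2, 4). Across all offsets, each pair with j > i shows up exactly once, at offset k = j - i, so the pairs for one fixed i in the equation are spread over every iteration of the loop.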