Closed: griff4692 closed this issue 2 years ago

Hi - Thanks for the great code. I've been trying to re-implement BRIO in my HuggingFace fork, but have been unable to get it to work.

I'm curious what this line in RankingLoss is doing:

```python
TotalLoss = loss_func(score, score, ones)
```

One possibility is that I haven't yet included the gold reference as part of the ranking loss, which might explain why the contrastive loss is causing the gold-standard MLE loss to rise so much. I will add that, but I was also curious about the line above. Thank you!!
I also had a question about this line:

```python
loss_func = torch.nn.MarginRankingLoss(margin * i)
```
In the paper, it says:

> ... is the margin multiplied by the difference in rank between the candidates

It appears that the margin is based solely on the rank (index) of the higher-rated candidate. Is this correct?
Hi, thank you for your interest in our work.
I wanted to note that this loss function is adapted from MatchSum.
For TotalLoss, they have an explanation here: it is meant to avoid the case where some special samples never enter the following for loop. I always think of it as just a placeholder.
For your second question about the margin, please refer to this thread: https://github.com/yixinL7/SimCLS/issues/6.
Please let me know if you have more questions.
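For later readers, here is a minimal sketch of how the two lines quoted above fit together in a MatchSum/BRIO-style pairwise ranking loss. The loop structure, the `ranking_loss` name, and the comments are my paraphrase of the idea, not a verbatim copy of the repo:

```python
import torch

def ranking_loss(score, margin):
    """Sketch of a MatchSum/BRIO-style pairwise ranking loss.

    score: [batch, n_candidates] model scores, with candidates already
    sorted from best to worst by the evaluation metric. The gold-reference
    terms discussed in this thread are omitted from this sketch.
    """
    ones = torch.ones_like(score)
    loss_func = torch.nn.MarginRankingLoss(0.0)
    # Identical inputs with margin 0 give a guaranteed-zero loss: the
    # "placeholder" that keeps TotalLoss well defined even if the loop
    # below contributes nothing.
    total_loss = loss_func(score, score, ones)
    n = score.size(1)
    for i in range(1, n):
        # All candidate pairs whose ranks differ by exactly i.
        pos_score = score[:, :-i].contiguous().view(-1)
        neg_score = score[:, i:].contiguous().view(-1)
        ones = torch.ones_like(pos_score)
        # The margin scales with the rank gap i: candidates further apart
        # in the metric ranking must be separated by a larger margin in
        # model score.
        loss_func = torch.nn.MarginRankingLoss(margin * i)
        total_loss = total_loss + loss_func(pos_score, neg_score, ones)
    return total_loss
```

On this reading, the margin is multiplied by the rank difference i between the two candidates in each pair, not by the absolute index of the higher-rated one.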
Ahh thanks Yixin -
Yes, I've noticed it's the same pairwise calculation from MatchSum. I see about TotalLoss -- just wanted to make sure it was meant to be an empty calculation.
I'm curious if you have any data comparing this pairwise ranking with other objectives:
- Contrastive loss: align positives in decoder latent space (CLIFF)
- ConSeq (unlikelihood) loss: CONSEQ
I'm working on a comparison of methods / metrics / positive-negative selection strategies, though not for news summarization. It will be interesting to see whether adjusting the likelihood (as in unlikelihood and BRIO) is more effective than simply aligning positive decoder states (as in the CLIFF paper and other non-summarization contrastive learning papers).
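(To make the contrast concrete, here is a rough sketch of the two families of objectives. These are simplified illustrations with hypothetical names, `unlikelihood_term` and `alignment_term`, not code from the ConSeq or CLIFF papers.)

```python
import torch
import torch.nn.functional as F

def unlikelihood_term(logits, neg_tokens):
    """Likelihood-adjusting objective (unlikelihood-style): directly push
    down the probability of tokens from a negative sequence.
    logits: [seq_len, vocab]; neg_tokens: [seq_len] token ids."""
    probs = F.softmax(logits, dim=-1)
    p_neg = probs.gather(-1, neg_tokens.unsqueeze(-1)).squeeze(-1)
    return -torch.log((1.0 - p_neg).clamp_min(1e-8)).mean()

def alignment_term(anchor, positives, negatives, tau=0.1):
    """Latent-alignment objective (CLIFF-style, roughly): an InfoNCE loss
    over pooled decoder states that pulls positive summaries toward the
    anchor and pushes negatives away.
    anchor: [d]; positives: [n_pos, d]; negatives: [n_neg, d]."""
    anchor = F.normalize(anchor, dim=0)
    pos_sim = F.normalize(positives, dim=-1) @ anchor / tau   # [n_pos]
    neg_sim = F.normalize(negatives, dim=-1) @ anchor / tau   # [n_neg]
    log_denom = torch.logsumexp(torch.cat([pos_sim, neg_sim]), dim=0)
    return (log_denom - pos_sim).mean()
```

The first operates directly on the output token distribution; the second only shapes the pooled decoder representations.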
Hi Griffin, I have also found this comparison very interesting! My guess is adjusting the likelihood has a more direct impact on the decoding output than adjusting the latent representation, but I haven't tried to compare them empirically myself. I'm looking forward to seeing your work on this!