icml-2020-nlp / semsim


How is the reward loss used in training? #6

Open yuyan2do opened 4 years ago

yuyan2do commented 4 years ago

Thanks for open-sourcing this great work.

Could you help explain how semsim_score affects the fine-tuned BART's parameters? I saw it used in the final loss (_loss = loss - loss_weight * semsim_score_), but I wonder how backpropagation can optimize this score by passing gradients from "sentence_txt" and "sentence_tok" back to "output_tokens", since "sentence_txt" and "sentence_tok" are not float values.

https://github.com/icml-2020-nlp/semsim/blob/306f4534f81cacc6b998c0b61011dc85060e8d22/fairseq-semsim/fairseq/criterions/semantic_similarity_loss.py#L22-L31
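For readers less familiar with autograd mechanics, here is a minimal PyTorch sketch of the concern above (shapes and names are illustrative, not the repository's actual objects): once logits are converted to discrete token ids, the autograd graph is cut, so a score computed from those ids (or from text decoded from them) has no gradient path back to the generator.

```python
import torch

# Illustrative only: shapes and names are made up, not the repo's actual code.
logits = torch.randn(1, 5, 100, requires_grad=True)  # (batch, seq_len, vocab)

# argmax returns integer token ids; it is non-differentiable,
# so the autograd graph is cut at this point.
token_ids = logits.argmax(dim=-1)
print(token_ids.requires_grad)  # False

# Stand-in for a rewarder score computed from the discrete ids
# (or from text decoded from them):
semsim_score = token_ids.float().mean()
print(semsim_score.requires_grad)  # False: no gradient path back to `logits`
```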

jpilaul commented 4 years ago

Thanks for posting your code on GitHub. When I tested the model, it does not seem like the gradient from the rewarder actually flows to BART. If you change https://github.com/icml-2020-nlp/semsim/blob/306f4534f81cacc6b998c0b61011dc85060e8d22/fairseq-semsim/fairseq/criterions/semantic_similarity_loss.py#L54 to loss = loss_weight * semsim_score, BART has zero gradients... Am I missing something?
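The check described here can be reproduced on a toy model. This is a sketch with placeholder names (a simple linear layer stands in for BART), under the assumption that the rewarder consumes discretely decoded tokens as in the snippet above:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 10)                  # stand-in for BART
logits = model(torch.randn(2, 4))

# Discrete decoding cuts the autograd graph, as in the earlier sketch
token_ids = logits.argmax(dim=-1)
semsim_score = token_ids.float().mean()   # stand-in rewarder score

loss_weight = 0.1
loss = loss_weight * semsim_score         # keep only the reward term

model.zero_grad()
if loss.requires_grad:                    # False here, so backward() is skipped
    loss.backward()

# Every gradient is absent (or zero): the rewarder never updates the model
print(all(p.grad is None or p.grad.abs().sum() == 0
          for p in model.parameters()))
```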

icml-2020-nlp commented 4 years ago

Hi, thanks for letting us know about the issue. I will check our code and get back to you soon. We had a few code versions, and there may have been some mistakes when releasing the code.

icml-2020-nlp commented 4 years ago

Hi all, once again, we appreciate your comments. We have checked the code and found that there were mistakes made while we were organizing our code. The problem is related to the comment by @yuyan2do: sentence_txt was originally intended to be used for debugging and monitoring the training steps.

We are currently checking further details on the issue. The examination will take about a couple of weeks, as the code and its history are not available at our current workplace at the moment.

Considering this situation, we have decided to post a notice on the repository to discourage usage until we double-check and update the code. Please refrain from using the model and the scores until further notice. We will also notify a few related parties, such as NLP-progress.

jpilaul commented 4 years ago

Thanks for the heads up :) Can you perhaps explain where in your paper you use the gradients from the Semsim score? From Figure 2 of your paper, it seems that the generated summary and the reference summary are fed to Semsim, but I couldn't understand how the Semsim score gradient was used to update BART. I really appreciate your insights.
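For what it's worth, one standard way to let a non-differentiable score update a generator is a policy-gradient (REINFORCE-style) objective, where the score weights the log-probability of the sampled tokens instead of being backpropagated through the decoded text. The sketch below only illustrates that general technique; it is not claimed to be what the paper or the released code does, and all names in it are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical REINFORCE-style sketch: the reward is a plain number that
# weights the log-probability of the sampled tokens, so gradients flow
# through log_probs even though the reward itself is non-differentiable.
logits = torch.randn(1, 5, 100, requires_grad=True)  # (batch, seq_len, vocab)
log_probs = F.log_softmax(logits, dim=-1)

# Sample one token per position and gather its log-probability
sampled = torch.multinomial(log_probs.exp().view(-1, 100), 1).view(1, 5)
token_log_probs = log_probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)

reward = 0.87   # stand-in scalar from a rewarder, treated as a constant
rl_loss = -(reward * token_log_probs.sum())

rl_loss.backward()
print(logits.grad is not None)  # True: gradients reach the generator
```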

yuyan2do commented 4 years ago

Thanks @icml-2020-nlp for your response. I'm sorry to hear the code history is not available at your current workplace. Hope everything gets better soon.