code for scoring consistency test set using DocRepair model?

lena-voita / good-translation-wrong-in-context

This is a repository with the data and code for the ACL 2019 paper "When a Good Translation is Wrong in Context: ..." and the EMNLP 2019 paper "Context-Aware Monolingual Repair for Neural Machine Translation"


code for scoring consistency test set using DocRepair model? #8

Closed xiaoyi0814 closed 4 years ago

xiaoyi0814 commented 4 years ago

Dear authors, thank you for publishing the code on GitHub. I am trying to reproduce the consistency scores using the DocRepair model. However, I am not sure how to produce scores with the DocRepair model.

I just wanted to know whether I should remove the _eos tokens from the DocRepair input and output to get the evaluation losses, and then calculate the consistency scores.

Btw, could you publish the code for scoring consistency test set using DocRepair model?

lena-voita commented 4 years ago

No, you shouldn't remove the _eos tokens from the data - this will hurt the model, because it was trained with them.

We haven't published code for scoring the DocRepair model because the model is a standard Transformer; all that needs to change is the data. For DocRepair, the source for a given fragment is the _eos-separated baseline sentence-level translations of the corresponding source sentences from the contrastive test set, and the target is the _eos-separated target sentences.
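The data preparation described above could be sketched roughly as follows. This is only an illustration, not the authors' code: the function names (`join_fragment`, `make_docrepair_pair`, `consistency_accuracy`) are hypothetical, the space-padded ` _eos ` separator is an assumption about tokenization, and the scoring step assumes you already have one model loss per candidate target, grouped by test example, with the correct variant at a known position in each group.

```python
from typing import Sequence, Tuple

EOS = "_eos"  # separator token named in the thread


def join_fragment(sentences: Sequence[str], sep: str = EOS) -> str:
    """Join the sentences of one fragment into a single _eos-separated line."""
    # Assumption: the separator is a standalone token surrounded by spaces.
    return f" {sep} ".join(s.strip() for s in sentences)


def make_docrepair_pair(
    baseline_translations: Sequence[str],
    target_sentences: Sequence[str],
) -> Tuple[str, str]:
    """Build one (source, target) pair for DocRepair scoring.

    source: sentence-level baseline translations of the fragment's source sentences
    target: one candidate group of target sentences from the contrastive test set
    """
    return join_fragment(baseline_translations), join_fragment(target_sentences)


def consistency_accuracy(
    losses: Sequence[float],
    group_sizes: Sequence[int],
    correct_index: int = 0,
) -> float:
    """Fraction of examples where the correct candidate has the lowest loss.

    Assumptions (not stated in the thread): `losses` holds one loss per
    candidate, candidates of the same example are contiguous, and the
    correct candidate sits at `correct_index` within each group.
    """
    if sum(group_sizes) != len(losses):
        raise ValueError("group_sizes must sum to the number of losses")
    if not group_sizes:
        raise ValueError("no examples to score")
    correct = 0
    start = 0
    for size in group_sizes:
        group = losses[start:start + size]
        # Ties resolve to the lowest index in the group.
        best = min(range(size), key=group.__getitem__)
        correct += int(best == correct_index)
        start += size
    return correct / len(group_sizes)
```

For example, `make_docrepair_pair(["a b", "c d"], ["x y", "z w"])` yields the lines `a b _eos c d` and `x y _eos z w`, which can then be passed to whatever loss-scoring routine your Transformer setup provides.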