Closed xiaoyi0814 closed 4 years ago
No, you shouldn't remove the _eos tokens from the data - this will hurt the model, because it was trained with them.
We haven't published code for scoring with the DocRepair model, because the model is the standard Transformer - all that needs to be changed is the data. For DocRepair, the source for a given fragment is the _eos-separated baseline sentence-level translations of the corresponding source sentences from the contrastive test set, and the target is the _eos-separated target sentences.
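As a rough illustration of the data format described above, here is a minimal Python sketch, not code from the repository. The helper names `join_fragment` and `make_docrepair_pair` and the example sentences are hypothetical, and it assumes the separator is the literal token `_eos` surrounded by single spaces; check this against how your training data was actually prepared.

```python
# Separator token as written in this thread; assumed to be space-delimited.
EOS = "_eos"


def join_fragment(sentences, sep=EOS):
    """Join the sentences of one fragment into a single line, separated by `sep`."""
    return f" {sep} ".join(s.strip() for s in sentences)


def make_docrepair_pair(baseline_translations, target_sentences):
    """Build one (source, target) line pair for scoring a fragment.

    baseline_translations: sentence-level translations of the fragment's
        source sentences (the DocRepair input side).
    target_sentences: the target sentences of the same fragment, e.g. one
        contrastive variant from the test set.
    """
    if len(baseline_translations) != len(target_sentences):
        raise ValueError("both sides must contain the same number of sentences")
    return join_fragment(baseline_translations), join_fragment(target_sentences)


if __name__ == "__main__":
    # Hypothetical two-sentence fragment, for illustration only.
    src, tgt = make_docrepair_pair(
        ["first baseline translation .", "second baseline translation ."],
        ["first target sentence .", "second target sentence ."],
    )
    print(src)
    print(tgt)
```

Each contrastive variant of a fragment would get its own pair built this way, and the resulting lines can be scored with whatever loss computation you already use for a sentence-level Transformer.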
Dear authors, thank you for publishing the code on GitHub. I am trying to reproduce the consistency scores using the DocRepair model. However, I am not sure how to produce scores with the DocRepair model.
I just wanted to know whether I should remove the _eos tokens from the DocRepair input and output to get the evaluation losses, and then calculate the consistency scores.
Btw, could you publish the code for scoring the consistency test set using the DocRepair model?