Hi Vincent,
Our new model achieves a BLEU score of 74. We tested the model on a test set of 24 single-change examples, and it completely fixes 7 of them (~30%). I've annotated the translations with G and NG: G for good (the model translates completely correctly) and NG for not good (the model returns the input unchanged).
Sending you the test set on Skype because GitHub doesn't allow the file type.