lena-voita / good-translation-wrong-in-context

This is a repository with the data and code for the ACL 2019 paper "When a Good Translation is Wrong in Context: ..." and the EMNLP 2019 paper "Context-Aware Monolingual Repair for Neural Machine Translation"

Differences between test.dst of Context-aware dataset and Docrepair dataset #9

Open xc-kiwiberry opened 4 years ago

xc-kiwiberry commented 4 years ago

Dear authors,

Thank you for publishing your code and data. They are well organized and easy to follow. 👍

I have trained a sentence-level Transformer on the context-agnostic training data and successfully reproduced the BLEU score (33.91 in the EMNLP 2019 paper) on the context-aware test set (after removing BPE and '_eos', lowercasing, and treating the 4 segments as one long sentence).
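For reference, here is the postprocessing I applied before scoring — just a sketch; the `" `"` subword marker is my assumption based on the example lines below, not something taken from the repo's scripts:

```python
def postprocess(line: str) -> str:
    """Prepare a line for BLEU scoring: undo BPE splits, drop the
    segment separator token, and lowercase.

    Assumptions: subword pieces are joined by removing the " `" marker,
    and "_eos" separates the 4 segments of a group.
    """
    line = line.replace(" `", "")    # join BPE pieces back into words
    line = line.replace(" _eos", "")  # keep the 4 segments as one long sentence
    return line.lower().strip()
```
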

However, I found that "test.dst" in the Docrepair dataset is different from "test.ru" in the context-aware dataset.

The first line in "test.dst" in Docrepair dataset:

вчера ночью кто-то вломился в мой дом и украл эту урод `скую футболку . _eos да ... _eos я не верю в это . _eos она слишком свободная на мне , чувак .

The first line in "test.ru" in the context-aware dataset:

Вчера ночью кто-то вломился в мой дом и украл эту уродскую футболку . _eos Да ... _eos Я не верю в это . _eos Она слишком свободная на мне , чувак .

Apart from lowercasing, "test.dst" in the Docrepair dataset contains many " `" markers that split some tokens into subwords (e.g., "уродскую" in the first line).
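To check whether lowercasing and the " `" markers account for all of the differences, I compared the two files after a simple normalization — a sketch, assuming " `" is the only subword marker involved:

```python
def normalize(line: str) -> str:
    # Undo the " `" subword marker and lowercase; assumption: these are
    # the only intended differences between test.dst and test.ru.
    return line.replace(" `", "").lower().strip()

dst = "вчера ночью кто-то вломился в мой дом и украл эту урод `скую футболку ."
ru = "Вчера ночью кто-то вломился в мой дом и украл эту уродскую футболку ."
print(normalize(dst) == normalize(ru))
```
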

I would like to know:

  1. Which reference is correct?
  2. Does the Docrepair dataset use a different tokenization from the context-aware dataset?

Looking forward to your reply. :)