rewicks / ersatz

Apache License 2.0

Public Evaluation Data #9

Closed by bminixhofer 1 year ago

bminixhofer commented 1 year ago

Is the evaluation data available, or could you make it available? Sorry if it already is and I missed it.

Specifically referring to this part in the paper:

We construct test sets from the WMT News Translation test sets (Barrault et al., 2020), which provides for decent-size test sets in many languages. We manually corrected all sentence segmentations. While some sets were already well segmented, some more recent years were extremely under-segmented.

Having the evaluation data as used in your paper would make it much easier to build on your work. Thanks!

bminixhofer commented 1 year ago

OK, never mind, I found it :) For future reference, the data is available here: