marzenakrp / demetr

Repository for DEMETR: Diagnosing Evaluation Metrics for Translation
MIT License

Possible typo in negation file #2

Open danieldeutsch opened 1 year ago

danieldeutsch commented 1 year ago

Hello! Thanks for the great dataset.

I stumbled across this data point, and I suspect that it has a typo. It looks like the perturbed translation has annotator instructions appended to it.

https://github.com/marzenakrp/demetr/blob/9b31cfe92b2fee26d3d7532aea349af31a63f93e/dataset/critical_id8_negation.json#L4053

Is this a mistake? Thanks!

danieldeutsch commented 1 year ago

This line has an extra "\n." appended to it:

https://github.com/marzenakrp/demetr/blob/9b31cfe92b2fee26d3d7532aea349af31a63f93e/dataset/critical_id8_negation.json#L8365
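A rough way to look for other entries like this one is to scan the JSON for string values that contain a line break. The sketch below is only a helper I'm assuming would be useful. It relies on nothing except the file being valid JSON, so it does not guess at the dataset's field names. It reports a JSONPath-like location for each hit. It will not flag appended annotator instructions unless that text also contains a newline.

```python
import json
from typing import Any, Iterator, List, Tuple


def find_newline_strings(obj: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Recursively yield (path, value) for every string containing a line break."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_newline_strings(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from find_newline_strings(value, f"{path}[{i}]")
    elif isinstance(obj, str) and ("\n" in obj or "\r" in obj):
        yield path, obj


def scan_file(filename: str) -> List[Tuple[str, str]]:
    """Load a JSON file and return every string field that contains a line break."""
    with open(filename, encoding="utf-8") as f:
        data = json.load(f)
    return list(find_newline_strings(data))
```

For example, `for path, value in scan_file("dataset/critical_id8_negation.json"): print(path, repr(value))` would list each offending field, with `repr` making the embedded `\n` visible.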

I found these because I am writing a preprocessing script that converts the JSON files to TSV. The newline characters split one example across multiple lines, which breaks TSV loading.
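Until the data is fixed, a converter can guard against this by escaping line breaks and tabs before writing each row. This is a minimal sketch, not the script mentioned above. It assumes, hypothetically, that each JSON file is a list of flat objects and takes the column order from the first record's keys. The helper names are illustrative.

```python
import json
from typing import Dict, List, Sequence


def escape_tsv_field(value: object) -> str:
    """Escape characters that would break a line-oriented TSV file.

    Backslashes are escaped first so the transformation stays reversible.
    """
    text = "" if value is None else str(value)
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def records_to_tsv(records: List[Dict[str, object]], columns: Sequence[str]) -> str:
    """Render records as TSV text: a header row, then exactly one line per record."""
    lines = ["\t".join(escape_tsv_field(c) for c in columns)]
    for record in records:
        lines.append("\t".join(escape_tsv_field(record.get(c)) for c in columns))
    return "\n".join(lines) + "\n"


def json_file_to_tsv(json_path: str, tsv_path: str) -> None:
    """Convert a JSON file holding a list of flat objects into a TSV file.

    Assumption: the top-level JSON value is a list of dicts, and the first
    record's keys give the column order.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    columns = list(records[0].keys()) if records else []
    with open(tsv_path, "w", encoding="utf-8", newline="") as f:
        f.write(records_to_tsv(records, columns))
```

Escaping keeps a stray "\n." visible as the literal characters `\n.` rather than silently deleting it, so the bad example can still be spotted downstream. Quoting fields with the standard `csv` module is an alternative, but it only helps if whatever reads the TSV honors quotes.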