grammatical / pretraining-bea2019

Models, system configurations, and outputs of our winning GEC systems in the BEA 2019 shared task, described in R. Grundkiewicz, M. Junczys-Dowmunt, K. Heafield: Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data, BEA 2019.
MIT License

Can you provide us the data used in the experiments? #6

Closed: Lavine24 closed this issue 3 years ago

Lavine24 commented 5 years ago

That's awesome work, and thanks for sharing the code. Could you please share the data described in the README? Thank you.

sappy5678 commented 3 years ago

Do you have any plan to release your synthetic data?

snukky commented 3 years ago

Hi,

The synthetic part of the data is available from: http://data.statmt.org/romang/gec-bea19/synthetic/

A better/newer version of the data (with noise applied before subword splitting) can be found here: http://data.statmt.org/romang/gec-wnut19/data.en.tgz

Due to licensing, I've been sharing the complete parallel data via email, so if you need it, please drop me an email.

Best
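For convenience, here is a minimal sketch of how one might download and unpack the newer synthetic data linked above. It assumes `data.en.tgz` is a standard gzipped tar archive and that the target directory `data/` is a free choice; use whatever download tool you prefer.

```python
# Sketch: fetch and extract the synthetic GEC data (assumes a gzipped tar archive).
import tarfile
import urllib.request

URL = "http://data.statmt.org/romang/gec-wnut19/data.en.tgz"
ARCHIVE = "data.en.tgz"

# Download the tarball to the current directory.
urllib.request.urlretrieve(URL, ARCHIVE)

# Extract its contents into ./data (created if missing).
with tarfile.open(ARCHIVE, "r:gz") as tar:
    tar.extractall(path="data")
```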