This repository contains the code and data used in our paper, "Generating Inflectional Errors for Grammatical Error Correction in Hindi". If you use our work, please cite:
@inproceedings{sonawane-etal-2020-generating,
title = "Generating Inflectional Errors for Grammatical Error Correction in {H}indi",
author = "Sonawane, Ankur and
Vishwakarma, Sujeet Kumar and
Srivastava, Bhavana and
Kumar Singh, Anil",
booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop",
month = dec,
year = "2020",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.aacl-srw.24",
pages = "165--171",
}
A. Wikiextract -- Create a dataset of artificial Hindi errors (IPython Notebook | Repository)
Data: Artificial Dataset used as Train Dataset (src file | trg file)
B. Wikiedits -- Extract a dataset of real Hindi errors from Wikipedia (IPython Notebook | Repository)
Data: Natural Dataset used as Test Dataset (edits file | src file | trg file)
C. MLConvGEC -- Multilayer Convolutional Encoder-Decoder Model (IPython Notebook)
D. Tensor2Tensor -- Base Transformer Model (IPython Notebook)
E. fairseq-gec -- Copy Augmented Transformer Model (IPython Notebook | Repository)
F. ERRANT -- Error classification for Hindi as well as model evaluation (IPython Notebook | Repository)
Data: Test data split by error class, as src, trg and m2 files (Note: the .trg and .tgt files are identical)
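
For readers who want to inspect the m2 files listed above, here is a minimal sketch of reading them in Python. It assumes the files follow the standard M2 layout that ERRANT writes (an `S` line with the tokenized source, then `A` lines of the form `start end|||type|||correction|||...`); this has not been checked against the files in this repository. The sample sentence and the error label `R:VERB:INFL` are made up for illustration, and `parse_m2` and `apply_edits` are hypothetical helper names, not part of ERRANT.

```python
import re
from typing import Dict, List, Tuple

# (start token offset, end token offset, error type, correction string)
Edit = Tuple[int, int, str, str]

# Made-up example: one sentence with one verb inflection edit.
SAMPLE = (
    "S वह स्कूल जाती है\n"
    "A 2 3|||R:VERB:INFL|||जाता|||REQUIRED|||-NONE-|||0\n"
)


def parse_m2(text: str) -> List[Dict]:
    """Parse M2 text into a list of {"source": tokens, "edits": [Edit, ...]}."""
    entries = []
    # Sentence blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines or not lines[0].startswith("S "):
            continue
        source = lines[0][2:].split()
        edits: List[Edit] = []
        for ln in lines[1:]:
            if not ln.startswith("A "):
                continue
            fields = ln[2:].split("|||")
            start, end = map(int, fields[0].split())
            err_type, correction = fields[1], fields[2]
            if err_type == "noop":  # marks a sentence with no edits
                continue
            edits.append((start, end, err_type, correction))
        entries.append({"source": source, "edits": edits})
    return entries


def apply_edits(source: List[str], edits: List[Edit]) -> List[str]:
    """Apply edits right to left so earlier token offsets stay valid."""
    tokens = list(source)
    for start, end, _, correction in sorted(
        edits, key=lambda e: (e[0], e[1]), reverse=True
    ):
        tokens[start:end] = correction.split()
    return tokens


entries = parse_m2(SAMPLE)
corrected = apply_edits(entries[0]["source"], entries[0]["edits"])
print(" ".join(corrected))
```

To use it on a real file, read the file with `open(path, encoding="utf-8").read()` and pass the contents to `parse_m2`. The sketch ignores the annotator ID field, so files with multiple annotators would need edits grouped by that field first.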