s-ankur / hindi_grammar_correction

Hindi Grammar Correction

Grammar Correction for Hindi Using Neural Networks (Deep Learning)

This repository contains the code and data used in our paper, "Generating Inflectional Errors for Grammatical Error Correction in Hindi". If you use our work, please cite:

@inproceedings{sonawane-etal-2020-generating,
    title = "Generating Inflectional Errors for Grammatical Error Correction in {H}indi",
    author = "Sonawane, Ankur  and
      Vishwakarma, Sujeet Kumar  and
      Srivastava, Bhavana  and
      Kumar Singh, Anil",
    booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.aacl-srw.24",
    pages = "165--171",
}

Code and Data

A. Wikiextract -- Create a dataset of artificial Hindi errors (IPython Notebook | Repository)

Data: Artificial dataset used as the training set (src file | trg file)

B. Wikiedits -- Extract a dataset of real Hindi errors from Wikipedia (IPython Notebook | Repository)

Data: Natural dataset used as the test set (edits file | src file | trg file)

C. MLConvGEC -- Multilayer Convolutional Encoder-Decoder Model (IPython Notebook)

D. Tensor2Tensor -- Base Transformer Model (IPython Notebook)

E. fairseq-gec -- Copy Augmented Transformer Model (IPython Notebook | Repository)

F. ERRANT -- Error classification for Hindi as well as model evaluation (IPython Notebook | Repository)

Data: Test data split by error class into src, trg, and m2 files (note: the .trg and .tgt files are identical)
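
For readers unfamiliar with the m2 files, the following is a minimal sketch of how they relate to the src and trg files. It assumes the files follow the standard M2 layout that ERRANT writes (an `S` line with the tokenized source sentence, then `A` lines of the form `start end|||type|||correction|||...`, with blank lines between sentences) and that there is a single annotator. The function names `parse_m2`, `apply_edits`, and `count_error_types` are hypothetical helpers, not part of this repository, and the sample sentence and its error label are illustrative only; the actual Hindi error classes used here may be named differently.

```python
from collections import Counter


def parse_m2(text):
    """Parse M2-formatted text into a list of (tokens, edits) entries."""
    entries = []
    for block in text.strip().split("\n\n"):
        tokens, edits = [], []
        for line in block.split("\n"):
            if line.startswith("S "):
                # Source sentence, already tokenized on whitespace.
                tokens = line[2:].split()
            elif line.startswith("A "):
                # Only the first three fields are used in this sketch.
                span, etype, correction, *_ = line[2:].split("|||")
                start, end = map(int, span.split())
                edits.append({"start": start, "end": end,
                              "type": etype, "correction": correction})
        if tokens:
            entries.append((tokens, edits))
    return entries


def apply_edits(tokens, edits):
    """Apply edits right to left so earlier token offsets stay valid."""
    corrected = list(tokens)
    for edit in sorted(edits, key=lambda e: e["start"], reverse=True):
        if edit["type"] == "noop":  # sentence marked as needing no change
            continue
        corrected[edit["start"]:edit["end"]] = edit["correction"].split()
    return corrected


def count_error_types(entries):
    """Count edits per error type, ignoring 'noop' annotations."""
    return Counter(edit["type"]
                   for _, edits in entries
                   for edit in edits
                   if edit["type"] != "noop")


if __name__ == "__main__":
    # Illustrative sample: a gender-agreement error on the verb.
    sample = (
        "S लड़का जाती है\n"
        "A 1 2|||R:VERB:INFL|||जाता|||REQUIRED|||-NONE-|||0\n"
    )
    for tokens, edits in parse_m2(sample):
        print(" ".join(tokens), "->", " ".join(apply_edits(tokens, edits)))
    print(count_error_types(parse_m2(sample)))
```

Under these assumptions, applying the edits of each entry to its `S` line should reproduce the corresponding line of the trg file, which gives a quick consistency check across the three file types.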