grammatical / neural-naacl2018

Neural models and instructions for reproducing the results of our neural grammatical error correction systems from M. Junczys-Dowmunt, R. Grundkiewicz, S. Guha, K. Heafield: "Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task", NAACL 2018.

Question regarding oversampling #9

gurunathparasaram opened 5 years ago

gurunathparasaram commented 5 years ago

Thanks!

snukky commented 5 years ago

I don't think oversampling would degrade the scores (but I haven't run these models on the BEA datasets yet). Such low precision and high recall may suggest that something is wrong with the pre/postprocessing. How did you pre/postprocess the data? CoNLL uses NLTK and BEA uses spaCy for tokenization, and the two may differ too much. Did you take a look at the corrections made by the system?
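For context, a quick way to spot such tokenization mismatches is to run both tokenizers on the same sentence and diff the output. A minimal sketch, assuming the standard `nltk` and `spacy` packages with their default English resources (`punkt` and `en_core_web_sm`):

```python
# Compare NLTK and spaCy tokenization on the same sentence to spot
# mismatches between CoNLL-style and BEA-style preprocessing.
# Assumes nltk (with the 'punkt' data) and spacy (with the
# 'en_core_web_sm' model) are installed.
import nltk
import spacy

nlp = spacy.load("en_core_web_sm")

sentence = "I didn't recieve the e-mail, however I re-sent it."

nltk_tokens = nltk.word_tokenize(sentence)
spacy_tokens = [t.text for t in nlp(sentence)]

print("NLTK: ", nltk_tokens)
print("spaCy:", spacy_tokens)

# Any token-level differences here (e.g. around contractions or hyphens)
# propagate into the system's input and can hurt precision.
```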

Another possibility is that the LM weight is too high; it was grid-searched on CoNLL 2013, so it may not transfer well to BEA.

We don't use re-ranking in this system. We only ensemble with a language model.
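For readers following along: the ensemble works as a log-linear combination of the translation model's score and the language model's score, with the LM weight tuned by grid search on a dev set. A conceptual sketch of that procedure, where the `decode` and `evaluate` callables are hypothetical stand-ins rather than this repo's actual API:

```python
# Conceptual sketch of ensembling a translation model with an LM via a
# log-linear combination, with the LM weight grid-searched on a dev set.
from typing import Callable, Sequence, Tuple

def combined_score(tm_logprob: float, lm_logprob: float, lm_weight: float) -> float:
    # A higher lm_weight favors fluent output but risks over-correction.
    return tm_logprob + lm_weight * lm_logprob

def grid_search_lm_weight(
    decode: Callable[[str, float], str],         # decode(src, lm_weight) -> hypothesis
    evaluate: Callable[[Sequence[str]], float],  # e.g. M^2 F0.5 against dev references
    dev_sources: Sequence[str],
    weights: Sequence[float] = (0.0, 0.1, 0.2, 0.3, 0.5),
) -> Tuple[float, float]:
    best_weight, best_score = weights[0], float("-inf")
    for w in weights:
        outputs = [decode(src, w) for src in dev_sources]
        score = evaluate(outputs)
        if score > best_score:
            best_weight, best_score = w, score
    return best_weight, best_score
```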

gurunathparasaram commented 5 years ago

I performed spell-correction on the BEA source sentences with Jamspell before feeding them to the models. I will look into the system outputs soon and also try decreasing the LM weight. Sorry for the confusion, I should have said ensemble+LM instead of re-ranking in my previous comment.
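For reference, a minimal sketch of that preprocessing step, assuming the `jamspell` Python bindings; the model and file paths are placeholders, not part of this repo:

```python
# Spell-correct each BEA source sentence with Jamspell before decoding.
# "en.bin", "bea.src", and "bea.spellchecked" are hypothetical paths.
import jamspell

corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel("en.bin")  # pretrained Jamspell language model

with open("bea.src", encoding="utf-8") as src, \
     open("bea.spellchecked", "w", encoding="utf-8") as out:
    for line in src:
        # FixFragment corrects misspelled words, leaving the rest intact.
        out.write(corrector.FixFragment(line.rstrip("\n")) + "\n")
```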