gurunathparasaram opened this issue 5 years ago
I don't think oversampling would downgrade the scores (though I haven't run these models on the BEA datasets yet). Such low precision combined with high recall may suggest that something is wrong with the pre/post-processing. How did you pre/post-process the data? CoNLL uses NLTK and BEA uses spaCy for tokenization, so the two may differ too much. Did you take a look at the corrections made by the system?
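For example, you can compare the two tokenizers on the same sentence (a quick sketch; `en_core_web_sm` stands for whichever English spaCy pipeline you have installed):

```python
import nltk
import spacy

nltk.download("punkt", quiet=True)   # NLTK tokenizer models
nlp = spacy.load("en_core_web_sm")   # any English spaCy pipeline

sent = "He didn't like the co-op's set-up."

nltk_tokens = nltk.word_tokenize(sent)
spacy_tokens = [t.text for t in nlp(sent)]

if nltk_tokens != spacy_tokens:
    # Hyphenated words, contractions, and punctuation are common divergence points.
    print("NLTK :", nltk_tokens)
    print("spaCy:", spacy_tokens)
```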
Another possibility is that the weight given to the LM is too high; it was grid-searched on CoNLL-2013.
We don't use re-ranking in this system. We only ensemble with a language model.
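To be concrete about the difference: the LM scores are folded into beam search at each step rather than used to re-rank a finished n-best list. A minimal sketch of the scoring step (the names here are illustrative, not this repo's actual code):

```python
import numpy as np

def combined_step_scores(ensemble_logprobs, lm_logprobs, lm_weight):
    """Next-token scores used to extend the beam at each decoding step.

    ensemble_logprobs: [vocab]-shaped log-probs averaged over the GEC models
    lm_logprobs:       [vocab]-shaped log-probs from the language model
    lm_weight:         the interpolation weight grid-searched on CoNLL-2013;
                       too large a value favors fluent rewrites over faithful edits
    """
    return ensemble_logprobs + lm_weight * lm_logprobs
```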
I performed spell-correction on the BEA source sentences using Jamspell before giving them to the models. I will take a look at the system outputs soon and also try decreasing the LM weight. Sorry for the confusion: I should have said ensemble+LM instead of reranking in my previous comment.
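For reference, the Jamspell pass was along these lines (a sketch; the model and file paths are placeholders):

```python
import jamspell

corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel("en.bin")  # pretrained Jamspell English model (placeholder path)

with open("bea_test.src") as fin, open("bea_test.spell.src", "w") as fout:
    for line in fin:
        # FixFragment corrects misspellings using the surrounding context
        fout.write(corrector.FixFragment(line.rstrip("\n")) + "\n")
```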
In the low-resource paper, you mention that NUCLE was oversampled 10 times for domain adaptation to the CoNLL-2014 test set.
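(I'm reading the oversampling as simply repeating NUCLE in the training data, roughly like this, with placeholder file names:)

```python
# Repeat NUCLE 10x in the training data (source side shown; same for targets).
with open("nucle.src") as f:
    nucle_src = f.readlines()
with open("train.src", "a") as out:
    for _ in range(10):
        out.writelines(nucle_src)
```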
I tried benchmarking the pre-trained models provided in this repo on the WI+LOCNESS test set. The single model gave an F-score of 34.15, whereas the ensemble of 4 models + reranking gave an F-score of 53.27. The ensemble produces fewer false positives than the single model, which leads to higher precision.
Metrics of the single model on the WI+LOCNESS test set:
Metrics of the ensemble on the WI+LOCNESS test set:
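Since BEA is scored with ERRANT's span-based F0.5, which weights precision twice as heavily as recall, the false-positive gap matters a lot. A quick check of that relationship (a sketch assuming the standard P/R/F_beta formulas):

```python
def prf(tp, fp, fn, beta=0.5):
    """Precision, recall, and F_beta from span counts (F0.5 is the BEA metric)."""
    p = tp / (tp + fp) if (tp + fp) else 1.0
    r = tp / (tp + fn) if (tp + fn) else 1.0
    f = (1 + beta**2) * p * r / (beta**2 * p + r) if (p + r) else 0.0
    return p, r, f

# Same recall, different precision: extra false positives crater F0.5.
print(prf(tp=100, fp=50, fn=150))   # P=0.67, R=0.40 -> F0.5 ≈ 0.59
print(prf(tp=100, fp=220, fn=150))  # P=0.31, R=0.40 -> F0.5 ≈ 0.33
```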
Does oversampling the NUCLE data lead to the single model's precision dropping from 69-70 on the CoNLL-2014 test set to 31.3 on the WI+LOCNESS test set?
Thanks!