
Reaching Human-Level Performance in Automatic Grammatical Error Correction: An Empirical Study #22



Metadata


Summary

Figure 2

Flaws of Seq2Seq Model (Motivation of This Paper)

Figure 1

Fluency Boost Learning

Figure 3

Back-Boost Learning

Algorithm 1

Self-Boost Learning

Algorithm 2

Dual-Boost Learning

Algorithm 3

The error correction model and the error generation model are dual tasks, and both are dynamically updated so that they improve each other: the disfluency candidates produced by the error generation model can benefit training of the error correction model, while the disfluency candidates created by the error correction model can be used as training data for the error generation model.
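
Below is a minimal sketch of one dual-boost round, assuming hypothetical `correction_model` / `generation_model` seq2seq wrappers with `nbest()` and `train_step()` methods and an LM-based `fluency()` score (all names are illustrative, not the paper's implementation):

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def dual_boost_round(
    correction_model,                  # hypothetical seq2seq wrapper: error -> correct
    generation_model,                  # hypothetical seq2seq wrapper: correct -> error
    seed_pairs: List[Pair],            # original (error, correct) sentence pairs
    fluency: Callable[[str], float],   # LM-based fluency score; higher = more fluent
    n_best: int = 5,
) -> None:
    corr_data: List[Pair] = list(seed_pairs)                # trains error -> correct
    gen_data: List[Pair] = [(c, e) for e, c in seed_pairs]  # trains correct -> error

    for x_err, x_cor in seed_pairs:
        # The error generation model proposes disfluency candidates for the correct
        # sentence; only candidates less fluent than the reference are kept.
        for cand in generation_model.nbest(x_cor, n_best):
            if fluency(cand) < fluency(x_cor):
                corr_data.append((cand, x_cor))

        # The correction model's own imperfect outputs also act as disfluency
        # candidates, and in turn become training data for the generation model.
        for cand in correction_model.nbest(x_err, n_best):
            if fluency(cand) < fluency(x_cor):
                corr_data.append((cand, x_cor))
                gen_data.append((x_cor, cand))

    # Both models are updated every round, so each keeps improving the other.
    correction_model.train_step(corr_data)
    generation_model.train_step(gen_data)
```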

Fluency Boost Learning with Large-Scale Native Data

The proposed strategies can be easily extended to utilize massive native text data, from which we can additionally construct correct-correct sentence pairs, as sketched below.
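
A tiny sketch of that extension, under the assumption that native sentences can be treated as already fluent (helper name is hypothetical):

```python
from typing import Iterable, List, Tuple

def native_self_pairs(native_sentences: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each native sentence with itself, yielding extra correct-correct
    training pairs for the error correction model."""
    return [(s, s) for s in native_sentences]

# Example: extra_pairs = native_self_pairs(["She plays the piano well."])
```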

Fluency Boost Inference

Experiments

Dataset

Experiment setting

Experiment Results

(Note: See the tables in the paper; only the analysis is provided here.)

Contributions

Although some related studies explore artificial error generation for GEC, this work proposes a novel fluency boost learning mechanism that, unlike models trained only with the original error-corrected data, performs dynamic data augmentation along with training.

We propose fluency boost inference, which allows the model to repeatedly edit a sentence as long as the sentence's fluency can be improved. To the best of our knowledge, this is the first work to conduct multi-round seq2seq inference for GEC, although similar ideas have been proposed for NMT (Xia et al., 2017).
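
A minimal sketch of the multi-round inference loop, assuming a hypothetical `correct()` function for one seq2seq decoding round and a fluency score in the spirit of the paper, f(x) = 1 / (1 + H(x)) with H(x) the average negative log-probability under a language model (function and parameter names are illustrative):

```python
from typing import Callable, List

def sentence_fluency(tokens: List[str],
                     token_logprob: Callable[[List[str], int], float]) -> float:
    """Fluency score f(x) = 1 / (1 + H(x)), where H(x) is the average negative
    log-probability of the tokens under a language model."""
    h = -sum(token_logprob(tokens, i) for i in range(len(tokens))) / max(len(tokens), 1)
    return 1.0 / (1.0 + h)

def fluency_boost_inference(sentence: str,
                            correct: Callable[[str], str],   # one seq2seq decoding round
                            score: Callable[[str], float],   # e.g. LM-based fluency
                            max_rounds: int = 5) -> str:
    """Repeatedly re-decode the sentence as long as its fluency keeps improving."""
    current, current_score = sentence, score(sentence)
    for _ in range(max_rounds):
        proposal = correct(current)
        proposal_score = score(proposal)
        if proposal_score <= current_score:      # stop once fluency no longer improves
            break
        current, current_score = proposal, proposal_score
    return current
```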

Related Work