7.10: A First Look at GEC and Literature Reading

li-aolong / li-aolong.github.io

李傲龍's blog

https://aolong.me


7.10: A First Look at GEC and Literature Reading #1

Open li-aolong opened 5 years ago

li-aolong commented 5 years ago

I. Learned about methods for Grammatical Error Correction (GEC)

1. Rule-based methods
2. Data-driven traditional machine learning methods
3. Machine translation-based methods

a. Statistical machine translation (SMT)

b. Neural machine translation (NMT)

i. Encoder-decoder model

ii. Attention mechanism

iii. Reranking
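
As a note on the attention mechanism listed above: at each decoding step the decoder scores every encoder state against its current query, normalizes the scores with a softmax, and takes the weighted sum of encoder states as a context vector. The sketch below assumes scaled dot-product scoring, which is only one common choice; the notes above do not say which variant a given GEC system uses. The function names and the toy vectors are made up for illustration.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, encoder_states):
    """Return (weights, context) for one decoder query over all encoder states.

    Assumption: scaled dot-product scoring; other scoring functions exist.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * h for q, h in zip(query, state)) / scale
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoder states (one per source token) and one decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = dot_product_attention(query, encoder_states)
print(weights, context)
```

The weights always sum to 1, and source tokens whose states align with the query receive the larger share.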

II. Learned about GEC datasets

1. NUCLE

a. https://www.comp.nus.edu.sg/~nlp/corpora.html (official site; inconvenient to download from)

b. https://github.com/KentonMurray/Non-nativeEnglishGrammarCorrection (version 2.2)

2. Lang-8

a. https://sites.google.com/site/naistlang8corpora/ (possibly unofficial)

3. JFLEG

a. https://github.com/keisks/jfleg (officially provided)

4. Other

a. http://grammatical.github.io/resources/

b. https://github.com/snukky/wikiedits
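
For working with these corpora: CoNLL-2014-style annotations are commonly distributed in the M2 format, where an `S` line holds the tokenized source sentence and each `A` line holds one edit as `start end|||type|||correction|||...|||annotator`, with token offsets that exclude the end index. The sketch below applies one annotator's edits to recover the corrected sentence. The helper name `apply_m2_edits` and the sample sentence are made up for illustration, and the sketch assumes the edits of one annotator do not overlap.

```python
def apply_m2_edits(block, annotator="0"):
    """Apply the edits of one annotator to the source sentence of an M2 block."""
    lines = block.strip().split("\n")
    tokens = lines[0][2:].split()  # drop the leading "S "
    edits = []
    for line in lines[1:]:
        if not line.startswith("A "):
            continue
        fields = line[2:].split("|||")
        start, end = map(int, fields[0].split())
        # Negative offsets mark "no edit" annotations; skip other annotators too.
        if start < 0 or fields[-1] != annotator:
            continue
        # If several alternative corrections are given ("||"), take the first.
        correction = fields[2].split("||")[0]
        edits.append((start, end, correction.split()))
    # Apply from right to left so earlier token offsets stay valid.
    for start, end, replacement in sorted(edits, reverse=True):
        tokens[start:end] = replacement
    return " ".join(tokens)


# Made-up sample block: one replacement and one deletion.
sample = """S This are are a sentence .
A 1 2|||SVA|||is|||REQUIRED|||-NONE-|||0
A 2 3|||Rloc-||||||REQUIRED|||-NONE-|||0"""

print(apply_m2_edits(sample))  # This is a sentence .
```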

III. Read two GEC papers

Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation

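This paper combines the SMT and NMT approaches for GEC. The resulting hybrid system scores 50.19 on the CoNLL-2014 M2 metric and 56.74 on the JFLEG GLEU metric, outperforming previous results.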
Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study

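This paper applies a seq2seq model with a novel fluency boost learning and inference mechanism to GEC. It scores 75.72 on the CoNLL-2014 M2 F0.5 metric and 62.42 on the JFLEG GLEU metric, exceeding human-level performance on both benchmarks for the first time.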
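
A note on the F0.5 figure reported for CoNLL-2014: F-beta combines precision P and recall R as (1 + beta^2) * P * R / (beta^2 * P + R), and beta = 0.5 weights precision twice as heavily as recall, which suits GEC because a wrong correction is usually worse than a missed one. The sketch below computes it from edit-level counts. The counts are made up, and treating an empty denominator as 1.0 is a convention chosen for this sketch.

```python
def f_beta(true_pos, false_pos, false_neg, beta=0.5):
    """Return (precision, recall, F-beta) from edit-level counts."""
    proposed = true_pos + false_pos  # edits the system made
    gold = true_pos + false_neg      # edits in the reference annotation
    # Convention for this sketch: an empty denominator counts as a perfect score.
    precision = true_pos / proposed if proposed else 1.0
    recall = true_pos / gold if gold else 1.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return precision, recall, 0.0
    return precision, recall, (1 + beta ** 2) * precision * recall / denom


# Made-up counts: 60 correct edits, 20 spurious edits, 60 missed edits.
p, r, f05 = f_beta(60, 20, 60)
_, _, f1 = f_beta(60, 20, 60, beta=1.0)
print(f"P={p:.4f} R={r:.4f} F0.5={f05:.4f} F1={f1:.4f}")
```

With precision 0.75 and recall 0.50, F0.5 comes out higher than F1, showing how the metric rewards a precise but conservative system.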