memect / hao

好东西传送门 (Good Stuff Portal)

@Joyce-Yuan- For spelling errors (real-word errors), English has Wikipedia:List of commonly misused English words. Is there a similar resource for Chinese? #179

Closed haoawesome closed 9 years ago

haoawesome commented 9 years ago

http://www.weibo.com/1944220341/BmPct6gZD

http://en.wikipedia.org/wiki/Wikipedia:List_of_commonly_misused_English_words

haoawesome commented 9 years ago

Concept

"Real-word spelling errors are words in a text that occur when a user mistakenly types a correctly spelled word when another was intended."

haoawesome commented 9 years ago

Chinese Spelling Check

http://research.microsoft.com/en-us/um/people/jfgao/project/csc.ppt Chinese Spelling Checking (or, the Big CSC) Jianfeng Gao (2002 ?) Microsoft Research Asia

https://sites.google.com/site/aclsighan7/bake-offs Bake-off 2013: Chinese Spelling Check

haoawesome commented 9 years ago

http://alias-i.com/lingpipe/demos/tutorial/querySpellChecker/read-me.html Spelling Tutorial at LingPipe

haoawesome commented 9 years ago

http://saffron.insight-centre.org/acl_anlp/topic/correcting_word/ Related papers: correcting word

haoawesome commented 9 years ago

http://www.dcs.bbk.ac.uk/research/recentphds/pedler.pdf Computer Correction of Real-word Spelling Errors in Dyslexic Text Jennifer Pedler 2007

haoawesome commented 9 years ago

(English?) Spelling Correction

http://norvig.com/spell-correct.html How to Write a Spelling Corrector, Peter Norvig (probably 2007; it was discussed on reddit around that time)
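
A minimal sketch in the spirit of that article, not a copy of it: generate every string one edit away from the input and pick the most frequent known word. The `WORDS` counts below are hypothetical toy data (the article builds its counts from a large text corpus), and the function names are chosen here for illustration.

```python
from collections import Counter

# Hypothetical toy word counts; a real corrector would count words in a large corpus.
WORDS = Counter({"spelling": 10, "spilling": 2, "corrected": 5, "the": 50})
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    """Keep only the candidates that appear in the word counts."""
    return {w for w in words if w in WORDS}

def correct(word):
    """Prefer the word itself if known, else the most frequent known word one edit away."""
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: WORDS[w])

print(correct("speling"))   # spelling
print(correct("teh"))       # the
print(correct("spilling"))  # spilling (a known word is never changed)
```

The last call shows why this style of corrector only handles non-word errors: a real-word error such as "spilling" typed for "spelling" is already in the dictionary, so it is returned unchanged.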

http://www.cs.toronto.edu/pub/gh/Hirst+Budanitsky-2005.pdf Correcting real-word spelling errors by restoring lexical cohesion Natural Language Engineering 11 (1): 87–111. 2005

http://www.aclweb.org/anthology/D09-1129 Aminul Islam and Diana Inkpen. 2009. Real-word spelling correction using Google Web 1T n-gram data set. In Proceedings of the 18th ACM conference on Information and knowledge management (CIKM '09) http://doi.acm.org/10.1145/1645953.1646205

http://www.slideshare.net/rbouskila/spell-checking-using-an-ngram-language-model Spell checking using an N-gram language model, by Raphael Bouskila, Co-Founder & CTO at CoPower, on Sep 24, 2013
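
The n-gram idea behind several of the links above can be sketched as follows: for each word that belongs to a confusion set, try its alternatives and keep the one that makes the sentence more probable under a bigram language model. This is only an illustration under stated assumptions. The corpus, the confusion set, and the add-one smoothing are hypothetical choices made here, not the method of any of the cited papers.

```python
from collections import Counter

# Hypothetical toy corpus and confusion set, for illustration only.
CORPUS = ("a piece of cake . world peace is good . "
          "a piece of paper . peace of mind").split()
CONFUSION = {"piece": {"peace"}, "peace": {"piece"}}

UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB_SIZE = len(UNIGRAMS)

def bigram_prob(prev, word):
    """Add-one smoothed estimate of P(word | prev)."""
    return (BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB_SIZE)

def score(tokens):
    """Product of bigram probabilities over the token sequence."""
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p

def correct_real_word(tokens):
    """Swap in a confusable word whenever it raises the sentence score."""
    best = list(tokens)
    for i, tok in enumerate(tokens):
        for alt in CONFUSION.get(tok, ()):
            candidate = best[:i] + [alt] + best[i + 1:]
            if score(candidate) > score(best):
                best = candidate
    return best

print(correct_real_word("a peace of cake".split()))      # ['a', 'piece', 'of', 'cake']
print(correct_real_word("world peace is good".split()))  # unchanged
```

Unlike a dictionary lookup, this uses the surrounding words, so it can flag a correctly spelled word that does not fit its context.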

haoawesome commented 9 years ago

http://www.aclweb.org/anthology/C14-1028 Chinese Word Ordering Errors Detection and Correction for Non-Native Chinese Language Learners (COLING 2014)

"In the HSK corpus, which contains compositions of students from different countries who study Chinese in Beijing Language and Culture University (http://nlp.blcu.edu.cn/online-systems/hsk-language-lib-indexing-system.html), there are 35,884 errors at sentence level. The top 10 error types and their occurrences are listed below: Word Ordering Errors (WOE) (8,515), Missing Component (Adverb) (3,244), Missing Component (Predicate) (3,018), Grammatical Error (“Is ... DE”) (2,629), Missing Component (Subject) (2,405), Missing Component (Head Noun) (2,364), Grammatical Error (“Is” sentence) (1,427), Redundant Component (Predicate) (1,130), Uncompleted Sentence (1,052), and Redundant Component (Adverb) (1,051). WOEs are the most frequent type of errors (Yu and Chen, 2012)"

haoawesome commented 9 years ago

http://www.aclweb.org/anthology/W13-4416 Graph Model for Chinese Spell Checking, Zhongye Jia, Peilu Wang and Hai Zhao: "spell checking in Chinese is very different from that in English or other alphabetical languages. In Bake-Off 2013, the evaluation includes two sub-tasks: detection and correction for Chinese spell errors."

http://www.lrec-conf.org/proceedings/lrec2012/pdf/727_Paper.pdf Spell Checking for Chinese Shaohua Yang, Hai Zhao, Xiaolin Wang, Bao-liang Lu
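
To illustrate the two sub-tasks quoted above (detection and correction), here is a rough character-level sketch. It is not the graph model or any other method from these papers. It assumes a hypothetical confusion set of similar-sounding characters and a toy corpus of correct sentences, and it works on characters because Chinese text has no word delimiters.

```python
from collections import Counter

# Hypothetical toy data: a few correct sentences and a tiny confusion set.
# Real systems rely on large corpora and much larger confusion sets.
CORPUS = ["我们明天再见", "我现在在家", "再见朋友"]
CONFUSION = {"在": "再", "再": "在"}

# Counts of adjacent character pairs in the corpus.
BIGRAMS = Counter(pair for s in CORPUS for pair in zip(s, s[1:]))

def context_count(chars, i, ch):
    """How often `ch` was seen next to its left and right neighbours at position i."""
    left = BIGRAMS[(chars[i - 1], ch)] if i > 0 else 0
    right = BIGRAMS[(ch, chars[i + 1])] if i + 1 < len(chars) else 0
    return left + right

def check(sentence):
    """Return (0-based error positions, corrected sentence)."""
    chars = list(sentence)
    errors = []
    for i, ch in enumerate(sentence):
        alt = CONFUSION.get(ch)
        if alt and context_count(chars, i, alt) > context_count(chars, i, ch):
            errors.append(i)   # detection: position of the suspicious character
            chars[i] = alt     # correction: replace it with the confusable one
    return errors, "".join(chars)

print(check("我们明天在见"))  # ([4], '我们明天再见')
print(check("我现在在家"))    # ([], '我现在在家')
```

Every character here is valid on its own, so all Chinese spelling errors of this kind behave like real-word errors and need context to detect.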

haoawesome commented 9 years ago

http://www.google.com/patents/US8725497 System and method for detecting and correcting mismatched Chinese character US 8725497 B2

haoawesome commented 9 years ago

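Q: @Joyce-Yuan- For spelling errors (real-word errors), are there similar resources for Chinese? A: See http://memect.co/wK9RFaN for details. Spelling errors divide into non-word and real-word errors, and the hard parts differ between Chinese and English. The SIGHAN7 Bake-off 2013: Chinese Spelling Check has many papers (CLP14 will be held in Wuhan in October). For English spelling correction, see Peter Norvig's 2007 article (a 21-line Python implementation). http://www.weibo.com/5220650532/BmXdqD5Eh?ref=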
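
To make the non-word versus real-word distinction concrete, here is a small sketch with a hypothetical toy dictionary. A plain dictionary lookup catches non-word errors but is blind to real-word errors.

```python
# Hypothetical toy dictionary; a real checker would load a full word list.
DICTIONARY = {"i", "want", "a", "piece", "peace", "of", "cake", "the"}

def non_word_errors(sentence):
    """Return the tokens that are not in the dictionary (non-word errors)."""
    return [tok for tok in sentence.lower().split() if tok not in DICTIONARY]

# "peice" is a non-word error: the lookup flags it.
print(non_word_errors("I want a peice of cake"))  # ['peice']
# "peace" is a real-word error here: it is a valid word, so nothing is flagged.
print(non_word_errors("I want a peace of cake"))  # []
```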

haoawesome commented 9 years ago

https://groups.google.com/forum/#!topic/chinesemac/ttxc5ubwNIk The common tool is simply Microsoft Office.

haoawesome commented 9 years ago

https://github.com/elasticsearch/elasticsearch/issues/3184

"I would look at this plugin: https://github.com/elasticsearch/elasticsearch-analysis-smartcn it brings the Lucene Smart Chinese analyzer into ES"