Closed haoawesome closed 10 years ago
http://ntz-develop.blogspot.com/2011/03/phonetic-algorithms.html
http://saffron.insight-centre.org/acl/topic/phonetic_similarity/ Phonetic algorithms
https://homes.cs.washington.edu/~bhixon/papers/phonemic_similarity_metrics_Interspeech_2011.pdf Phonemic Similarity Metrics to Compare Pronunciation Methods (2011)
http://webdocs.cs.ualberta.ca/~kondrak/papers/lingdist.pdf Evaluation of Several Phonetic Similarity Algorithms on the Task of Cognate Identification (2006)
http://webdocs.cs.ualberta.ca/~kondrak/papers/chum.pdf Phonetic alignment and similarity (2003)
http://www.aclweb.org/anthology/P/P06/P06-1125.pdf A Phonetic-Based Approach to Chinese Chat Text Normalization (a method for Chinese)
Soundex
Daitch–Mokotoff Soundex
Kölner Phonetik
Metaphone and Double Metaphone
New York State Identification and Intelligence System (NYSIIS)
Match Rating Approach (MRA)
Caverphone
https://github.com/elasticsearch/elasticsearch-analysis-phonetic/ -- java
https://github.com/maros/Text-Phonetic -- perl
https://github.com/dotcypress/phonetics -- go
https://github.com/lukelex/soundcord -- ruby
https://github.com/Simmetrics/simmetrics -- java
https://github.com/oubiwann/metaphone - https://pypi.python.org/pypi/Metaphone/0.4 -- python
https://bitbucket.org/yougov/fuzzy - https://pypi.python.org/pypi/Fuzzy/1.0 -- python
https://github.com/sunlightlabs/jellyfish - https://pypi.python.org/pypi/jellyfish/0.3.2 -- python
https://github.com/rockymadden/stringmetric -- scala
https://github.com/Yomguithereal/clj-fuzzy -- Clojure
https://github.com/NaturalNode/natural -- Node javascript
source: wikipedia, github
http://en.wikipedia.org/wiki/Homonym In linguistics, a homonym is, in the strict sense, one of a group of words that share the same pronunciation but may have different meanings.
http://en.wikipedia.org/wiki/Soundex Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English.
http://stackoverflow.com/questions/17010516/how-to-detect-how-similar-a-speech-recording-is-to-another-speech-recording How to detect how similar a speech recording is to another speech recording?
http://csl.ira.uka.de/fileadmin/Vorlesungen/WS2010-11/ATSP/presentations/IvayloJanev_HomophonesInASR.pdf How to solve homophone problems in Automatic Speech Recognition?
http://web.stanford.edu/class/cs124/lec/sem Word Meaning and Similarity Word Senses and Word Relations
https://github.com/lukelex/soundcord A phonetic algorithm to make comparison by phonetically similar terms easier.
http://www.psy.ntu.edu.tw/jtwu/jtwu/publish/%E6%9C%9F%E5%88%8A%E8%AB%96%E6%96%87/Chen%20Vaid%20&%20Wu%202009%20LCP%20Homophone%20Density.pdf Chen, Hsin-Chin, Vaid, Jyotsna and Wu, Jei-Tun (2009), 'Homophone density and phonological frequency in Chinese word recognition', Language and Cognitive Processes, 24:7, 967-982
http://dl.acm.org/citation.cfm?id=1282081 A phonetic similarity model for automatic extraction of transliteration pairs (2007)
https://twpl.library.utoronto.ca/index.php/twpl/article/download/6196/3185 Phonetic similarity and phonemic contrast in loanword adaptation, by Kevin Heffernan
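http://www.aclweb.org/anthology/O00-1005 反向異文字音譯相似度評量方法與跨語言資訊檢索 (2000) [title roughly translates as: Similarity measurement methods for backward transliteration across scripts, and cross-language information retrieval]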
http://www.eejournal.ktu.lt/index.php/elt/article/viewFile/2628/1917 Predicting the Acoustic Confusability between Words for a Speech Recognition System using Levenshtein Distance
http://www.let.rug.nl/alfa/ling-distances/advertisement.html Workshop on Linguistic Distances, 2006
http://spraakbanken.gu.se/eng/research/digital-areal-linguistics/workshop-october-2011/program/program Workshop on comparing approaches to measuring linguistic differences, 24-25 October 2011, University of Gothenburg
http://webdocs.cs.ualberta.ca/~kondrak Greg Kondrak Associate Professor
Department of Computing Science Athabasca Hall 221 University of Alberta Edmonton, Alberta, T6G 2E8 Canada
http://en.wikipedia.org/wiki/Phonetic_algorithm The text below is copied from Wikipedia.
A phonetic algorithm is an algorithm for indexing of words by their pronunciation. Most phonetic algorithms were developed for use with the English language; consequently, applying the rules to words in other languages might not give a meaningful result. They are necessarily complex algorithms with many rules and exceptions, because English spelling and pronunciation is complicated by historical changes in pronunciation and words borrowed from many languages. Among the best-known phonetic algorithms are:
"Phonetic Similarity Algorithms and Code: First Edition". Author: 好东西传送门. ID: hao-2014-002. Date: 2014-09-11
phonetic similarity algorithms
Soundex
Daitch–Mokotoff Soundex
Kölner Phonetik
Metaphone and Double Metaphone
New York State Identification and Intelligence System (NYSIIS)
Match Rating Approach (MRA)
Caverphone
implementations
https://github.com/elasticsearch/elasticsearch-analysis-phonetic/ -- java
https://github.com/maros/Text-Phonetic -- perl
https://github.com/dotcypress/phonetics -- go
https://github.com/lukelex/soundcord -- ruby
https://github.com/Simmetrics/simmetrics -- java
https://github.com/oubiwann/metaphone - https://pypi.python.org/pypi/Metaphone/0.4 -- python
https://bitbucket.org/yougov/fuzzy - https://pypi.python.org/pypi/Fuzzy/1.0 -- python
https://github.com/sunlightlabs/jellyfish - https://pypi.python.org/pypi/jellyfish/0.3.2 -- python
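Q: @付超群 Is there an algorithm or library for computing pronunciation similarity in Chinese? For example 北京, 百斤, 鼻颈, 背景. It would be even better if it could also compare English along the way, e.g. peking, beking.
A: I have put together a mind map (#脑图#) of algorithms and open-source code. Progress on this question and related material are at http://memect.co/TL85MEp, which also collects some related papers (including ones on Chinese). Corrections and additions are welcome. http://www.weibo.com/5220650532/BmsMAeh0K?ref=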
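For the Chinese examples in the question, one simple approach is to compare toneless pinyin strings by edit distance. The following is only a rough sketch under stated assumptions: the `PINYIN` table is hand-written and covers just the eight characters from the question (a real system would need a full pronunciation dictionary), tones are ignored, and the names `to_pinyin`, `levenshtein` and `similarity` are made up for illustration and do not come from any library linked in this thread.

```python
# Hypothetical hand-written table: toneless pinyin for the example characters only.
PINYIN = {
    "北": "bei", "京": "jing",
    "百": "bai", "斤": "jin",
    "鼻": "bi", "颈": "jing",
    "背": "bei", "景": "jing",
}


def to_pinyin(text: str) -> str:
    """Map each character to toneless pinyin, separated by spaces."""
    return " ".join(PINYIN[ch] for ch in text)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus the edit distance normalised by the longer string."""
    pa, pb = to_pinyin(a), to_pinyin(b)
    longest = max(len(pa), len(pb))
    return 1.0 if longest == 0 else 1 - levenshtein(pa, pb) / longest


for word in ("背景", "鼻颈", "百斤"):
    print("北京", word, similarity("北京", word))
# 北京 背景 1.0
# 北京 鼻颈 0.875
# 北京 百斤 0.75
```

Plain edit distance treats every letter change as equally costly, so it cannot tell that "in" and "ing" sound closer than two unrelated finals; weighting substitutions by phonetic features would be a natural refinement.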
http://www.phon.ox.ac.uk/jcoleman/PHONOLOGY1.htm Phonetics vs. Phonology
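A typical phonetic algorithm defines a set of rules that map text onto some system of phonetic symbols. For example, the original Soundex algorithm keeps the first letter, drops all vowels, and maps b, f, p, v → 1 (and the other consonants to further digits); pronunciation similarity is then computed by comparing the differences between the mapped symbol strings. The mind map in the original post lists common mapping algorithms for English (and German) together with related open-source code (Python, Java, Go, Ruby, Perl). http://www.weibo.com/5220650532/BmLqi92Vx?mod=weibotime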
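The mapping described above can be sketched in a few lines of Python. This is a minimal sketch of the commonly documented American Soundex rules, not the code of any library listed in this thread; the names `soundex` and `code_similarity` are illustrative, and the position-by-position comparison is just one simple assumption about how to compare the mapped strings.

```python
# Consonant groups used by American Soundex; vowels, y, h and w get no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code (letter + 3 digits) of an ASCII word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    out = [letters[0].upper()]           # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w do not separate equal digits
        code = CODES.get(ch, "")         # vowels and y map to "" and reset prev
        if code and code != prev:
            out.append(code)
        prev = code
    return ("".join(out) + "000")[:4]    # pad with zeros, truncate to 4


def code_similarity(a: str, b: str) -> float:
    """Fraction of the four code positions on which two words agree."""
    ca, cb = soundex(a), soundex(b)
    return sum(x == y for x, y in zip(ca, cb)) / 4


print(soundex("Robert"), soundex("Rupert"))   # R163 R163
print(soundex("Peking"), soundex("Beking"))   # P252 B252
print(code_similarity("Peking", "Beking"))    # 0.75
```

Because Soundex keeps the first letter unchanged, "Peking" and "Beking" get different codes even though p and b are phonetically close; comparing the codes position by position, rather than testing for exact equality, softens that limitation somewhat.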
This is a nice resource. I happen to have something right now that I don't know how to do.
Have you found a way to compare the pronunciation similarity of "北京" and "Peking"?
I recommend https://yomguithereal.github.io/talisman/phonetics/, which provides 16 phonetic similarity algorithms. My hallelujahIM input method uses its phonex algorithm to implement fuzzy phonetic matching for English.