taku910 / mecab

Yet another Japanese morphological analyzer

MeCab algorithm #52

Open jtsoftware opened 5 years ago

jtsoftware commented 5 years ago

Is there a document somewhere that describes the MeCab algorithm?

Or could someone give a simple one-paragraph description?

I need this functionality for my website and phone apps for teaching languages (www.jtlanguage.com). I also want to generalize it to other languages, and I need it without license problems, so I want to write my own C# implementation.

Thank you.

polm commented 4 years ago

The algorithm is described in the paper "Applying Conditional Random Fields to Japanese Morphological Analysis", though you'll probably have to look at the code for details.

The basic idea is that it builds a lattice of candidate tokens and then uses the Viterbi algorithm to find the cheapest path through that lattice. The tricky part is handling unknown words, whose dictionary entries (and costs) are generated on the fly based on the number and type of characters.
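
To make that concrete, here is a rough Python sketch of the lattice-plus-Viterbi idea. It is not MeCab's actual code: the dictionary, the tags, every cost value, the `char_class` ranges, and the rule for unknown words (one candidate per run of same-class characters, only where no dictionary word matches) are all made up for illustration. Lower cost means a better path.

```python
from collections import defaultdict

# Hypothetical toy dictionary: surface -> list of (tag, word_cost).
DICTIONARY = {
    "東京": [("NOUN", 3)],
    "京都": [("NOUN", 3)],
    "東": [("NOUN", 7)],
    "都": [("NOUN", 5)],
    "に": [("PARTICLE", 1)],
    "住む": [("VERB", 3)],
}

# Hypothetical connection costs between adjacent tags.
CONNECTION = {
    ("BOS", "NOUN"): 0,
    ("NOUN", "NOUN"): 4,
    ("NOUN", "PARTICLE"): 0,
    ("PARTICLE", "VERB"): 0,
    ("VERB", "EOS"): 0,
}
DEFAULT_CONN = 5  # assumed cost for any tag pair not listed above


def char_class(ch):
    """Very coarse character typing (assumed ranges, not MeCab's char.def)."""
    if "\u3040" <= ch <= "\u309f":
        return "HIRAGANA"
    if "\u30a0" <= ch <= "\u30ff":
        return "KATAKANA"
    if "\u4e00" <= ch <= "\u9fff":
        return "KANJI"
    return "OTHER"


def unknown_candidate(text, start):
    """Make one unknown-word node on the fly: the run of same-class characters."""
    cls = char_class(text[start])
    end = start + 1
    while end < len(text) and char_class(text[end]) == cls:
        end += 1
    surface = text[start:end]
    # Assumed cost model: a flat penalty plus a per-character charge.
    return (end, surface, "UNKNOWN", 10 + 2 * len(surface))


def build_lattice(text):
    """Map each start position to its candidate nodes (end, surface, tag, cost)."""
    lattice = defaultdict(list)
    for start in range(len(text)):
        for surface, entries in DICTIONARY.items():
            if text.startswith(surface, start):
                for tag, cost in entries:
                    lattice[start].append((start + len(surface), surface, tag, cost))
        if not lattice[start]:  # simplification: unknowns only as a fallback
            lattice[start].append(unknown_candidate(text, start))
    return lattice


def viterbi(text):
    """Return (total_cost, [(surface, tag), ...]) for the cheapest path."""
    lattice = build_lattice(text)
    # best[pos][tag] = (cost, path) of the cheapest path ending at pos with that tag.
    best = defaultdict(dict)
    best[0]["BOS"] = (0, [])
    for start in range(len(text)):
        for prev_tag, (prev_cost, prev_path) in list(best[start].items()):
            for end, surface, tag, word_cost in lattice[start]:
                conn = CONNECTION.get((prev_tag, tag), DEFAULT_CONN)
                total = prev_cost + conn + word_cost
                if tag not in best[end] or total < best[end][tag][0]:
                    best[end][tag] = (total, prev_path + [(surface, tag)])
    # Close every surviving path with the end-of-sentence connection cost.
    return min(
        (cost + CONNECTION.get((tag, "EOS"), DEFAULT_CONN), path)
        for tag, (cost, path) in best[len(text)].items()
    )


if __name__ == "__main__":
    cost, path = viterbi("東京都に住む")
    print(cost, path)
```

With these made-up costs, `viterbi("東京都に住む")` picks 東京 / 都 / に / 住む over 東 / 京都 / に / 住む because the first path is cheaper overall. Storing whole paths in each state keeps the sketch short; a real implementation would keep back-pointers and look up dictionary matches with a trie instead of scanning every entry.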