Word Segmentation for Japanese

ghost commented 13 years ago

To lay out Japanese text, we first need to find word boundaries; for this we need an external tool.

ghost commented 13 years ago

These are the ones I have found:

ChaSen
MeCab

@spyysalo: I could not find anything for Juman (I assume it stands for: Japanese Morphological Analysis System something something)

spyysalo commented 13 years ago

http://nlp.kuee.kyoto-u.ac.jp/nl-resource/juman-e.html

ghost commented 13 years ago

Support for MeCab added in @3e12c862563cd2343e1c.

ghost commented 13 years ago

Implemented and sent to the client, closing.

sunmoonStern commented 8 years ago

Hello, does brat work with MeCab? My tools.conf looks like the one below. I have been using brat for annotation for some time now, but I don't think brat is recognizing individual words/tokens. MeCab can generate output in several different formats. Could you tell me what kind of output brat expects? Thanks in advance.

[options]

# Possible values for validate:
Validation  validate:all

# Possible values for tokenizer
Tokens tokenizer:mecab
#Tokens    tokenizer:whitespace

# Possible values for splitter:
Sentences   splitter:regex

# Possible values for logfile:
Annotation-log logfile:<NONE>

[search]
Google       <URL>:http://www.google.com/search?q=%s
Wikipedia    <URL>:http://en.wikipedia.org/wiki/Special:Search?search=%s

[annotators]

[disambiguators]

[normalization]
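
To make the output-format question concrete, here is a minimal sketch of how space-separated tokens, such as those produced by `mecab -Owakati`, could be mapped back to character offsets in the original text. This is only an illustration: the function name `wakati_to_offsets` is hypothetical, the segmentation in the example is written by hand rather than produced by MeCab, and nothing in this thread confirms that brat consumes this particular format.

```python
def wakati_to_offsets(text, wakati):
    """Map space-separated tokens back to (start, end) offsets in text.

    `wakati` is assumed to hold the tokens of `text` in order, separated
    by whitespace (the style of `mecab -Owakati`). Hypothetical helper,
    not brat's own code.
    """
    offsets = []
    pos = 0  # search position; never moves backwards
    for token in wakati.split():
        start = text.find(token, pos)
        if start == -1:
            raise ValueError(f"token {token!r} not found after position {pos}")
        end = start + len(token)
        offsets.append((start, end))
        pos = end
    return offsets


if __name__ == "__main__":
    text = "すもももももももものうち"
    # Hand-written segmentation, standing in for real tokenizer output.
    wakati = "すもも も もも も もも の うち"
    for start, end in wakati_to_offsets(text, wakati):
        print(start, end, text[start:end])
```

Searching forward from the end of the previous token keeps repeated tokens (such as the several も above) aligned to the correct occurrence.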

Actually, I'm not sure if it's a good idea to comment on a closed issue, so I'll open a new issue for this problem I am having.

nlplab / brat

Word Segmentation for Japanese #164