leoncamel / mecab

Automatically exported from code.google.com/p/mecab

Kanji do not appear. #2

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Installed MeCab and the Python binding.
2. Tried the example code from http://mecab.googlecode.com/svn/trunk/mecab/doc/bindings.html
3. See the result below.

What is the expected output? What do you see instead?
The basic issue is that no kanji appear in the output.

>>> print m.parse ("今おはよう。")
?   ??????? ?   ̾??-????       
?お?   ?お?   ?お?   ????-????       
??  ??  ??  ̾??-??ͭ̾??-?ȿ?      
??う。    ??う。    ??う。    ????-????       
EOS

What version of the product are you using? On what operating system?
MeCab 0.994 on OS X Mountain Lion with Python 2.7

Please provide any additional information below.
email: alexleavitt@gmail.com

Original issue reported on code.google.com by alexleav...@gmail.com on 26 Aug 2012 at 7:09

GoogleCodeExporter commented 9 years ago
Hey Alex,

Another user here! I imagine this problem is long gone for you by now, but here
are my two cents.

I've had a problem like this in the past. For me it was just a text encoding
issue: my locale was UTF-8 and the MeCab dictionary I had installed was EUC-JP.
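
To illustrate how a mismatch like this produces garbage, here is a minimal sketch in Python 3 that uses only the standard codecs (no MeCab needed). It assumes the bytes come out in EUC-JP and get decoded as UTF-8; the variable names are just for illustration.

```python
# Sketch: EUC-JP bytes read as UTF-8 turn into replacement characters.
text = "今おはよう。"

# What an EUC-JP dictionary would work with internally.
euc_bytes = text.encode("euc_jp")

# Decoding those bytes with the wrong codec; invalid sequences become U+FFFD.
garbled = euc_bytes.decode("utf-8", errors="replace")

# Decoding with the matching codec recovers the original text.
restored = euc_bytes.decode("euc_jp")

print(repr(garbled))
print(restored)
```

The garbled string is not identical to the output in the report, since a terminal may substitute characters differently, but the cause is the same kind of mismatch.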

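My solution, which might work for you, is to encode the input in the
dictionary's charset before parsing and decode the result afterwards: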

$ python
Python 2.7.3 (default, Aug  1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import MeCab
>>> m = MeCab.Tagger()
>>> encoding = m.dictionary_info().charset
>>> input_bytes = u"今おはよう。".encode(encoding)
>>> result_bytes = m.parse(input_bytes)
>>> print result_bytes.decode(encoding)
今      名詞,副詞可能,*,*,*,*,今,イマ,イマ
おはよう        感動詞,*,*,*,*,*,おはよう,オハヨウ,オハヨー
。      記号,句点,*,*,*,*,。,。,。
EOS
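
The same round trip can be sketched as a small Python 3 helper. This is only a sketch under assumptions: `parse_in_dictionary_charset` and `fake_parse` are hypothetical names, and `fake_parse` is a stand-in so the example runs without MeCab installed. It assumes a parse function that takes and returns byte strings, as the Python 2 session above does.

```python
def parse_in_dictionary_charset(parse_bytes, charset, text):
    """Encode text to the dictionary charset, parse it, and decode the result."""
    input_bytes = text.encode(charset)
    result_bytes = parse_bytes(input_bytes)
    return result_bytes.decode(charset)


def fake_parse(input_bytes):
    # Hypothetical stand-in for a tagger: echoes the input bytes and
    # appends an EOS line, without ever leaving the byte domain.
    return input_bytes + b"\nEOS\n"


# "euc-jp" stands in for the charset a real dictionary might report.
result = parse_in_dictionary_charset(fake_parse, "euc-jp", "今おはよう。")
print(result)
```

With a real tagger, you would pass its parse method and the charset reported by `dictionary_info().charset` instead. Whether a given binding accepts byte strings is something to check first.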

Original comment by richard....@gmail.com on 18 Mar 2013 at 1:24