mocobeta / janome

Japanese morphological analysis engine written in pure Python
https://mocobeta.github.io/janome/en/
Apache License 2.0
855 stars, 51 forks

How to register words containing "," (comma) in the dictionary #96

Open nekomimimaiden257 opened 3 years ago

nekomimimaiden257 commented 3 years ago

en:

Is there a way to register words that contain "," (comma) in the user dictionary? For example, I want to register the following words.

With MeCab, you can register a word containing a comma by enclosing it in double quotation marks. With Janome, the same entry fails with a "ValueError: too many values to unpack" error, as follows.

Traceback (most recent call last):
  File "janome_make_usrdic.py", line 86, in <module>
    user_dict = UserDictionary("janome_sample_dic.csv", "cp932", "ipadic", sysdic.connections)
  File "C:\Python27\lib\site-packages\janome\dic.py", line 393, in __init__
    compiledFST, entries = build_method(user_dict, enc)
  File "C:\Python27\lib\site-packages\janome\dic.py", line 405, in buildipadic
    line.split(',')
ValueError: too many values to unpack

Since each line is split with line.split(','), is it by design that words containing commas cannot be handled?
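To illustrate the difference, here is a minimal sketch comparing a plain `split(',')` with Python's standard `csv` module on a quoted row. The row itself is a hypothetical example (the surface form, context IDs, and cost are made up), assuming the 13-field IPADIC-style layout. It is written for Python 3; on Python 2.7 the `csv` module works on byte strings, so the encoding handling would differ. It does not show how Janome's `UserDictionary` behaves internally, only why a plain split yields too many values.

```python
import csv
import io

# Hypothetical user-dictionary row (13 IPADIC-style fields). The surface form
# and base form contain a comma and are quoted the way MeCab accepts them.
# The context IDs (1285) and cost (5000) are made-up values for illustration.
row = '"1,000円",1285,1285,5000,名詞,一般,*,*,*,*,"1,000円",センエン,センエン'

# A plain split ignores the quotes and cuts inside the quoted fields.
naive_fields = row.split(',')

# csv.reader honors double-quoted fields, so embedded commas are preserved.
csv_fields = next(csv.reader(io.StringIO(row)))

print(len(naive_fields))  # 15 -> unpacking into 13 names raises ValueError
print(len(csv_fields))    # 13
print(csv_fields[0])      # 1,000円
```

Whether the dictionary builder could parse lines this way instead of calling `line.split(',')` is a question for the maintainers.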

Environment: Janome 0.3.10, Python 2.7.18 (32-bit), Windows 8.1 (64-bit)