idumiY / lucene-gosen

Automatically exported from code.google.com/p/lucene-gosen

Dictionary builder for UniDic #33

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
It seems that lucene-gosen doesn't support the UniDic dictionary. Although UniDic 
cannot be bundled with lucene-gosen due to licensing issues, I would like to build 
a dictionary from it and use it as the main dictionary. Do you have any plans to provide 
a dictionary builder or tool for UniDic? 

Original issue reported on code.google.com by kazuaki....@gmail.com on 8 Jun 2012 at 3:12

GoogleCodeExporter commented 8 years ago

Original comment by johtani on 11 Jun 2012 at 8:30

GoogleCodeExporter commented 8 years ago
Created a first patch.

1. Run the following commands to apply the patch:
   cd lucene-gosen-trunk-readonly
   patch -p0 < lucene-gosen-unidic.patch

2. Create the dictionary directory:
   mkdir dictionary/unidic

3. Download the UniDic source tar.gz file and copy it to "dictionary/unidic/".

4. Build lucene-gosen:
   ant -Ddictype=unidic

Original comment by johtani on 18 Jun 2012 at 9:14

Attachments:

GoogleCodeExporter commented 8 years ago

Original comment by johtani on 18 Jun 2012 at 9:14

GoogleCodeExporter commented 8 years ago
This patch has two limitations.

1. COMPOUND dictionary entries are not supported.
   There is only one COMPOUND entry, and it appears only in the Noun.common.dic file.
   lucene-gosen's dictionary.csv does not support compound tokens,
   so this patch skips that entry.

2. Five entries have an empty pronunciation for one of their readings.
   For example, "ミンサー" has two reading variants but only one pronunciation.
   Currently, lucene-gosen does not support readings.size() > pronunciation.size(),
   so for such entries this patch returns the following lists:
    readings[ミンサー, ミンサア]
    pronunciations[ミンサー, ]
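The padding described in limitation 2 can be illustrated with a small sketch. This is not the code in lucene-gosen-unidic.patch; the class PronunciationPadding and its method pad are hypothetical names, and the sketch assumes readings and pronunciations are plain lists of strings. It fills the pronunciation list with empty strings until it is as long as the reading list, which yields the lists shown above for "ミンサー".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PronunciationPadding {

    // Hypothetical helper (not from the patch): returns a copy of `pronunciations`
    // padded with empty strings so it has at least as many elements as `readings`.
    static List<String> pad(List<String> readings, List<String> pronunciations) {
        List<String> padded = new ArrayList<>(pronunciations);
        while (padded.size() < readings.size()) {
            padded.add(""); // empty pronunciation for a reading that has none
        }
        return padded;
    }

    public static void main(String[] args) {
        List<String> readings = Arrays.asList("ミンサー", "ミンサア");
        List<String> pronunciations = Arrays.asList("ミンサー");
        // Prints the lists in the same form as the comment above.
        System.out.println("readings" + readings);
        System.out.println("pronunciations" + pad(readings, pronunciations));
    }
}
```

Padding with empty strings keeps the two lists index-aligned, so a reading at position i can always be paired with the pronunciation at position i, even when that pronunciation is empty.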

Original comment by johtani on 18 Jun 2012 at 9:34

GoogleCodeExporter commented 8 years ago
Thank you Ohtani-san,

I succeeded in applying the patch. Now I can start evaluating UniDic and comparing 
it with the IPA dictionary.

Original comment by kazuaki....@gmail.com on 19 Jun 2012 at 1:38