tshatrov / ichiran

Linguistic tools for texts in Japanese language
MIT License

Clarification of settings.lisp #11

Closed by paulm17 3 years ago

paulm17 commented 3 years ago

I'm a bit confused about these entries in settings.lisp:

(defparameter *jmdict-path* #p"/home/you/dump/JMdict_e")

I found JMdict_e here: http://ftp.monash.edu/pub/nihongo/JMdict_e.gz

Is this the file?

Also

(defparameter *jmdict-data* #p"/home/you/dump/jmdict-data/")

Where is jmdict-data? Is it supposed to be jmdictdb/data?

Thanks

tshatrov commented 3 years ago

JMdict_e is the correct file, yes. It is only needed when generating the ichiran database; normally you can just download the latest dump from the releases page, in which case this parameter isn't used. Because JMdict is updated daily, I can't guarantee that the latest version of JMdict works with the current ichiran code, so using the database dumps is preferable.

*jmdict-data* is located here: https://gitlab.com/yamagoya/jmdictdb/-/tree/master/jmdictdb/data (the files currently used are kwpos.csv, conj.csv and conjo.csv). Since they are under the GPL license, I can't include these files in my repository. I'll update the link in the README to point to the correct directory. It used to be at /pg/data, but they seem to have moved it at some point and the link became outdated, so thanks for letting me know.
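For anyone setting this up, here is a minimal sketch of a check that the three CSV files mentioned above are actually present in the directory that *jmdict-data* points to. It is meant to be pasted into a fresh REPL, not into ichiran itself. The path is the placeholder from settings.lisp, and the names *required-csv-files* and missing-data-files are hypothetical helpers made up for this example; they are not part of ichiran.

```lisp
;; Sketch only: run this in a fresh REPL (e.g. in CL-USER), not inside ichiran.
;; The directory is the placeholder from settings.lisp; substitute the place
;; where you saved the jmdictdb CSV files. The trailing slash matters:
;; without it the last component is parsed as a file name, not a directory.
(defparameter *jmdict-data* #p"/home/you/dump/jmdict-data/")

;; File names as listed in the reply above (hypothetical variable name).
(defparameter *required-csv-files* '("kwpos.csv" "conj.csv" "conjo.csv"))

;; Hypothetical helper: not an ichiran function.
(defun missing-data-files (&optional (dir *jmdict-data*))
  "Return the names in *REQUIRED-CSV-FILES* that do not exist in DIR."
  (remove-if (lambda (name) (probe-file (merge-pathnames name dir)))
             *required-csv-files*))

;; Report which files, if any, still need to be downloaded.
(let ((missing (missing-data-files)))
  (if missing
      (format t "Missing from ~a: ~{~a~^, ~}~%" *jmdict-data* missing)
      (format t "All required CSV files found in ~a~%" *jmdict-data*)))
```

This uses only standard Common Lisp functions (probe-file, merge-pathnames, format), so it should behave the same in any conforming implementation.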

paulm17 commented 3 years ago

Many thanks for the clarification.

paulm17 commented 3 years ago

I'm sorry for bugging you again.

I've managed to get it kind of working, so please ignore the email I sent you. However, I now get this error:

0[2] (ichiran:romanize "一覧は最高だぞ" :with-info t)

debugger invoked on a TYPE-ERROR in thread
#<THREAD "main thread" RUNNING {100194E643}>:
  The value
    NIL
  is not of type
    HASH-TABLE
  when binding HASH-TABLE

I'm also confused about JMdict. Do I just need to clone it, or do I also need to create its database as per:

make -f Makefile-db jmnew

Repeat the following, where "loadXX" is one of loadjm (JMdict), loadne (JMnedict), loadkd (kanjidic2), loadex (Tatoeba examples), for as many of those sources as you want to load.

I now have two databases, the jmdict one and yours. Does ichiran use both, or only its own?

I'm not a Lisp user, so sorry for the newbie questions.

Thanks

tshatrov commented 3 years ago

ichiran only uses its own postgres database. JMdict_e is an XML file that's used to load the data. It is not necessary to install any of the jmdict software from their repository.