Closed redguy666 closed 11 years ago
Hi,
On 2013-03-15 11:45, Maciej Lizewski wrote:
Hi, I tried to build the dictionary following the provided information (downloaded all the external dictionary files, etc.), but the result is always different from the provided Polish dictionary and does not work with Solr.
What are the differences? Can you send a diff?
Also, when I tried to use the fsa scripts with the default pl.dict from the standard jar, I got errors:
>fsa_guess -d pl.dict
Invalid dictionary version in file: pl.dict Version number is -58 which indicates dictionary was build: with yet unknown compile options (upgrade your software)
The problem is that you need to set flags = fsa5 in the Makefile. I will change the targets in the Makefile to separate the cfsa2 and fsa5 formats more clearly.
Also, you need to use -I with fsa_guess.
Best, Marcin
CFSA2 should be fine if you plan to use it in Solr, since Solr uses the Java version of Morfologik (which supports CFSA2). Your problem is somewhere else. Provide exact reproduction steps -- how you compile the dictionary, what the input files are, etc.
I changed the Makefile as well because it was slightly wrong. Please use the new one. The target pl.dict is fine for Java; polish.dict is for fsa_morph.
By the way, fsa_guess is NOT suitable for morphological dictionaries. Only fsa_morph is.
OK, one thing is now clear: it seems I had old script sources (I downloaded them from a location other than this git repository). This one uses a Java application to create the dictionary :)
Anyway, I downloaded the current sources from git, plus odm.txt, polish.all and pl_PL.aff, and converted them to UTF-8 (as described in readme_pl.txt), but when trying to build with "make pl.dict" I get an error about a missing "eksport.tab" file needed to build polimorfologik.txt. Looking further at the Makefile, more files are missing: join_tags.awk and version_script.awk (the first one is also needed to build polimorfologik.txt). Where can I find those 3 files?
odm.txt, polish.all, pl_PL.aff are not used at all right now; the only source file is eksport.tab, but you only need polimorfologik.txt. Basically, polimorfologik.txt is just a sorted version of eksport.tab plus a small addition from the brev*.txt file, and it is huge. So it's easier to host it at SourceForge: simply download morfologik.zip and use polimorfologik.txt for further work. I have just added the missing scripts.
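As a rough illustration of the "sorted file plus a small addition" description above, the following Python sketch merges one main file with some extra files and sorts the combined lines. It is not the project's build script: the function name `merge_and_sort`, the UTF-8 encoding, the blank-line skipping, and the plain code-point sort order are all assumptions made for this example, and the real Makefile relies on awk scripts (such as join_tags.awk) whose behavior is not reproduced here.

```python
def merge_and_sort(main_path, extra_paths, out_path, encoding="utf-8"):
    """Concatenate the lines of all input files, sort them, and write the result.

    Hypothetical helper for illustration only; returns the number of lines written.
    """
    lines = []
    for path in [main_path, *extra_paths]:
        with open(path, encoding=encoding) as handle:
            # Strip line endings and skip blank lines so the inputs merge cleanly.
            lines.extend(line.rstrip("\r\n") for line in handle if line.strip())

    # Plain code-point ordering (an assumption); the real build may sort differently.
    lines.sort()

    with open(out_path, "w", encoding=encoding) as handle:
        handle.writelines(line + "\n" for line in lines)
    return len(lines)
```

Under these assumptions, a call such as `merge_and_sort("eksport.tab", glob.glob("brev*.txt"), "polimorfologik.txt")` (with `import glob`) would produce one sorted output file. Downloading the prebuilt polimorfologik.txt, as suggested above, avoids this step entirely.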
Thanks for your help! Everything seems to work now :)
Just curious -- what was the reason you needed a custom build of the dictionary?
The make script does not work with the newest morfologik tools jar.
Hi,
SourceForge has stopped hosting the morfologik files. Where can polimorfologik.txt be downloaded from now?
Please do not attach comments unrelated to the issue. I've created a new one for you, here: https://github.com/morfologik/morfologik-scripts/issues/3