openvenues / jpostal

Java/JNI bindings to libpostal for fast international street address parsing/normalization
MIT License
105 stars, 42 forks

Parser/expander not loaded #8

Closed gianvi closed 8 years ago

gianvi commented 8 years ago

Hi, are you familiar with this issue? It happens when I try to expand an address string, with or without language options (I'm using Scala).

ERR language_classifier not loaded, run libpostal_setup_language_classifier() at classify_languages (language_classifier.c:63) errno: No such file or directory
ERR parser is not setup, call libpostal_setup_address_parser() at address_parser_parse (address_parser.c:808) errno: No such file or directory
ERR Parser returned NULL at parse_address (libpostal.c:1046) errno: No such file or directory

albarrentine commented 8 years ago

Did you move the libpostal data directory or was there an error when downloading? Try using the libpostal command-line clients (found in the src directory of the libpostal checkout) and see if those are working. If they aren't, try reinstalling the libpostal C library and make sure there's about 2.2G in the data dir.

gianvi commented 8 years ago

Hi, and sorry for the trouble, but this is driving me crazy. I've tried to reinstall everything, with this result:

:compileJava UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:compileTestJava UP-TO-DATE
:processTestResources UP-TO-DATE
:testClasses UP-TO-DATE
:test
ERR Error loading transliteration module, LIBPOSTAL_DATA_DIR=/home/ee39708/Scrivania/new_libpostal/libpostal at libpostal_setup (libpostal.c:1059) errno: No such file or directory
:test FAILED

FAILURE: Build failed with an exception.

Actually, I was expecting to find all the dictionaries and databases in my new datadir (./configure --datadir=/home/ee39708/Scrivania/new_libpostal/libpostal), but I only found the "last_updates" files, which are just a few bytes.

Can you suggest something? Thanks in advance.

albarrentine commented 8 years ago

Ok, can you delete all your datadirs and reinstall libpostal? Or just move the previous datadir to the new location with all of the files.

gianvi commented 8 years ago

Sorry, but I'm not much of an Ubuntu expert... and with sudo apt-get remove libpostal I got:

albarrentine commented 8 years ago

Ah, no, libpostal's not in apt-get so just rm -rf $OLD_DATA_DIR, rm -rf $NEW_DATA_DIR, then run the setup instructions for the C library as usual.

gianvi commented 8 years ago

In the end I moved the whole project to my Mac and re-ran everything. All the data was downloaded, so all tests pass (./gradlew check). Now it's time to add the native libraries, so correct me if I'm wrong:

jnilibs

albarrentine commented 8 years ago

Yes, src/main/jniLibs is the correct location and those are the files that should be in there (for Mac anyway, on *nix it's .so).

gianvi commented 8 years ago

Actually, it was not .so! But now everything works! Thanks.

gianvi commented 8 years ago

I'll wait for you to close this! And if you want and have time, I would really like to talk with you about this (in a chat, by mail, or wherever you prefer)... today's example is...

albarrentine commented 8 years ago

Sankt moritz is correct though, no?

expand_address uses a language classifier, which is a logistic regression model trained on 4-grams of address strings similar to that one (other language classifiers like Chrome's cld are usually trained on Web documents). The classifier predicts a probability distribution over languages. Any language whose probability is >= a threshold (.05 seems to work well) is chosen as a potential language for the address, and those languages' dictionaries are applied in expansion. In this way, the correct expansion is at least likely to be contained in the set of possibilities (libpostal's output for expand_address is supposed to be treated as a set, not a ranked list).

For the given address, the highest probability languages were:

en (0.830387)
de (0.163512)

If the street name were St Moritzstrasse, it would predict German with a probability > 99.98%, but as written, the above is a reasonable prediction. Glancing at the string without context (i.e. without knowing that it's in Switzerland), it looks like it's probably German, but some of those 4-grams may occur in English as well.

Again, if you know the possible languages a priori, it's possible to pass in an array of language codes like: {"de", "fr", "it", "gsw", "rm"}. In this case libpostal will not use the language classifier at all and apply expansions only in the languages specified.
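
To make the selection rule concrete, here is a minimal, self-contained sketch in plain Java. It is not libpostal's or jpostal's actual code: the class name `LanguageSelection`, the helper `candidateLanguages`, and the `THRESHOLD` constant are hypothetical names used only for illustration. It assumes the classifier output is available as a map from language code to probability, and it uses the two probabilities quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LanguageSelection {
    // Threshold quoted in the thread (.05); an illustrative constant, not read from libpostal.
    static final double THRESHOLD = 0.05;

    /**
     * Hypothetical helper: returns the languages whose dictionaries would be applied.
     * If explicit languages are supplied, the classifier probabilities are ignored.
     */
    static List<String> candidateLanguages(Map<String, Double> classifierProbs,
                                           List<String> explicitLanguages) {
        if (explicitLanguages != null && !explicitLanguages.isEmpty()) {
            // Languages known a priori: skip the classifier entirely.
            return new ArrayList<>(explicitLanguages);
        }
        List<String> chosen = new ArrayList<>();
        for (Map.Entry<String, Double> entry : classifierProbs.entrySet()) {
            // Keep every language at or above the threshold (a set, not a ranking).
            if (entry.getValue() >= THRESHOLD) {
                chosen.add(entry.getKey());
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Probabilities reported above for the example address.
        Map<String, Double> probs = new LinkedHashMap<>();
        probs.put("en", 0.830387);
        probs.put("de", 0.163512);

        // Classifier path: both languages clear the threshold.
        System.out.println(candidateLanguages(probs, null));
        // prints [en, de]

        // Explicit path: only the caller's languages are used.
        System.out.println(candidateLanguages(probs, Arrays.asList("de", "fr", "it", "gsw", "rm")));
        // prints [de, fr, it, gsw, rm]
    }
}
```

Since both en and de clear the threshold, dictionaries for both languages get applied, which is why English-style expansions can show up next to the German ones for this address.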