BenK10 closed this issue 7 years ago.
Is this the very latest checkout? I added a commit recently to help with upgrades.
Sounds like the model files didn't download. Ensure that the datadir you specified during `configure` has at least 1.8GB of free disk space. When you ran `make`, it should have downloaded the new model files. If that didn't happen, try removing the previous datadir and running `make` again.
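A minimal sketch of the disk-space check suggested above, assuming a shell where POSIX `df -P` is available (Linux or macOS). `check_free_space` is a hypothetical helper, not part of libpostal, and the default threshold simply converts the 1.8GB figure into 1024-byte blocks.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of libpostal: succeeds only if the filesystem
# that holds DIR has at least MIN_KB kilobytes available.
check_free_space() {
  local dir="$1"
  # Default threshold: roughly 1.8GB expressed in 1024-byte blocks.
  local min_kb="${2:-$((18 * 1024 * 1024 / 10))}"
  local avail_kb
  # `df -Pk` prints a header, then one unwrapped line for the filesystem;
  # field 4 is the available space in 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  if [ -z "$avail_kb" ]; then
    echo "could not determine free space for $dir" >&2
    return 2
  fi
  if [ "$avail_kb" -lt "$min_kb" ]; then
    echo "only ${avail_kb}KB free under $dir, need ${min_kb}KB" >&2
    return 1
  fi
  echo "${avail_kb}KB free under $dir"
}

# Example: run this before `make` so a full disk fails loudly.
# check_free_space /opt/libpostal_data || exit 1
```

If the check fails, either free up space or point `--datadir` at a larger volume before re-running `make`.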
To add a little more detail, I use a Dockerfile to build the latest libpostal inside a Docker container. It clones the libpostal repository from GitHub (whatever is on the default branch at build time) and then runs a script that invokes `make`:
```dockerfile
FROM ubuntu:16.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y \
    curl autoconf automake libtool pkg-config \
    git
WORKDIR /
RUN git clone https://github.com/openvenues/libpostal
WORKDIR /libpostal
COPY ./build_libpostal.sh .
RUN ./build_libpostal.sh
```
build_libpostal.sh:

```bash
#!/usr/bin/env bash
./bootstrap.sh
mkdir -p /opt/libpostal_data
./configure --datadir=/opt/libpostal_data
make
make install
ldconfig
```
As of April 10, 2017, I'm still having the same issue, even with the latest release.
OK, so I just created a Docker container on my Mac, ran those same commands, and couldn't replicate the issue. Does the container have enough memory, and is there sufficient disk space on the host machine? The new version should actually require slightly less memory and disk than the previous one, so if it's the same container, that shouldn't be an issue.
If everything's working correctly, toward the end of `make` you should see something like:
```
./libpostal_data download all /opt/libpostal_data/libpostal
Old version of datadir detected, removing...
Checking for new libpostal data file...
New libpostal data file available
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 9906k 100 9906k 0 0 5005k 0 0:00:01 0:00:01 --:--:-- 5003k
address_expansions/
address_expansions/address_dictionary.dat
numex/
numex/numex.dat
transliteration/
transliteration/transliteration.dat
Checking for new libpostal parser data file...
New libpostal parser data file available
Downloading multipart: http://libpostal.s3.amazonaws.com/models/address_parser/2017-03-04/parser.tar.gz, size=752483239, num_chunks=11
Downloading part 2: filename=/opt/libpostal_data/libpostal/parser.tar.gz.2, offset=67108864, max=134217727
Downloading part 1: filename=/opt/libpostal_data/libpostal/parser.tar.gz.1, offset=0, max=67108863
Downloading part 3: filename=/opt/libpostal_data/libpostal/parser.tar.gz.3, offset=134217728, max=201326591
Downloading part 4: filename=/opt/libpostal_data/libpostal/parser.tar.gz.4, offset=201326592, max=268435455
Downloading part 5: filename=/opt/libpostal_data/libpostal/parser.tar.gz.5, offset=268435456, max=335544319
Downloading part 6: filename=/opt/libpostal_data/libpostal/parser.tar.gz.6, offset=335544320, max=402653183
Downloading part 7: filename=/opt/libpostal_data/libpostal/parser.tar.gz.7, offset=402653184, max=469762047
Downloading part 8: filename=/opt/libpostal_data/libpostal/parser.tar.gz.8, offset=469762048, max=536870911
Downloading part 9: filename=/opt/libpostal_data/libpostal/parser.tar.gz.9, offset=536870912, max=603979775
Downloading part 10: filename=/opt/libpostal_data/libpostal/parser.tar.gz.10, offset=603979776, max=671088639
Downloading part 11: filename=/opt/libpostal_data/libpostal/parser.tar.gz.11, offset=671088640, max=752483239
address_parser/
address_parser/address_parser_crf.dat
address_parser/address_parser_phrases.dat
address_parser/address_parser_postal_codes.dat
address_parser/address_parser_vocab.trie
Checking for new libpostal language classifier data file...
New libpostal language classifier data file available
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 48.0M 100 48.0M 0 0 7289k 0 0:00:06 0:00:06 --:--:-- 8041k
language_classifier/
language_classifier/language_classifier.dat
```
If not, the new models weren't downloaded and it's probably related to disk space.
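The file list in that output can double as a quick post-build check. A minimal sketch, assuming the datadir layout printed in the log above (`/opt/libpostal_data/libpostal` with `address_expansions/`, `numex/`, `transliteration/`, `address_parser/` and `language_classifier/`); `check_models` is a hypothetical helper, not a libpostal command, and it only covers the files that appear in this particular log.

```bash
#!/usr/bin/env bash
# Hypothetical post-build check, not part of libpostal: report any model file
# from the download log that is missing or empty under DATADIR.
check_models() {
  local datadir="${1:-/opt/libpostal_data/libpostal}"
  # Paths copied from the `make` output; adjust if your version differs.
  local files=(
    address_expansions/address_dictionary.dat
    numex/numex.dat
    transliteration/transliteration.dat
    address_parser/address_parser_crf.dat
    address_parser/address_parser_phrases.dat
    address_parser/address_parser_postal_codes.dat
    address_parser/address_parser_vocab.trie
    language_classifier/language_classifier.dat
  )
  local f missing=0
  for f in "${files[@]}"; do
    # -s: the file exists and has a size greater than zero.
    if [ ! -s "$datadir/$f" ]; then
      echo "missing or empty: $datadir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Running `check_models` as the last step of the build script would make a failed download stop the image build, rather than surfacing later as the parser's "could not find parser model file" error.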
The models indeed don't download: `make` reports that they are up to date. This is odd because it's a new Docker container built from a new image, so the models shouldn't even exist yet.
There should be plenty of disk space and memory. It all worked before.
So if the datadir already existed, it's plausible that the source directory did too, in which case I think `git clone` wouldn't have fetched a fresh copy. If that's true, you wouldn't have the latest commit, which fixes upgrades of existing datadirs from v0. In your Dockerfile you may want to `rm -rf /libpostal` to be sure it's a fresh checkout, or run `git checkout tags/v1.0.0` after cloning the repo.
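A minimal sketch combining both suggestions, assuming `git` is on the PATH: remove any old source directory, clone again, and pin the checkout to the release tag. `fresh_checkout` is a hypothetical helper; its default URL and tag are the ones mentioned in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make sure DEST is a brand-new clone pinned to TAG, so a
# leftover source directory cannot hide a newer commit.
fresh_checkout() {
  local dest="$1"
  local url="${2:-https://github.com/openvenues/libpostal}"
  local tag="${3:-v1.0.0}"
  if [ -z "$dest" ]; then
    echo "usage: fresh_checkout DEST [URL] [TAG]" >&2
    return 2
  fi
  # Remove any previous checkout; git clone refuses to write into an
  # existing non-empty directory.
  rm -rf -- "$dest"
  git clone --quiet "$url" "$dest" || return 1
  # Detached HEAD at the release tag.
  git -C "$dest" checkout --quiet "tags/$tag" || return 1
}
```

For example, `fresh_checkout /libpostal` followed by `cd /libpostal` and the build script.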
Still any issues or can this be closed?
I'm still working on it.
On closer examination, I found that the data files actually do download, but then `libpostal_data download all` runs a second time and downloads nothing. If the address parser doesn't work even though the models download, maybe they're somehow getting clobbered? Here's some output from `make` with the compilation commands removed:
```
./libpostal_data download all /opt/libpostal_data/libpostal
Old version of datadir detected, removing...
Checking for new libpostal data file...
New libpostal data file available
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 9906k 100 9906k 0 0 2707k 0 0:00:03 0:00:03 --:--:-- 2706k
address_expansions/
address_expansions/address_dictionary.dat
numex/
numex/numex.dat
transliteration/
transliteration/transliteration.dat
Checking for new libpostal parser data file...
New libpostal parser data file available
Downloading multipart: http://libpostal.s3.amazonaws.com/models/address_parser/2017-03-04/parser.tar.gz, size=752483239, num_chunks=11
Downloading part 1: filename=/opt/libpostal_data/libpostal/parser.tar.gz.1, offset=0, max=67108863
Downloading part 2: filename=/opt/libpostal_data/libpostal/parser.tar.gz.2, offset=67108864, max=134217727
Downloading part 3: filename=/opt/libpostal_data/libpostal/parser.tar.gz.3, offset=134217728, max=201326591
Downloading part 4: filename=/opt/libpostal_data/libpostal/parser.tar.gz.4, offset=201326592, max=268435455
Downloading part 5: filename=/opt/libpostal_data/libpostal/parser.tar.gz.5, offset=268435456, max=335544319
Downloading part 6: filename=/opt/libpostal_data/libpostal/parser.tar.gz.6, offset=335544320, max=402653183
Downloading part 7: filename=/opt/libpostal_data/libpostal/parser.tar.gz.7, offset=402653184, max=469762047
Downloading part 8: filename=/opt/libpostal_data/libpostal/parser.tar.gz.8, offset=469762048, max=536870911
Downloading part 9: filename=/opt/libpostal_data/libpostal/parser.tar.gz.9, offset=536870912, max=603979775
Downloading part 10: filename=/opt/libpostal_data/libpostal/parser.tar.gz.10, offset=603979776, max=671088639
Downloading part 11: filename=/opt/libpostal_data/libpostal/parser.tar.gz.11, offset=671088640, max=752483239
address_parser/
address_parser/address_parser_crf.dat
address_parser/address_parser_phrases.dat
address_parser/address_parser_postal_codes.dat
address_parser/address_parser_vocab.trie
Checking for new libpostal language classifier data file...
New libpostal language classifier data file available
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 48.0M 100 48.0M 0 0 7757k 0 0:00:06 0:00:06 --:--:-- 10.4M
language_classifier/
language_classifier/language_classifier.dat
make[2]: Leaving directory '/libpostal/src'
Making all in test
make[2]: Entering directory '/libpostal/test'
....
make[2]: Leaving directory '/libpostal/test'
make[2]: Entering directory '/libpostal'
make[2]: Leaving directory '/libpostal'
make[1]: Leaving directory '/libpostal'
Making install in src
make[1]: Entering directory '/libpostal/src'
./libpostal_data download all /opt/libpostal_data/libpostal
Checking for new libpostal data file...
libpostal data file up to date
Checking for new libpostal parser data file...
libpostal parser data file up to date
Checking for new libpostal language classifier data file...
libpostal language classifier data file up to date
make[2]: Entering directory '/libpostal/src'
...
```
Wait, does that mean you tried deleting the source directory and/or checking out the v1.0.0 tag and still got the same error? If so, I'm puzzled. I just recreated this entire sequence of events in what is, AFAICT, an identical Docker container: I started with v0.3.4 and upgraded to v1.0.0 without issue.
That output is normal: the two blocks come from `make` and `make install`, respectively. Both commands have to run the `libpostal_data download` command (you could just run `make install` if you're already root; they're separated so people using `sudo` avoid permissions issues). The first time, after the download succeeds, it saves some housekeeping files with the datadir version and the server timestamp of the last model downloaded, so subsequent invocations won't download the files unless the timestamp on the server has changed (i.e. there's an update).
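The skip-if-unchanged behaviour described above can be illustrated roughly as follows. This is a sketch of the idea only, not the actual logic in `libpostal_data`: it assumes the server timestamp is already available as a string and is saved to a stamp file, and both `needs_download` and `record_download` are made-up names.

```bash
#!/usr/bin/env bash
# Illustration only: decide whether to (re)download a model by comparing the
# server timestamp with the one saved after the last successful download.
needs_download() {
  local stamp_file="$1" remote_stamp="$2"
  # No record yet (fresh datadir): download.
  [ -f "$stamp_file" ] || return 0
  # Record exists: download only if the server reports a different timestamp.
  [ "$(cat "$stamp_file")" != "$remote_stamp" ]
}

# Called only after a download has fully succeeded, so an interrupted
# download is retried on the next run.
record_download() {
  local stamp_file="$1" remote_stamp="$2"
  printf '%s\n' "$remote_stamp" > "$stamp_file"
}
```

Under this scheme, the second invocation during `make install` finds a matching record and prints "up to date" without downloading anything, which matches the log posted earlier.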
If the checkout has the correct commit, the following file should be present after running `make`: `/opt/libpostal_data/libpostal/data_version`.
I've tried `git checkout tags/v1.0.0`, but I'm still getting the same problem. The `data_version` file exists.
If that's the case, there should no longer be an `address_parser.dat`, only `address_parser_crf.dat`. Is that true?
Yes, `address_parser_crf.dat` is in `/opt/libpostal_data/libpostal/address_parser` and there is no `address_parser.dat`.
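The two markers checked in this exchange (`data_version` present, old `address_parser.dat` gone, `address_parser_crf.dat` present) can be rolled into one sketch. It assumes the old model lived in the same `address_parser/` subdirectory as the new one, and `datadir_layout` is a hypothetical helper rather than anything libpostal ships.

```bash
#!/usr/bin/env bash
# Hypothetical check: classify a libpostal datadir using the markers
# discussed in this thread.
datadir_layout() {
  local datadir="${1:-/opt/libpostal_data/libpostal}"
  if [ -f "$datadir/address_parser/address_parser.dat" ]; then
    echo "old"         # pre-upgrade parser model still present
  elif [ -f "$datadir/data_version" ] &&
       [ -f "$datadir/address_parser/address_parser_crf.dat" ]; then
    echo "new"         # upgraded layout
  else
    echo "incomplete"  # neither layout fully present
  fi
}
```

By the answers above, the reporter's datadir would come out as `new`, which points away from the datadir layout as the cause.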
Ok, then there appears to be nothing wrong with the libpostal setup. Since I'm unable to replicate the error using the same Docker environment, and no one else has reported it, I'd suggest just deleting the container and starting fresh.
Assuming this can be closed now?
I still haven't fixed the problem. Using a fresh container doesn't help. It's interesting that you can't replicate the problem. The one thing I haven't done yet is to run the container on another machine. But I am more concerned now with porting libpostal to Windows.
I'm closing this because I now have a Windows build.
Since I'm running a Windows machine, I've been using libpostal inside a Docker container with a Debian image for the last few weeks. This worked fine until I rebuilt the image with the latest libpostal release on Friday, April 7, 2017. Now when I run the address parser command-line tool, I get the following error:
```
could not find parser model file of known type at address_parser_load (address_parser.c:208) errno: no such file or directory
```
It doesn't say which file is missing.