Open Goof2018 opened 6 years ago
./configure automatically adds /libpostal on the end, so $DATA_DIR/home/User/libpostal/data becomes $DATA_DIR/home/User/libpostal/data/libpostal. This is where the library now looks for libpostal's data files by default.
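As a rough illustration of the path rule just described, here is a minimal C sketch. The helper name effective_data_dir is hypothetical and not part of libpostal; it only mirrors the described behaviour of appending /libpostal to the value passed to --datadir.

```c
#include <stdio.h>

/* Hypothetical helper, not part of libpostal: builds the directory the
 * library is described as searching, i.e. the --datadir value with
 * "/libpostal" appended. Returns 0 on success, -1 if out is too small. */
int effective_data_dir(const char *datadir, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s/libpostal", datadir);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Passing the --datadir value used in this thread would give the path ending in /data/libpostal, which is where the data files would need to be.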
Thank you very much. That solves one problem.
When I run C:\msys64\home\User\libpostal\src\address_parser.exe with the input
Quatre vingt douze Ave des Champs-Élysées
I get the following error. How can I fix this?
WARN invalid UTF-8 at transliterate (transliterate.c:791) errno:No such file or directory
WARN invalid UTF-8 at transliterate (transliterate.c:791) errno:No such file or directory
WARN invalid UTF-8 at transliterate (transliterate.c:791) errno:No such file or directory
WARN invalid UTF-8 at transliterate (transliterate.c:791) errno:No such file or directory
This is a problem with the way the Windows command line works rather than with libpostal:
>address_parser.exe
Loading models...
Welcome to libpostal's address parser.
Type in any address to parse and print the result.
Special commands:
.exit to quit the program
> 10 Downing Street, Westminster, London, SW1A 2AA
Result:
{
"house_number": "10",
"road": "downing street",
"city_district": "westminster",
"city": "london",
"postcode": "sw1a 2aa"
}
> Quatre vingt douze Ave des Champs-Élysées
WARN invalid UTF-8
at transliterate (transliterate.c:791) errno:No such file or directory
WARN invalid UTF-8
at transliterate (transliterate.c:791) errno:No such file or directory
WARN invalid UTF-8
at transliterate (transliterate.c:791) errno:No such file or directory
WARN invalid UTF-8
at transliterate (transliterate.c:791) errno:No such file or directory
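To see why the second input could trigger "invalid UTF-8", assume the console hands the program bytes in a legacy code page instead of UTF-8. In code page 850 the letter É is the single byte 0x90 and in Windows-1252 it is 0xC9, while UTF-8 encodes it as the two bytes 0xC3 0x89. The validator below is a simplified sketch for illustration only, not libpostal's actual check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified illustration, not libpostal's implementation: returns true
 * if the n bytes at s form well-formed UTF-8. */
bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;        /* total bytes in this sequence */
        unsigned long cp;  /* decoded code point */

        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false; /* stray continuation or invalid lead byte */

        if (len > n - i) return false; /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

Under that assumption, the bytes for "Champs-Élysées" typed in a code page 850 or 1252 console fail this check, while the same text encoded as UTF-8 passes.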
I have a C# .NET library for using libpostal on Windows which correctly handles UTF-8 characters (https://github.com/AeroXuk/LibPostalNet). The following C# code:
string exampleAddress = "Quatre vingt douze Ave des Champs-Élysées";
LibPostal libPostal = LibPostal.GetInstance();

// Parse the address into labelled components.
Console.WriteLine("Test Parse:");
var addressParserOptions = libPostal.GetAddressParserDefaultOptions();
using (var response = libPostal.ParseAddress(exampleAddress, addressParserOptions))
{
    foreach (var x in response.Results)
    {
        Console.WriteLine("{0}: {1}", x.Key, x.Value);
    }
}
Console.WriteLine();

// Expand the address into its normalised forms.
Console.WriteLine("Test Expand:");
var normaliseOptions = libPostal.GetAddressExpansionDefaultOptions();
using (var expand = libPostal.ExpandAddress(exampleAddress, normaliseOptions))
{
    foreach (var x in expand.Expansions)
    {
        Console.WriteLine(x);
    }
}
Produces:
Test Parse:
road: quatre vingt douze ave des champs-élysées
Test Expand:
92 avenue des champs-elysees
92 avenue des champs elysees
Using .NET's Console.ReadLine() seems to handle UTF-8 characters better.
Here is a simplified version of address_parser.exe in C#:
using LibPostalNet;
using System;

namespace AddressParser
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            // Optional first argument: directory containing the parser data.
            string address_parser_dir = null;
            if (args.Length > 0)
            {
                address_parser_dir = args[0];
            }

            Console.WriteLine("Loading models...");
            LibPostal libPostal = LibPostal.GetInstance(address_parser_dir);
            libPostal.LoadParser();
            if (!libPostal.IsParserLoaded)
            {
                Console.Write("Failure while loading.");
                Environment.Exit(2);
            }

            Console.WriteLine();
            Console.WriteLine("Welcome to libpostal's address parser.");
            Console.WriteLine();
            Console.WriteLine("Type in any address to parse and print the result.");
            Console.WriteLine();
            Console.WriteLine("Special commands:");
            Console.WriteLine(".exit to quit the program");
            Console.WriteLine();

            while (true)
            {
                Console.Write("> ");
                string input = Console.ReadLine();

                // ReadLine returns null at end of input (e.g. a closed pipe).
                if (input == null)
                {
                    break;
                }

                // TODO: Add .language & .country support
                if (string.Equals(input, ".exit", StringComparison.InvariantCultureIgnoreCase))
                {
                    Console.WriteLine("Fin!");
                    break;
                }
                else if (string.Equals(input, ".print_features", StringComparison.InvariantCultureIgnoreCase))
                {
                    libPostal.PrintFeatures = true;
                    continue; // a command, not an address to parse
                }
                else if (input.Length < 1)
                {
                    continue;
                }

                var options = libPostal.GetAddressParserDefaultOptions();
                using (var parsed = libPostal.ParseAddress(input, options))
                {
                    Console.WriteLine();
                    Console.WriteLine("Result:");
                    Console.WriteLine();
                    Console.WriteLine(parsed.ToJSON());
                    Console.WriteLine();
                }
            }
        }
    }
}
Actually, I think I've found an easier answer for you (https://stackoverflow.com/a/388500/2594742).
Run this command before starting address_parser.exe to switch the command prompt code page to UTF-8:
chcp 65001
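For a program you build yourself, the same switch can in principle be made from code instead of with chcp. The sketch below assumes the Win32 functions SetConsoleCP and SetConsoleOutputCP together with the CP_UTF8 constant (65001). The helper name use_utf8_console is hypothetical, and on non-Windows builds it does nothing. Whether this alone is sufficient for address_parser.exe has not been verified in this thread.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: switch the console input and output code pages to
 * UTF-8 (65001), the programmatic counterpart of running "chcp 65001".
 * Returns 1 on success, 0 on failure; a no-op returning 1 elsewhere. */
int use_utf8_console(void)
{
#ifdef _WIN32
    return SetConsoleCP(CP_UTF8) && SetConsoleOutputCP(CP_UTF8);
#else
    return 1;
#endif
}
```

It would be called once at program start, before any input is read from the console.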
Thank you very much. I'll give it a try.
Hello, I ran into the same issue:
debian@development:~/libpostal/src$ ./address_parser
Loading models...
ERR Error loading transliteration module, dir=(null)
at libpostal_setup_datadir (libpostal.c:266) errno: No such file or directory
See https://github.com/openvenues/php-postal/issues/8#issuecomment-445171062
How can I fix this? It seems that @Goof2018 managed to fix it but didn't explain how (and I'm on Debian 9).
Thanks a lot!
UPDATE: I deleted everything and did a fresh install, and it seems to be fixed (I now run into errno: Cannot allocate memory, but I guess that's unrelated)!
Good thing to know (https://github.com/openvenues/libpostal/issues/365#issuecomment-402501481):
./configure automatically adds /libpostal on the end, so $DATA_DIR/home/User/libpostal/data becomes $DATA_DIR/home/User/libpostal/data/libpostal. This is where the library now looks for libpostal's data files by default.
Hello everyone,
I tried to build libpostal on Windows with msys2.
Installation (Windows)
MSys2/MinGW
For Windows the build procedure currently requires MSys2 and MinGW. This can be downloaded from http://msys2.org. Please follow the instructions on the MSys2 website for installation.
Please ensure Msys2 is up-to-date by running:
pacman -Syu
Install the following prerequisites:
pacman -S autoconf automake curl git make libtool gcc mingw-w64-x86_64-gcc
Then to build the C library:
git clone https://github.com/openvenues/libpostal
cd libpostal
cp -rf windows/* ./
./bootstrap.sh
./configure --datadir=$DATA_DIR/home/User/libpostal/data --disable-data-download
./src/libpostal_data download all $DATA_DIR/home/User/libpostal/data
make -j4
make install
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.14.26428\bin\Hostx64\x64\lib.exe" /def:libpostal.def /out:libpostal.lib /machine:x64
But I get the error:
C:\msys64\home\User\libpostal\src\libpostal.exe "Quatre vingt douze Ave des Champs-Élysées"
ERR Error loading transliteration module, dir=(null)
at libpostal_setup_datadir (libpostal.c:266) errno:No such file or directory
Files:
C:\msys64\home\User\libpostal\data
data_version
last_updated
last_updated_language_classifier
last_updated_parser
C:\msys64\home\User\libpostal\data\address_expansions
address_dictionary.dat
1 file(s), 8,737,530 bytes
C:\msys64\home\User\libpostal\data\address_parser
address_parser_crf.dat
address_parser_phrases.dat
address_parser_postal_codes.dat
address_parser_vocab.trie
4 file(s), 1,872,722,323 bytes
C:\msys64\home\User\libpostal\data\geonames
.gitignore
1 file(s), 71 bytes
C:\msys64\home\User\libpostal\data\language_classifier
language_classifier.dat
1 file(s), 77,823,270 bytes
C:\msys64\home\User\libpostal\data\numex
numex.dat
1 file(s), 396,966 bytes
C:\msys64\home\User\libpostal\data\transliteration
transliteration.dat
1 file(s), 19,570,085 bytes
How could I resolve this?
Thank you
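The listing above shows the data files directly under ...\data, while the configure note elsewhere in this thread says the library looks in the --datadir value with /libpostal appended. One way to narrow down the "Error loading transliteration module" message is to check which of the expected files can actually be opened under a given directory. The sketch below is only an illustration: count_missing_data_files is a hypothetical helper, and the file names are taken from the listing above, not from libpostal's source.

```c
#include <stdio.h>

/* File names taken from the directory listing in this thread. */
static const char *const DATA_FILES[] = {
    "transliteration/transliteration.dat",
    "numex/numex.dat",
    "address_expansions/address_dictionary.dat",
    "language_classifier/language_classifier.dat",
    "address_parser/address_parser_crf.dat",
    "address_parser/address_parser_phrases.dat",
    "address_parser/address_parser_postal_codes.dat",
    "address_parser/address_parser_vocab.trie",
};

/* Hypothetical helper: prints every listed file that cannot be opened
 * under base_dir and returns how many were missing. */
int count_missing_data_files(const char *base_dir)
{
    int missing = 0;
    size_t count = sizeof DATA_FILES / sizeof DATA_FILES[0];

    for (size_t i = 0; i < count; i++) {
        char path[1024];
        int n = snprintf(path, sizeof path, "%s/%s", base_dir, DATA_FILES[i]);
        /* Treat an over-long path the same as a missing file. */
        FILE *f = (n > 0 && (size_t)n < sizeof path) ? fopen(path, "rb") : NULL;

        if (f) {
            fclose(f);
        } else {
            printf("missing: %s\n", path);
            missing++;
        }
    }
    return missing;
}
```

Calling it once with the directory ending in /data and once with the one ending in /data/libpostal would show which layout the files on disk actually follow.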