openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Error loading transliteration module, dir=(null) at libpostal_setup_datadir (libpostal.c:266) errno:No such file or directory #365

Open Goof2018 opened 6 years ago

Goof2018 commented 6 years ago

Hello everyone,

I tried to build libpostal on Windows with MSys2.

Installation (Windows)

MSys2/MinGW

For Windows the build procedure currently requires MSys2 and MinGW. This can be downloaded from http://msys2.org. Please follow the instructions on the MSys2 website for installation.

Please ensure MSys2 is up-to-date by running:

pacman -Syu

Install the following prerequisites:

pacman -S autoconf automake curl git make libtool gcc mingw-w64-x86_64-gcc

Then to build the C library:

git clone https://github.com/openvenues/libpostal
cd libpostal
cp -rf windows/* ./
./bootstrap.sh
./configure --datadir=$DATA_DIR/home/User/libpostal/data --disable-data-download
./src/libpostal_data download all $DATA_DIR/home/User/libpostal/data
make -j4
make install

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.14.26428\bin\Hostx64\x64\lib.exe" /def:libpostal.def /out:libpostal.lib /machine:x64

But I get this error:

C:\msys64\home\User\libpostal\src\libpostal.exe "Quatre vingt douze Ave des Champs-Élysées"
ERR Error loading transliteration module, dir=(null) at libpostal_setup_datadir (libpostal.c:266) errno:No such file or directory

Files in C:\msys64\home\User\libpostal\data:

data_version
last_updated
last_updated_language_classifier
last_updated_parser

C:\msys64\home\User\libpostal\data\address_expansions

address_dictionary.dat
1 file(s), 8,737,530 bytes

C:\msys64\home\User\libpostal\data\address_parser

address_parser_crf.dat
address_parser_phrases.dat
address_parser_postal_codes.dat
address_parser_vocab.trie
4 file(s), 1,872,722,323 bytes

C:\msys64\home\User\libpostal\data\geonames

.gitignore
1 file(s), 71 bytes

C:\msys64\home\User\libpostal\data\language_classifier

language_classifier.dat
1 file(s), 77,823,270 bytes

C:\msys64\home\User\libpostal\data\numex

numex.dat
1 file(s), 396,966 bytes

C:\msys64\home\User\libpostal\data\transliteration

transliteration.dat
1 file(s), 19,570,085 bytes

How could I resolve this?

Thank you

AeroXuk commented 6 years ago

./configure automatically appends /libpostal to the --datadir path, so $DATA_DIR/home/User/libpostal/data becomes $DATA_DIR/home/User/libpostal/data/libpostal. That is where the library looks for libpostal's data files by default.
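To make that concrete, here is a small C sketch that builds the path where, going by the comment above, the transliteration data is expected for a given --datadir value, and checks whether a regular file exists there. The helpers transliteration_data_path and data_file_exists are hypothetical (they are not part of libpostal), and the transliteration/transliteration.dat layout is assumed from the directory listing earlier in this thread rather than taken from libpostal's source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper (not a libpostal API): build the expected location of
 * the transliteration data for a given ./configure --datadir value.
 * configure appends "/libpostal"; the "transliteration/transliteration.dat"
 * part mirrors the data listing in this thread.
 * Returns 0 on success, -1 if `out` is too small to hold the path. */
int transliteration_data_path(const char *configure_datadir, char *out, size_t out_size) {
    assert(configure_datadir != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s/libpostal/transliteration/transliteration.dat",
                     configure_datadir);
    /* snprintf reports the length it needed; treat truncation as an error. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Returns 1 if `path` names an existing regular file, 0 otherwise. */
int data_file_exists(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

For --datadir=$DATA_DIR/home/User/libpostal/data this points at .../data/libpostal/transliteration/transliteration.dat, while the listing above has the file at ...\data\transliteration\transliteration.dat, one directory level too high. Moving the downloaded directories into ...\data\libpostal, or downloading into that directory, should make the two agree.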

Goof2018 commented 6 years ago

Thank you very much. That solves one problem.

When I run C:\msys64\home\User\libpostal\src\address_parser.exe and enter

Quatre vingt douze Ave des Champs-Élysées

I get the following warnings. How could I fix this?

WARN invalid UTF-8 at transliterate (transliterate.c:791) errno:No such file or directory
WARN invalid UTF-8 at transliterate (transliterate.c:791) errno:No such file or directory
WARN invalid UTF-8 at transliterate (transliterate.c:791) errno:No such file or directory
WARN invalid UTF-8 at transliterate (transliterate.c:791) errno:No such file or directory

AeroXuk commented 6 years ago

This is a problem with the way the Windows command line works, rather than with libpostal:

>address_parser.exe
Loading models...

Welcome to libpostal's address parser.

Type in any address to parse and print the result.

Special commands:
.exit to quit the program

> 10 Downing Street, Westminster, London, SW1A 2AA

Result:

{
  "house_number": "10",
  "road": "downing street",
  "city_district": "westminster",
  "city": "london",
  "postcode": "sw1a 2aa"
}

> Quatre vingt douze Ave des Champs-Élysées
WARN  invalid UTF-8
 at transliterate (transliterate.c:791) errno:No such file or directory
WARN  invalid UTF-8
 at transliterate (transliterate.c:791) errno:No such file or directory
WARN  invalid UTF-8
 at transliterate (transliterate.c:791) errno:No such file or directory
WARN  invalid UTF-8
 at transliterate (transliterate.c:791) errno:No such file or directory

I have a C# .NET library for using libpostal on Windows that correctly handles UTF-8 characters (https://github.com/AeroXuk/LibPostalNet). The following C# code:

string exampleAddress = "Quatre vingt douze Ave des Champs-Élysées";

LibPostal libPostal = LibPostal.GetInstance();

Console.WriteLine("Test Parse:");
var addressParserOptions = libPostal.GetAddressParserDefaultOptions();
using (var response = libPostal.ParseAddress(exampleAddress, addressParserOptions))
{
    foreach (var x in response.Results)
    {
        Console.WriteLine("{0}: {1}", x.Key, x.Value);
    }
}
Console.WriteLine();

Console.WriteLine("Test Expand:");
var normaliseOptions = libPostal.GetAddressExpansionDefaultOptions();
using (var expand = libPostal.ExpandAddress(exampleAddress, normaliseOptions))
{
    foreach (var x in expand.Expansions)
    {
        Console.WriteLine(x);
    }
}

Produces:

Test Parse:
road: quatre vingt douze ave des champs-élysées

Test Expand:
92 avenue des champs-elysees
92 avenue des champs elysees

AeroXuk commented 6 years ago

Using the .NET Console.ReadLine() seems to handle UTF-8 characters better. Here is a simplified version of address_parser.exe in C#:

using LibPostalNet;
using System;

namespace AddressParser
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            // Optional first argument: a data directory, passed to LibPostal.GetInstance().
            string address_parser_dir = null;

            if (args.Length > 0)
            {
                address_parser_dir = args[0];
            }

            Console.WriteLine("Loading models...");

            LibPostal libPostal = LibPostal.GetInstance(address_parser_dir);
            libPostal.LoadParser();

            if (!libPostal.IsParserLoaded)
            {
                Console.WriteLine("Failure while loading.");
                Environment.Exit(2);
            }

            Console.WriteLine();
            Console.WriteLine("Welcome to libpostal's address parser.");
            Console.WriteLine();
            Console.WriteLine("Type in any address to parse and print the result.");
            Console.WriteLine();
            Console.WriteLine("Special commands:");
            Console.WriteLine(".exit to quit the program");
            Console.WriteLine();

            while (true)
            {
                Console.Write("> ");
                string input = Console.ReadLine();

                // Console.ReadLine() returns null at end of input (Ctrl+Z then Enter, or end of a pipe).
                if (input == null)
                {
                    break;
                }

                // TODO: Add .language & .country support
                if (string.Equals(input, ".exit", StringComparison.InvariantCultureIgnoreCase))
                {
                    Console.WriteLine("Fin!");
                    break;
                }
                else if (string.Equals(input, ".print_features", StringComparison.InvariantCultureIgnoreCase))
                {
                    libPostal.PrintFeatures = true;
                    // This is a command, not an address, so don't send it to the parser.
                    continue;
                }
                else if (input.Length < 1)
                {
                    continue;
                }

                // Parse the line with the default options and print the result as JSON.
                var options = libPostal.GetAddressParserDefaultOptions();
                using (var parsed = libPostal.ParseAddress(input, options))
                {
                    Console.WriteLine();
                    Console.WriteLine("Result:");
                    Console.WriteLine();

                    Console.WriteLine(parsed.ToJSON());
                    Console.WriteLine();
                }
            }
        }
    }
}

AeroXuk commented 6 years ago

Actually, I think I've found an easier answer for you (https://stackoverflow.com/a/388500/2594742).

Run this command before starting address_parser.exe to switch the command prompt code page to UTF-8:

chcp 65001
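
Why the code page matters: libpostal expects UTF-8 input, and in a legacy console code page the accented letters arrive as single bytes that are not valid UTF-8. The C sketch below is a minimal UTF-8 well-formedness check (an illustration, not libpostal's own code) that shows this for "Champs-Élysées", assuming the console was using code page 850 (É = 0x90, é = 0x82) or Windows-1252 (É = 0xC9, é = 0xE9). With code page 65001 (UTF-8) the console should instead pass the two-byte sequences C3 89 and C3 A9.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal UTF-8 well-formedness check (a sketch, not libpostal's code).
 * Returns 1 if the `len` bytes at `s` are valid UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t len) {
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* number of continuation bytes expected */
        unsigned long cp;   /* decoded code point */

        if (c < 0x80) {                      /* ASCII: one byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) { /* 2-byte lead */
            extra = 1;
            cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) { /* 3-byte lead */
            extra = 2;
            cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) { /* 4-byte lead */
            extra = 3;
            cp = c & 0x07;
        } else {
            /* Stray continuation byte (0x80-0xBF), overlong lead (0xC0/0xC1)
             * or a lead byte beyond U+10FFFF (0xF5-0xFF). */
            return 0;
        }

        if (len - i <= extra) {
            return 0;                        /* sequence cut off at the end */
        }
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) {
                return 0;                    /* not a continuation byte */
            }
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong 3/4-byte forms, UTF-16 surrogates and values above U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            return 0;
        }
        i += extra + 1;
    }
    return 1;
}
```

Given the UTF-8 bytes of "Champs-Élysées" this returns 1; given the code page 850 bytes (0x90 is a stray continuation byte) or the Windows-1252 bytes (0xC9 announces a two-byte sequence but is followed by "l"), it returns 0, which matches the "invalid UTF-8" warnings above.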

Goof2018 commented 6 years ago

Thank you very much. I'll give it a try.

jbelien commented 5 years ago

Hello, I ran into the same issue:

debian@development:~/libpostal/src$ ./address_parser
Loading models...
ERR   Error loading transliteration module, dir=(null)
   at libpostal_setup_datadir (libpostal.c:266) errno: No such file or directory

See https://github.com/openvenues/php-postal/issues/8#issuecomment-445171062

How can I fix this? It seems that @Goof2018 managed to fix it but didn't explain how (and I'm on Debian 9).

Thanks a lot !

jbelien commented 5 years ago

UPDATE: I deleted everything and did a fresh install, and it seems to be fixed (I now run into errno: Cannot allocate memory, but I guess that's not related)!

Good thing to know (https://github.com/openvenues/libpostal/issues/365#issuecomment-402501481):

./configure automatically appends /libpostal to the --datadir path, so $DATA_DIR/home/User/libpostal/data becomes $DATA_DIR/home/User/libpostal/data/libpostal. That is where the library looks for libpostal's data files by default.