osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

legacy tokenizer is not found during import, neither is legacy_icu / legacy-icu #2327

Closed yoshi314 closed 3 years ago

yoshi314 commented 3 years ago

During the initial import of the OSM data, I get an error:

2021-05-13 13:05:54  Reading input files done in 14200s (3h 56m 40s).         50.1/s)
2021-05-13 13:05:54    Processed 2774954638 nodes in 1211s (20m 11s) - 2291k/s
2021-05-13 13:05:54    Processed 333503908 ways in 11165s (3h 6m 5s) - 30k/s
2021-05-13 13:05:54    Processed 5643545 relations in 1824s (30m 24s) - 3k/s
2021-05-13 13:05:56  osm2pgsql took 14202s (3h 56m 42s) overall.
2021-05-13 13:05:57: Create functions (1st pass)
2021-05-13 13:05:57: Create tables
2021-05-13 13:06:20: Create functions (2nd pass)
2021-05-13 13:06:21: Create table triggers
2021-05-13 13:06:21: Create partition tables
2021-05-13 13:06:23: Create functions (3rd pass)
2021-05-13 13:06:23: Importing wikipedia importance data
2021-05-13 13:08:42: Initialise tables
2021-05-13 13:08:44: Load data into placex table
[... progress dots elided ...]

2021-05-13 14:27:48: Setting up tokenizer
2021-05-13 14:27:49: No tokenizer named 'legacy_icu' available. Check the setting of NOMINATIM_TOKENIZER.
2021-05-13 14:27:49: FATAL: Tokenizer not found

Same thing for legacy tokenizer.

To Reproduce: Build current git on Debian 10. Try to import the Europe dump.

Software Environment (please complete the following information):

Hardware Configuration (please complete the following information):

My .env :

NOMINATIM_DATABASE_DSN="pgsql:dbname=mapa2"
NOMINATIM_DATABASE_WEBUSER="www-data"
NOMINATIM_DATABASE_MODULE_PATH=
NOMINATIM_TOKENIZER="legacy-icu"
NOMINATIM_MAX_WORD_FREQUENCY=50000
NOMINATIM_LIMIT_REINDEXING=yes
NOMINATIM_LANGUAGES=
NOMINATIM_TERM_NORMALIZATION=":: NFD (); [[:Nonspacing Mark:] [:Cf:]] >;  :: lower (); [[:Punctuation:][:Space:]]+ > ' '; :: NFC ();"
NOMINATIM_TOKENIZER_CONFIG=
NOMINATIM_USE_US_TIGER_DATA=no
NOMINATIM_USE_AUX_LOCATION_DATA=no
NOMINATIM_HTTP_PROXY=no
NOMINATIM_HTTP_PROXY_HOST=proxy.mydomain.com
NOMINATIM_HTTP_PROXY_PORT=3128
NOMINATIM_HTTP_PROXY_LOGIN=
NOMINATIM_HTTP_PROXY_PASSWORD=
NOMINATIM_OSM2PGSQL_BINARY=
NOMINATIM_TIGER_DATA_PATH=
NOMINATIM_WIKIPEDIA_DATA_PATH=
NOMINATIM_PHRASE_CONFIG=
NOMINATIM_ADDRESS_LEVEL_CONFIG=
NOMINATIM_IMPORT_STYLE=extratags
NOMINATIM_FLATNODE_FILE="/var/lib/postgresql/nodes.dat"
NOMINATIM_TABLESPACE_SEARCH_DATA=
NOMINATIM_TABLESPACE_SEARCH_INDEX=
NOMINATIM_TABLESPACE_OSM_DATA=
NOMINATIM_TABLESPACE_OSM_INDEX=
NOMINATIM_TABLESPACE_PLACE_DATA=
NOMINATIM_TABLESPACE_PLACE_INDEX=
NOMINATIM_TABLESPACE_ADDRESS_DATA=
NOMINATIM_TABLESPACE_ADDRESS_INDEX=
NOMINATIM_TABLESPACE_AUX_DATA=
NOMINATIM_TABLESPACE_AUX_INDEX=
NOMINATIM_REPLICATION_URL="https://planet.openstreetmap.org/replication/minute"
NOMINATIM_REPLICATION_MAX_DIFF=50
NOMINATIM_REPLICATION_UPDATE_INTERVAL=75
NOMINATIM_REPLICATION_RECHECK_INTERVAL=60
NOMINATIM_CORS_NOACCESSCONTROL=yes
NOMINATIM_MAPICON_URL=
NOMINATIM_DEFAULT_LANGUAGE=
NOMINATIM_SEARCH_BATCH_MODE=no
NOMINATIM_SEARCH_NAME_ONLY_THRESHOLD=500
NOMINATIM_LOOKUP_MAX_COUNT=50
NOMINATIM_POLYGON_OUTPUT_MAX_TYPES=1
NOMINATIM_LOG_DB=no
NOMINATIM_LOG_FILE=

I also tried the legacy value for the tokenizer, with the exact same error message.

lonvia commented 3 years ago

The default settings for the new legacy_icu tokenizer were not being installed, which I have fixed in 2992dea5c8d206cd332c5a5e782f33c497953f4c, but other than that I can't reproduce this behaviour.

It sounds like you have a half-installed version of Nominatim. Try reinstalling a clean version of Nominatim and make sure your project directory is well separated from the Nominatim source, build and installation directory.

yoshi314 commented 3 years ago

Hmm, still hitting this error. Really stumped here as to what could be wrong. Will retry with the legacy tokenizer this time.

lonvia commented 3 years ago

Can you describe the exact steps of how you installed Nominatim and how you did the import? Also, there is a CI script that successfully does the import with the legacy tokenizer. Maybe you can check that out and find a step where your method differs.

yoshi314 commented 3 years ago

I did a git checkout, then:

    mkdir build
    cd build
    cmake ..       # resolve all missing deps
    make
    make install

I made a separate project directory ~/_nominatim with the aforementioned env file and tried to import Europe merged with the Canary Islands.

This system previously served Nominatim 3.4.1 installed in ~/Nominatim/ + ~/Nominatim/build, but it should not conflict with this one.

If that might be the case, I'll try to set up a clean system.

lonvia commented 3 years ago

Sounds exactly right. This is odd. The 3.4.1 installation should not interfere.

I suspect something still goes wrong with the installation of the nominatim binary. Can you do an ls /usr/local/lib/nominatim/lib-python/nominatim/tokenizer and let me know the contents of /usr/local/bin/nominatim? Also make sure that which nominatim really points to the one in /usr/local.
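
The same checks can be gathered in one go. The following is only a minimal sketch: the function name check_install and its parameters are hypothetical, and the default paths are simply the ones mentioned in this thread, assumed to match a default /usr/local install.

```python
import os
import shutil


def check_install(tool="nominatim",
                  expected_prefix="/usr/local",
                  tokenizer_dir="/usr/local/lib/nominatim/lib-python/nominatim/tokenizer"):
    """Report which executable is found on PATH and what the tokenizer directory holds.

    All defaults are assumptions taken from the paths quoted in this thread.
    """
    found = shutil.which(tool)  # None when the tool is not on PATH
    under_prefix = bool(found) and os.path.realpath(found).startswith(expected_prefix + os.sep)
    files = sorted(os.listdir(tokenizer_dir)) if os.path.isdir(tokenizer_dir) else []
    return {"tool_path": found, "under_prefix": under_prefix, "tokenizer_files": files}


if __name__ == "__main__":
    for key, value in check_install().items():
        print(key, "=", value)
```

If tool_path points somewhere other than /usr/local, an older installation is probably being picked up first.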

yoshi314 commented 3 years ago

tokenizers: (should init be empty?)

-rw-r--r-- 1 root root  3068 May 13 09:04 factory.py
-rw-r--r-- 1 root root     0 May 13 09:04 __init__.py
-rw-r--r-- 1 root root 23767 May 14 10:54 legacy_icu_tokenizer.py
-rw-r--r-- 1 root root 21865 May 14 10:54 legacy_tokenizer.py

nominatim script

#!/usr/bin/env python3
import sys
import os

sys.path.insert(1, '/usr/local/lib/nominatim/lib-python')

os.environ['NOMINATIM_NOMINATIM_TOOL'] = os.path.abspath(__file__)

from nominatim import cli

exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module',
                   osm2pgsql_path='/usr/local/lib/nominatim/osm2pgsql',
                   phplib_dir='/usr/local/lib/nominatim/lib-php',
                   sqllib_dir='/usr/local/lib/nominatim/lib-sql',
                   data_dir='/usr/local/share/nominatim',
                   config_dir='/usr/local/etc/nominatim',
                   phpcgi_path='/usr/bin/php-cgi'))

I distinctly recall that when I tried to check out the last tag, I had bizarre build errors. Maybe that could help here.

lonvia commented 3 years ago

That's all as it should be, including the empty __init__.py file (that's just the usual marker for Python to know that this directory contains Python source files).

Hmm, maybe there is another error hidden here. Can you please add a print to /usr/local/lib/nominatim/lib-python/nominatim/tokenizer/factory.py in the except block around line 32. It should then look like this:

    try:
        import sys
        print(sys.path)
        return importlib.import_module('nominatim.tokenizer.' + name + '_tokenizer')
    except ModuleNotFoundError as exp:
        print("Exception", exp) # <------ add this line here
        LOG.fatal("No tokenizer named '%s' available. "
                  "Check the setting of NOMINATIM_TOKENIZER.", name)
        raise UsageError('Tokenizer not found') from exp

(If you are not familiar with Python: careful with the spaces in front of the line. Make sure there are exactly as many as on the line below and no Tabs.)

yoshi314 commented 3 years ago

With legacy_icu I got

Module 'icu' not found. So I suppose it's missing dependencies on my end. I wonder what's breaking the legacy one.

lonvia commented 3 years ago

Aha. It's really bad that the error message hides the underlying issue. I shall fix that.
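
One way such a fix could look, as a rough sketch and not the actual change made in Nominatim: ModuleNotFoundError carries the name of the module that was missing, so the loader can tell "the tokenizer module does not exist" apart from "the tokenizer exists but one of its imports is missing". The function load_tokenizer_module, its package parameter and the UsageError stand-in below are assumptions for illustration.

```python
import importlib


class UsageError(Exception):
    """Stand-in for Nominatim's UsageError (defined here only for the sketch)."""


def load_tokenizer_module(name, package="nominatim.tokenizer"):
    """Import <package>.<name>_tokenizer and say what is actually missing."""
    target = f"{package}.{name}_tokenizer"
    try:
        return importlib.import_module(target)
    except ModuleNotFoundError as exp:
        # exp.name is the module the import system failed to find.
        missing = exp.name or ""
        if missing == target or target.startswith(missing + "."):
            # The tokenizer module itself (or its parent package) is absent.
            raise UsageError(f"No tokenizer named '{name}' available. "
                             "Check the setting of NOMINATIM_TOKENIZER.") from exp
        # The tokenizer was found but one of its own imports failed.
        raise UsageError(f"Tokenizer '{name}' requires the Python module "
                         f"'{missing}', which is not installed.") from exp
```

With this split, a missing icu module would be reported by name instead of being folded into the generic "Tokenizer not found" message.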

You need the python ICU library: apt install python3-icu. The development docs should always have the full dependency list for master: https://nominatim.org/release-docs/develop/appendix/Install-on-Ubuntu-20/

yoshi314 commented 3 years ago

It seems to be going now, but yes - a dependency check would be nice to have before the entire loading process begins.
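
A pre-flight check along those lines could be quite small. This is only a sketch under stated assumptions: the REQUIRED_MODULES table and the missing_modules helper are hypothetical, and the only dependency listed is the icu module that turned out to be missing in this thread.

```python
import importlib.util

# Hypothetical table: tokenizer name -> top-level Python modules it needs.
# Only the 'icu' requirement for legacy_icu comes from this thread.
REQUIRED_MODULES = {
    "legacy_icu": ["icu"],
}


def missing_modules(tokenizer, required=REQUIRED_MODULES):
    """Return the required top-level modules that cannot be found, without importing them."""
    return [mod for mod in required.get(tokenizer, [])
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_modules("legacy_icu")
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All listed modules are available.")
```

Running something like this before osm2pgsql starts would surface the problem in seconds instead of after several hours of import.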