pelias / whosonfirst

Importer for Who's on First gazetteer
MIT License

Whosonfirst reports index does not exist for improper timeout configuration in `pelias.json` #535

Open creativesapiens opened 1 year ago

creativesapiens commented 1 year ago

Describe the bug

The Pelias whosonfirst importer reports the following error when an import is started with npm run start and pelias.json contains an improper timeout setting. The reported error is:

ERROR: Elasticsearch index pelias does not exist
You must use the pelias-schema tool (https://github.com/pelias/schema/) to create the index first
For full instructions on setting up Pelias, see http://pelias.io/install.html
/home/user/pelias/whosonfirst/node_modules/pelias-dbclient/src/configValidation.js:39
        throw new Error(`elasticsearch index ${config.schema.indexName} does not exist`);

Error: elasticsearch index pelias does not exist
    at existsCallback (/home/user/Softwares/whosonfirst/node_modules/pelias-dbclient/src/configValidation.js:39:15)
    at respond (/home/user/Softwares/whosonfirst/node_modules/elasticsearch/src/lib/transport.js:368:9)
    at /home/user/Softwares/whosonfirst/node_modules/elasticsearch/src/lib/transport.js:396:7
    at Timeout.<anonymous> (/home/user/Softwares/whosonfirst/node_modules/elasticsearch/src/lib/transport.js:429:7)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7)

However, the index clearly does exist:

$ curl http://localhost:9200/_cat/indices/*?v=true
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases Cfim9lIdRZO1D6X2UqcHqQ   1   0         41            0       39mb           39mb
green  open   pelias           iQYgJrn9QWySZgZ-tC80NA   1   0          0            0       226b           226b

The configuration was done as:

"esclient": {
    "apiVersion": "7.x",
    "keepAlive": true,
    "requestTimeout": 12000,
    "hosts": [{
      "env": "development",
      "protocol": "http",
      "host": "localhost",
      "port": 9200
    }],
}

Steps to Reproduce

  1. Install elasticsearch, with required dependencies
  2. Load elasticsearch schema
  3. Have the above mentioned configuration in pelias.json
  4. Download wof data in data directory
  5. Start wof import with npm run start

Expected behavior

The error message should indicate that the problem was a timeout, or perhaps a problem with the JSON file.

Additional context

The complete output of the command, including the stack trace, was:


> pelias-whosonfirst@0.0.0-development start
> ./bin/start

2022-10-23T20:44:18.540Z - debug: [whosonfirst] Loading 'ocean' of whosonfirst-data-admin-latest.db database from /home/user/Downloads/whosonfirst/sqlite
2022-10-23T20:44:21.299Z - debug: [whosonfirst] Loading 'marinearea' of whosonfirst-data-admin-latest.db database from /home/user/Downloads/whosonfirst/sqlite
2022-10-23T20:44:24.400Z - debug: [whosonfirst] Loading 'continent' of whosonfirst-data-admin-latest.db database from /home/user/Downloads/whosonfirst/sqlite
2022-10-23T20:44:27.180Z - debug: [whosonfirst] Loading 'empire' of whosonfirst-data-admin-latest.db database from /home/user/Downloads/whosonfirst/sqlite
2022-10-23T20:44:29.851Z - debug: [whosonfirst] Loading 'country' of whosonfirst-data-admin-latest.db database from /home/user/Downloads/whosonfirst/sqlite
2022-10-23T20:44:34.716Z - debug: [whosonfirst] Loading 'dependency' of whosonfirst-data-admin-latest.db database from /home/user/Downloads/whosonfirst/sqlite
2022-10-23T20:44:37.350Z - debug: [whosonfirst] Loading 'disputed' of whosonfirst-data-admin-latest.db database from /home/user/Downloads/whosonfirst/sqlite
2022-10-23T20:44:40.029Z - debug: [whosonfirst] Loading 'macroregion' of whosonfirst-data-admin-latest.db database from /home/user/Downloads/whosonfirst/sqlite
2022-10-23T20:44:43.366Z - debug: [whosonfirst] Loading 'region' of whosonfirst-data-admin-latest.db database from /home/user/Downloads/whosonfirst/sqlite
ERROR: Elasticsearch index pelias does not exist
You must use the pelias-schema tool (https://github.com/pelias/schema/) to create the index first
For full instructions on setting up Pelias, see http://pelias.io/install.html
/home/user/Softwares/whosonfirst/node_modules/pelias-dbclient/src/configValidation.js:39
        throw new Error(`elasticsearch index ${config.schema.indexName} does not exist`);
        ^

Error: elasticsearch index pelias does not exist
    at existsCallback (/home/user/Softwares/whosonfirst/node_modules/pelias-dbclient/src/configValidation.js:39:15)
    at respond (/home/user/Softwares/whosonfirst/node_modules/elasticsearch/src/lib/transport.js:368:9)
    at /home/user/Softwares/whosonfirst/node_modules/elasticsearch/src/lib/transport.js:396:7
    at Timeout.<anonymous> (/home/user/Softwares/whosonfirst/node_modules/elasticsearch/src/lib/transport.js:429:7)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7)

What fixed it?

A pelias.json configuration with a proper timeout fixed it.

Note: requestTimeout was changed to a string with a value of "120000" (it was previously the number 12000):

"esclient": {
    "apiVersion": "7.x",
    "keepAlive": true,
    "requestTimeout": "120000",
    "hosts": [{
      "env": "development",
      "protocol": "http",
      "host": "localhost",
      "port": 9200
    }],
}

orangejulius commented 1 year ago

Hi @creativesapiens, thanks for the comprehensive bug report. We've been tracking this issue for a while, with reports in https://github.com/pelias/docker/issues/217 among other places. It appears to us that there's something a bit different about the Who's on First importer that makes it hit this issue when other importers don't. However, none of the Pelias team has ever been able to reproduce it, so maybe you can help us track it down.

We've also seen the issue where invalid requestTimeout values are interpreted as 0ms, leading to timeout errors, though that was a long time ago and with clearly invalid values like 120_000. However, I tested both "120000" and 120000 as timeout values and they both worked fine for me.

Can you answer a couple questions for me?

Thanks!
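
To illustrate the point above about invalid requestTimeout values, here is a minimal sketch of how a timeout value could be checked before it reaches the Elasticsearch client. The function name parseRequestTimeout is hypothetical and is not part of Pelias or the elasticsearch client; it only assumes that the value should end up as a positive whole number of milliseconds.

```javascript
// Hypothetical helper, not part of Pelias: normalize a requestTimeout value
// from pelias.json into a positive integer number of milliseconds.
function parseRequestTimeout(value) {
  // Accept numbers and digit-only strings such as "120000".
  const isDigits = typeof value === 'string' && /^\d+$/.test(value);
  const ms = typeof value === 'number' ? value : (isDigits ? Number(value) : NaN);

  // Reject anything else (for example "120_000" or 0) loudly instead of
  // letting it silently turn into an unusable timeout.
  if (!Number.isInteger(ms) || ms <= 0) {
    throw new Error(`invalid requestTimeout: ${JSON.stringify(value)}`);
  }
  return ms;
}
```

With this sketch, both "120000" and 120000 normalize to 120000 (2 minutes), 12000 normalizes to 12000 (12 seconds), and "120_000" throws.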

gmarti commented 1 year ago

I think the issue is here: https://github.com/pelias/dbclient/blob/master/src/configValidation.js#L34. If there is an error, it logs that the index doesn't exist, and the underlying error is silenced, so nothing about it is printed.
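
As a rough sketch of the distinction described above, a check like this could keep "the request failed" separate from "the index is missing". This is not the actual pelias-dbclient code; describeIndexCheck is a hypothetical name, and the (err, exists) arguments are assumed to mirror a Node-style callback.

```javascript
// Hypothetical sketch, not the actual pelias-dbclient code: build a message
// for an index-existence check that keeps a failed request distinct from a
// missing index.
function describeIndexCheck(err, exists, indexName) {
  if (err) {
    // The request itself failed (for example, it timed out), so whether
    // the index exists is unknown. Surface the underlying error.
    return `could not check whether index ${indexName} exists: ${err.message}`;
  }
  if (!exists) {
    return `elasticsearch index ${indexName} does not exist`;
  }
  return null; // index exists, nothing to report
}
```

For example, calling describeIndexCheck(new Error('request timed out'), undefined, 'pelias') would report the timeout rather than claiming the index is missing (the error text here is only illustrative).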

Kilowhisky commented 1 month ago

I'm also encountering this problem, and I'm using the default config timeout.

    "esclient": {
        "apiVersion": "7.x",
        "keepAlive": true,
        "requestTimeout": "120000",

Here's my full config:

{
    "esclient": {
        "apiVersion": "7.x",
        "keepAlive": true,
        "requestTimeout": "120000",
        "hosts": [
            {
                "env": "development",
                "protocol": "https",
                "host": "AWS.us-west-2.es.amazonaws.com",
                "port": 443,
                "auth": "negatron"
            }
        ],
        "log": [
            {
                "type": "stdio",
                "json": false,
                "level": [
                    "error",
                    "warning"
                ]
            }
        ]
    },
    "elasticsearch": {
        "settings": {
            "index": {
                "number_of_replicas": "0",
                "number_of_shards": "5",
                "refresh_interval": "1m"
            }
        }
    },
    "interpolation": {
        "client": {
            "adapter": "null"
        }
    },
    "dbclient": {
        "statFrequency": 10000,
        "batchSize": 500
    },
    "api": {
        "accessLog": "common",
        "host": "http://pelias",
        "indexName": "pelias",
        "version": "1.0",
        "targets": {
            "auto_discover": true,
            "canonical_sources": [
                "whosonfirst",
                "openstreetmap",
                "openaddresses",
                "geonames"
            ],
            "layers_by_source": {
                "openstreetmap": [
                    "address",
                    "venue",
                    "street"
                ],
                "openaddresses": [
                    "address"
                ],
                "geonames": [
                    "country",
                    "macroregion",
                    "region",
                    "county",
                    "localadmin",
                    "locality",
                    "borough",
                    "neighbourhood",
                    "venue"
                ],
                "whosonfirst": [
                    "continent",
                    "empire",
                    "country",
                    "dependency",
                    "macroregion",
                    "region",
                    "locality",
                    "localadmin",
                    "macrocounty",
                    "county",
                    "macrohood",
                    "borough",
                    "neighbourhood",
                    "microhood",
                    "disputed",
                    "venue",
                    "postalcode",
                    "ocean",
                    "marinearea"
                ]
            },
            "source_aliases": {
                "osm": [
                    "openstreetmap"
                ],
                "oa": [
                    "openaddresses"
                ],
                "gn": [
                    "geonames"
                ],
                "wof": [
                    "whosonfirst"
                ]
            },
            "layer_aliases": {
                "coarse": [
                    "continent",
                    "empire",
                    "country",
                    "dependency",
                    "macroregion",
                    "region",
                    "locality",
                    "localadmin",
                    "macrocounty",
                    "county",
                    "macrohood",
                    "borough",
                    "neighbourhood",
                    "microhood",
                    "disputed",
                    "postalcode",
                    "ocean",
                    "marinearea"
                ]
            }
        },
        "port": 3100,
        "attributionURL": "nope",
        "services": {
            "pip": {
                "url": "http://localhost:3102"
            },
            "libpostal": {
                "url": "http://localhost:4400"
            },
            "placeholder": {
                "url": "http://localhost:3000"
            }
        }
    },
    "schema": {
        "indexName": "pelias"
    },
    "logger": {
        "level": "debug",
        "timestamp": true,
        "colorize": true
    },
    "acceptance-tests": {
        "endpoints": {
            "local": "http://localhost:3100/v1/"
        }
    },
    "imports": {
        "adminLookup": {
            "enabled": true,
            "maxConcurrentRequests": 100,
            "usePostalCities": true
        },
        "blacklist": {
            "files": []
        },
        "csv": {},
        "geonames": {
            "datapath": "/data/pelias/geonames",
            "countryCode": "US"
        },
        "openstreetmap": {
            "datapath": "/data/pelias/openstreetmap",
            "leveldbpath": "/tmp",
            "import": [
                {
                    "filename": "extract.osm.pbf"
                }
            ]
        },
        "openaddresses": {
            "datapath": "/mnt/pelias/openaddresses",
            "token": "oa.bbbcf5787bb4251445883cc417f811ba02b9fd64809fd56c5a972171fbcfb2f6",
            "files": []
        },
        "polyline": {
            "datapath": "/data/pelias/polyline",
            "files": [
                "north-america-valhalla.polylines.0sv"
            ]
        },
        "whosonfirst": {
            "datapath": "/data/pelias/whosonfirst",
            "importPostalcodes": true,
            "countryCode": "US"
        }
    }
}

michaelkirk commented 1 month ago

If the root cause is a timeout (hard to know with the current logging, until https://github.com/pelias/dbclient/pull/129 is rolled out to the various client libraries), you can increase the timeout.

From https://github.com/pelias/docker/issues/217#issuecomment-1310547892

After having the planet sized import fail a couple dozen times with the default 2 minute timeout, I specified a timeout of 10 minutes and was able to complete the import on the first try.

pelias config:

{
  "esclient": {
    "requestTimeout": "600000",
    ...
  },
  ...
}