pelias / wof-admin-lookup

Who's on First Admin Lookup for the Pelias Geocoder
https://pelias.io
MIT License

OSM Admin Lookup fails for some nodes? #51

Closed DylanFrese closed 8 years ago

DylanFrese commented 8 years ago

I've set up a Pelias server and imported Geonames, WoF, OSM, and OA data for North America. After the import, however, I noticed that a large amount of OSM data was missing. I later discovered that disabling admin lookup for OpenStreetMap did get the nodes into the Elasticsearch (ES) database, but obviously without the administrative data, which is unfortunate because that data is important for my use case.

Take an example node from the OpenStreetMap Minneapolis/Saint-Paul metro extract, node:2366371487 (Red Cow, Minneapolis, Minnesota, USA). Reverse geocoding via Pelias gives me this correct data:

{
  "geocoding": {
    "version": "0.1",
    "attribution": "/v1/attribution",
    "query": {
      "sources": [
        "whosonfirst"
      ],
      "size": 10,
      "private": false,
      "point.lat": 44.912733,
      "point.lon": -93.326069,
      "boundary.circle.radius": 1,
      "boundary.circle.lat": 44.912733,
      "boundary.circle.lon": -93.326069,
      "querySize": 20
    },
    "engine": {
      "name": "Pelias",
      "author": "Mapzen",
      "version": "1.0"
    },
    "timestamp": 1465508928532
  },
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          -93.319105,
          44.911398
        ]
      },
      "properties": {
        "id": "85872569",
        "gid": "whosonfirst:neighbourhood:85872569",
        "layer": "neighbourhood",
        "source": "whosonfirst",
        "source_id": "85872569",
        "name": "Fulton",
        "confidence": 0.6,
        "distance": 0.57,
        "country": "United States",
        "country_gid": "whosonfirst:country:85633793",
        "country_a": "USA",
        "region": "Minnesota",
        "region_gid": "whosonfirst:region:85688727",
        "region_a": "MN",
        "county": "Hennepin County",
        "county_gid": "whosonfirst:county:102087709",
        "localadmin": "Minneapolis",
        "localadmin_gid": "whosonfirst:localadmin:404511883",
        "locality": "Minneapolis",
        "locality_gid": "whosonfirst:locality:85969169",
        "label": "Fulton, Minneapolis, MN, USA"
      },
      "bbox": [
        -93.329094,
        44.905129,
        -93.308632,
        44.917988
      ]
    }
  ],
  "bbox": [
    -93.329094,
    44.905129,
    -93.308632,
    44.917988
  ]
}

but the administrative lookup for this node (and possibly a majority of other nodes) failed during the import, and the node was skipped.
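
For anyone reproducing this, the lookup above is easy to script. The sketch below uses a hypothetical helper, `reverseUrl`, that only builds the request URL from the `point.lat`, `point.lon` and `sources` parameters listed in the response's `query` block. The base URL `http://localhost:3100/v1/` is an assumption, taken from the acceptance-tests endpoint in the pelias.json posted later in this thread.

```typescript
// Hypothetical helper: builds a Pelias /v1/reverse URL for a point.
// The base URL is an assumption (the acceptance-tests endpoint in the
// pelias.json posted in this thread).
function reverseUrl(
  baseUrl: string,
  lat: number,
  lon: number,
  sources: string[] = ["whosonfirst"],
): string {
  // Parameter names as they appear in the response's "query" block.
  const params: Array<[string, string]> = [
    ["point.lat", String(lat)],
    ["point.lon", String(lon)],
    ["sources", sources.join(",")],
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  // Drop trailing slashes so both ".../v1/" and ".../v1" work.
  return `${baseUrl.replace(/\/+$/, "")}/reverse?${query}`;
}

// The point from the "query" block of the response above.
console.log(reverseUrl("http://localhost:3100/v1/", 44.912733, -93.326069));
// prints http://localhost:3100/v1/reverse?point.lat=44.912733&point.lon=-93.326069&sources=whosonfirst
```

Fetching that URL and comparing the result with what Elasticsearch holds for the OSM record is one way to spot records that came through without admin data.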

This might also be the case for OA; I'm not certain. This seemed like a good point to pass the issue off to the Mapzen team, who know way more about the components involved than I do. Let me know what information you need.

orangejulius commented 8 years ago

Hey @DylanFrese, thanks for reporting this. How much RAM does the system doing the import have? I just ran a quick import of the Minneapolis metro extract, and it's only 30k records, so there's not a lot of data. But admin lookup currently requires a lot of RAM, maybe 5-6 GB. If you're doing admin lookup and running Elasticsearch on the same box, and it doesn't have much RAM, you might run into problems. Then again, I've done admin lookup on a machine with only 4GB of RAM (and a lot of swap), and it works, just a little slower.
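
As rough arithmetic for a shared box: what's left is total RAM minus the Elasticsearch heap minus the 5-6 GB for admin lookup. A minimal sketch with a hypothetical helper name; the Elasticsearch heap size is an input you supply (an assumption, not a recommendation), and the OS and other processes are ignored, so treat the result as an upper bound.

```typescript
// Rough RAM budget (in GB) for a box running Elasticsearch plus an importer
// with admin lookup. adminLookupGb defaults to the upper end of the 5-6 GB
// estimate; the Elasticsearch heap is a user-supplied input.
// OS and other processes are ignored, so the result is an upper bound.
function adminLookupHeadroomGb(
  totalRamGb: number,
  elasticsearchHeapGb: number,
  adminLookupGb: number = 6,
): number {
  return totalRamGb - elasticsearchHeapGb - adminLookupGb;
}

console.log(adminLookupHeadroomGb(32, 8)); // 18
console.log(adminLookupHeadroomGb(4, 2));  // -4: expect heavy swapping
```

A negative result means the import leans on swap, which fits the slow-but-working 4GB experience described above.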

There's an option you can add to your pelias.json file to enable extra logging (we really should make it the default):

  "logger": {
    "level": "debug",
    "timestamp": true
  },  

Can you try rerunning the importer with that and let me know if there are any error messages? In fact, just paste the full output here.

DylanFrese commented 8 years ago

The instance has 32GB of RAM (not much swap though).

I'll rerun with debug logs.

edit: I can also confirm that memory usage has never come close to the limit (I have an alert that trips if usable memory drops below 20%), and that the import does complete (it is not killed partway through), just with quite a few nodes missing.

DylanFrese commented 8 years ago

Here's the log (with log level 'debug'), though it doesn't seem very helpful.

The node I mentioned is at line 5502. I confirmed that the administrative data is still missing (both by querying the Pelias API and by looking at Elasticsearch itself).
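
One way to make "the administrative data is missing" checkable is to look for the admin fields on a returned feature. The field names below are the ones in the Who's on First reverse result earlier in the thread (`country`, `region`, `county`, `locality`); treating all four as required is an assumption that only holds for points inside a city like Minneapolis. `missingAdminFields` is a hypothetical helper, not part of Pelias.

```typescript
// Admin fields present in the Who's on First reverse result above. Requiring
// all four is an assumption that fits a point inside a city like Minneapolis.
const EXPECTED_ADMIN_FIELDS: readonly string[] = ["country", "region", "county", "locality"];

// Hypothetical helper: lists expected admin fields that are absent or empty
// in a Pelias API feature's `properties` object.
function missingAdminFields(
  properties: Record<string, unknown>,
  expected: readonly string[] = EXPECTED_ADMIN_FIELDS,
): string[] {
  return expected.filter((field) => {
    const value = properties[field];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Trimmed-down properties of the "Fulton" feature above: nothing is missing.
console.log(missingAdminFields({
  name: "Fulton",
  country: "United States",
  region: "Minnesota",
  county: "Hennepin County",
  locality: "Minneapolis",
}));
// A hypothetical record imported without admin lookup: all four are missing.
console.log(missingAdminFields({ name: "Red Cow" }));
```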

orangejulius commented 8 years ago

Okay, interesting. 32GB is definitely enough RAM :)

That debug log looks like it was run without admin lookup. Can you also share one with admin lookup turned on?

A couple things to check off the standard debugging checklist:

DylanFrese commented 8 years ago

The openstreetmap version is the tip of production (4be008ebf5dfa05b6b9ffd2661476d3c6b4d0694).

It's interesting that the log seems to indicate that admin lookup is disabled, because here is my pelias.json:

{
  "esclient": {
    "apiVersion": "1.7",
    "keepAlive": true,
    "requestTimeout": "120000",
    "hosts": [{
      "env": "development",
      "protocol": "http",
      "host": "elasticsearch",
      "port": 9200
    }],
    "log": [{
      "type": "stdio",
      "level": [ "error", "warning" ]
    }]
  },
  "dbclient": {
    "statFrequency": 10000
  },
  "api": {
    "accessLog": "common",
    "host": "localhost",
    "version": "1.0"
  },
  "logger": {
    "level": "debug",
    "timestamp": true,
    "colorize": true
  },
  "acceptance-tests": {
    "endpoints": {
      "local": "http://localhost:3100/v1/"
    }
  },
  "imports": {
    "geonames": {
      "datapath": "/data/geonames",
      "adminLookup": true
    },
    "openstreetmap": {
      "datapath": "/data/openstreetmap",
      "adminLookup": true,
      "leveldbpath": "/work",
      "import": [{
        "filename": "minneapolis-saint-paul_minnesota.osm.pbf"
      }]
    },
    "openaddresses": {
      "datapath": "/data/openaddresses/csv",
      "files": []
    },
    "whosonfirst": {
      "datapath": "/data/whosonfirst"
    }
  }
}

The NPM output:

npm info it worked if it ends with ok
npm info using npm@3.9.3
npm info using node@v4.2.2
pelias-openstreetmap@4.0.0 /code/openstreetmap
+-- async@1.5.2
+-- colors@1.1.2
+-- deep-diff@0.3.4
+-- event-stream@3.3.2
| +-- duplexer@0.1.1
| +-- from@0.1.3
| +-- map-stream@0.1.0
| +-- pause-stream@0.0.11
| +-- split@0.3.3
| +-- stream-combiner@0.0.4
| `-- through@2.3.8
+-- extend@2.0.1
+-- geolib@2.0.21
+-- gjtk@1.0.0-b
| +-- point-in-polygon@0.0.0
| `-- uri-js@1.4.2
+-- is-object@1.0.1
+-- iso-639-3@1.0.0
+-- istanbul@0.4.3
| +-- abbrev@1.0.7
| +-- escodegen@1.8.0
| | +-- estraverse@1.9.3
| | +-- esutils@2.0.2
| | +-- optionator@0.8.1
| | | +-- deep-is@0.1.3
| | | +-- fast-levenshtein@1.1.3
| | | +-- levn@0.3.0
| | | +-- prelude-ls@1.1.2
| | | `-- type-check@0.3.2
| | `-- source-map@0.2.0
| +-- esprima@2.7.2
| +-- fileset@0.2.1
| | +-- glob@5.0.15
| | `-- minimatch@2.0.10
| +-- handlebars@4.0.5
| | +-- optimist@0.6.1
| | | +-- minimist@0.0.10
| | | `-- wordwrap@0.0.3
| | +-- source-map@0.4.4
| | | `-- amdefine@1.0.0
| | `-- uglify-js@2.6.2
| |   +-- async@0.2.10
| |   +-- source-map@0.5.6
| |   +-- uglify-to-browserify@1.0.2
| |   `-- yargs@3.10.0
| |     +-- camelcase@1.2.1
| |     +-- cliui@2.1.0
| |     | +-- center-align@0.1.3
| |     | | +-- align-text@0.1.4
| |     | | | +-- kind-of@3.0.3
| |     | | | | `-- is-buffer@1.1.3
| |     | | | `-- longest@1.0.1
| |     | | `-- lazy-cache@1.0.4
| |     | +-- right-align@0.1.3
| |     | `-- wordwrap@0.0.2
| |     +-- decamelize@1.2.0
| |     `-- window-size@0.1.0
| +-- js-yaml@3.6.1
| | +-- argparse@1.0.7
| | | `-- sprintf-js@1.0.3
| | `-- esprima@2.7.2
| +-- mkdirp@0.5.1
| | `-- minimist@0.0.8
| +-- nopt@3.0.6
| +-- once@1.3.3
| | `-- wrappy@1.0.2
| +-- resolve@1.1.7
| +-- supports-color@3.1.2
| | `-- has-flag@1.0.0
| +-- which@1.2.9
| | `-- isexe@1.1.2
| `-- wordwrap@1.0.0
+-- jshint@2.9.2
| +-- cli@0.6.6
| | `-- glob@3.2.11
| |   `-- minimatch@0.3.0
| |     +-- lru-cache@2.7.3
| |     `-- sigmund@1.0.1
| +-- console-browserify@1.1.0
| | `-- date-now@0.1.4
| +-- exit@0.1.2
| +-- htmlparser2@3.8.3
| | +-- domelementtype@1.3.0
| | +-- domhandler@2.3.0
| | +-- domutils@1.5.1
| | | `-- dom-serializer@0.1.0
| | |   +-- domelementtype@1.1.3
| | |   `-- entities@1.1.1
| | +-- entities@1.0.0
| | `-- readable-stream@1.1.14
| |   `-- isarray@0.0.1
| +-- lodash@3.7.0
| +-- minimatch@2.0.10
| | `-- brace-expansion@1.1.4
| |   +-- balanced-match@0.4.1
| |   `-- concat-map@0.0.1
| +-- shelljs@0.3.0
| `-- strip-json-comments@1.0.4
+-- lodash@4.13.1
+-- merge@1.2.0
+-- naivedb@1.0.7
| `-- through2@0.6.5
|   `-- readable-stream@1.0.34
+-- pbf2json@3.0.0
| `-- through2@0.6.5
+-- pelias-address-deduplicator@1.1.0
| `-- request@2.72.0
|   +-- aws-sign2@0.6.0
|   +-- aws4@1.4.1
|   +-- bl@1.1.2
|   | `-- readable-stream@2.0.6
|   |   `-- isarray@1.0.0
|   +-- caseless@0.11.0
|   +-- combined-stream@1.0.5
|   | `-- delayed-stream@1.0.0
|   +-- extend@3.0.0
|   +-- forever-agent@0.6.1
|   +-- form-data@1.0.0-rc4
|   | `-- async@1.5.2
|   +-- har-validator@2.0.6
|   | +-- commander@2.9.0
|   | | `-- graceful-readlink@1.0.1
|   | +-- is-my-json-valid@2.13.1
|   | | `-- jsonpointer@2.0.0
|   | `-- pinkie-promise@2.0.1
|   |   `-- pinkie@2.0.4
|   +-- hawk@3.1.3
|   | +-- boom@2.10.1
|   | +-- cryptiles@2.0.5
|   | +-- hoek@2.16.3
|   | `-- sntp@1.0.9
|   +-- http-signature@1.1.1
|   | +-- assert-plus@0.2.0
|   | +-- jsprim@1.2.2
|   | | +-- extsprintf@1.0.2
|   | | +-- json-schema@0.2.2
|   | | `-- verror@1.3.6
|   | `-- sshpk@1.8.3
|   |   +-- asn1@0.2.3
|   |   +-- assert-plus@1.0.0
|   |   +-- dashdash@1.13.1
|   |   | `-- assert-plus@1.0.0
|   |   +-- ecc-jsbn@0.1.1
|   |   +-- getpass@0.1.6
|   |   | `-- assert-plus@1.0.0
|   |   +-- jodid25519@1.0.2
|   |   +-- jsbn@0.1.0
|   |   `-- tweetnacl@0.13.3
|   +-- is-typedarray@1.0.0
|   +-- isstream@0.1.2
|   +-- json-stringify-safe@5.0.1
|   +-- mime-types@2.1.11
|   | `-- mime-db@1.23.0
|   +-- node-uuid@1.4.7
|   +-- oauth-sign@0.8.2
|   +-- qs@6.1.0
|   +-- stringstream@0.0.5
|   +-- tough-cookie@2.2.2
|   `-- tunnel-agent@0.4.3
+-- pelias-config@1.0.3
| `-- mergeable@0.0.0
|   `-- extend@1.3.0
+-- pelias-dbclient@1.0.2
| +-- byline@4.2.1
| +-- elasticsearch@11.0.1
| | +-- lodash@3.10.1
| | +-- lodash-compat@3.10.2
| | `-- promise@7.1.1
| |   `-- asap@2.0.4
| +-- openstreetmap-stream@0.0.6
| | +-- stream-combiner2@1.1.1
| | | +-- duplexer2@0.1.4
| | | `-- readable-stream@2.1.4
| | |   `-- isarray@1.0.0
| | `-- through2@0.5.1
| |   `-- xtend@3.0.0
| `-- osm-pbf-parser@2.3.0
|   +-- brfs@1.4.3
|   | +-- quote-stream@1.0.2
|   | | `-- buffer-equal@0.0.1
|   | `-- static-module@1.3.1
|   |   +-- concat-stream@1.4.10
|   |   | +-- readable-stream@1.1.14
|   |   | `-- typedarray@0.0.6
|   |   +-- duplexer2@0.0.2
|   |   | `-- readable-stream@1.1.14
|   |   +-- escodegen@1.3.3
|   |   | +-- esprima@1.1.1
|   |   | +-- estraverse@1.5.1
|   |   | +-- esutils@1.0.0
|   |   | `-- source-map@0.1.43
|   |   +-- falafel@1.2.0
|   |   | `-- acorn@1.2.2
|   |   +-- object-inspect@0.4.0
|   |   +-- quote-stream@0.0.0
|   |   | `-- minimist@0.0.8
|   |   +-- shallow-copy@0.0.1
|   |   +-- static-eval@0.2.4
|   |   | `-- escodegen@0.0.28
|   |   |   +-- esprima@1.0.4
|   |   |   `-- estraverse@1.3.2
|   |   `-- through2@0.4.2
|   |     `-- xtend@2.1.2
|   |       `-- object-keys@0.4.0
|   +-- protocol-buffers@3.1.6
|   | +-- generate-function@2.0.0
|   | +-- generate-object-property@1.2.0
|   | | `-- is-property@1.0.2
|   | +-- protocol-buffers-schema@2.2.0
|   | +-- signed-varint@2.0.0
|   | | `-- varint@3.0.1
|   | `-- varint@4.0.0
|   `-- readable-stream@2.1.4
|     +-- buffer-shims@1.0.0
|     `-- isarray@1.0.0
+-- pelias-logger@0.0.8
| `-- winston@0.9.0
|   +-- UNMET DEPENDENCY async@0.9.x
|   +-- UNMET DEPENDENCY colors@1.0.x
|   +-- cycle@1.0.3
|   +-- eyes@0.1.8
|   +-- pkginfo@0.3.1
|   `-- stack-trace@0.0.9
+-- pelias-model@4.0.0
+-- pelias-wof-admin-lookup@2.0.0
| +-- pelias-parallel-stream@0.0.2
| `-- pelias-wof-pip-service@1.4.1
|   +-- async@1.5.2
|   +-- csv-parse@1.1.0
|   +-- express@4.13.4
|   | +-- accepts@1.2.13
|   | | `-- negotiator@0.5.3
|   | +-- array-flatten@1.1.1
|   | +-- content-disposition@0.5.1
|   | +-- content-type@1.0.2
|   | +-- cookie@0.1.5
|   | +-- cookie-signature@1.0.6
|   | +-- debug@2.2.0
|   | | `-- ms@0.7.1
|   | +-- depd@1.1.0
|   | +-- escape-html@1.0.3
|   | +-- etag@1.7.0
|   | +-- finalhandler@0.4.1
|   | | `-- unpipe@1.0.0
|   | +-- fresh@0.3.0
|   | +-- merge-descriptors@1.0.1
|   | +-- methods@1.1.2
|   | +-- on-finished@2.3.0
|   | | `-- ee-first@1.1.1
|   | +-- parseurl@1.3.1
|   | +-- path-to-regexp@0.1.7
|   | +-- proxy-addr@1.0.10
|   | | +-- forwarded@0.1.0
|   | | `-- ipaddr.js@1.0.5
|   | +-- qs@4.0.0
|   | +-- range-parser@1.0.3
|   | +-- send@0.13.1
|   | | +-- destroy@1.0.4
|   | | +-- http-errors@1.3.1
|   | | +-- mime@1.3.4
|   | | `-- statuses@1.2.1
|   | +-- serve-static@1.10.3
|   | | `-- send@0.13.2
|   | +-- type-is@1.6.13
|   | | `-- media-typer@0.3.0
|   | +-- utils-merge@1.0.0
|   | `-- vary@1.0.1
|   +-- fs-extra@0.30.0
|   | +-- graceful-fs@4.1.4
|   | +-- jsonfile@2.3.1
|   | +-- klaw@1.2.0
|   | `-- rimraf@2.5.2
|   +-- microtime@2.1.1
|   | +-- bindings@1.2.1
|   | `-- nan@2.3.5
|   +-- polygon-lookup@1.0.2
|   | +-- rbush@1.3.4
|   | `-- through2@0.6.5
|   +-- simplify-js@1.2.1
|   +-- tar-stream@1.5.2
|   | +-- end-of-stream@1.1.0
|   | `-- readable-stream@2.1.4
|   |   `-- isarray@1.0.0
|   +-- through2-filter@2.0.0
|   +-- through2-map@2.0.0
|   `-- unbzip2-stream@1.0.9
|     `-- buffer@3.6.0
|       +-- base64-js@0.0.8
|       +-- ieee754@1.1.6
|       `-- isarray@1.0.0
+-- precommit-hook@3.0.0
| `-- git-validate@2.1.4
+-- taginfo@1.0.1
+-- tap-spec@4.1.1
| +-- chalk@1.1.3
| | +-- ansi-styles@2.2.1
| | +-- escape-string-regexp@1.0.5
| | +-- has-ansi@2.0.0
| | | `-- ansi-regex@2.0.0
| | +-- strip-ansi@3.0.1
| | `-- supports-color@2.0.0
| +-- figures@1.7.0
| | `-- object-assign@4.1.0
| +-- lodash@3.10.1
| +-- pretty-ms@2.1.0
| | +-- is-finite@1.0.1
| | | `-- number-is-nan@1.0.0
| | +-- parse-ms@1.0.1
| | `-- plur@1.0.0
| +-- repeat-string@1.5.4
| `-- tap-out@1.4.2
|   +-- re-emitter@1.1.3
|   +-- readable-stream@2.1.4
|   | `-- isarray@1.0.0
|   +-- split@1.0.0
|   `-- trim@0.0.1
+-- tape@4.5.1
| +-- deep-equal@1.0.1
| +-- defined@1.0.0
| +-- function-bind@1.1.0
| +-- glob@7.0.3
| | +-- inflight@1.0.5
| | +-- minimatch@3.0.0
| | `-- path-is-absolute@1.0.0
| +-- has@1.0.1
| +-- inherits@2.0.1
| +-- minimist@1.2.0
| +-- object-inspect@1.1.0
| +-- resumer@0.0.0
| `-- string.prototype.trim@1.1.2
|   +-- define-properties@1.1.2
|   | +-- foreach@2.0.5
|   | `-- object-keys@1.0.9
|   `-- es-abstract@1.5.1
|     +-- es-to-primitive@1.1.1
|     | +-- is-date-object@1.0.1
|     | `-- is-symbol@1.0.1
|     +-- is-callable@1.1.3
|     `-- is-regex@1.0.3
+-- through2@2.0.1
| +-- readable-stream@2.0.6
| | +-- core-util-is@1.0.2
| | +-- isarray@1.0.0
| | +-- process-nextick-args@1.0.7
| | +-- string_decoder@0.10.31
| | `-- util-deprecate@1.0.2
| `-- xtend@4.0.1
+-- through2-sink@1.0.0
| +-- through2@0.5.1
| `-- xtend@3.0.0
+-- through2-spy@2.0.0
+-- tmp@0.0.26
| `-- os-tmpdir@1.0.1
`-- trimmer@1.0.0

npm ERR! missing: async@0.9.x, required by winston@0.9.0
npm ERR! missing: colors@1.0.x, required by winston@0.9.0
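
The two `npm ERR! missing` lines correspond to the `UNMET DEPENDENCY` entries under winston@0.9.0 (pulled in by pelias-logger@0.0.8) in the tree above. In a listing this long they are easy to miss. The sketch below uses a hypothetical helper, `findMissingDependencies`, that pulls them out by matching that exact line format, which is assumed to be specific to this npm 3.x output. Whether these unmet dependencies have anything to do with the missing admin data is not established here.

```typescript
interface MissingDependency {
  wanted: string;     // e.g. "async@0.9.x"
  requiredBy: string; // e.g. "winston@0.9.0"
}

// Matches lines like "npm ERR! missing: async@0.9.x, required by winston@0.9.0"
// as printed above (npm 3.x; other npm versions may word this differently).
const MISSING_RE = /^npm ERR! missing: (\S+), required by (\S+)$/;

function findMissingDependencies(npmOutput: string): MissingDependency[] {
  const result: MissingDependency[] = [];
  for (const line of npmOutput.split(/\r?\n/)) {
    const match = MISSING_RE.exec(line.trim());
    if (match) {
      result.push({ wanted: match[1], requiredBy: match[2] });
    }
  }
  return result;
}

const sample = [
  "npm info using node@v4.2.2",
  "npm ERR! missing: async@0.9.x, required by winston@0.9.0",
  "npm ERR! missing: colors@1.0.x, required by winston@0.9.0",
].join("\n");
for (const dep of findMissingDependencies(sample)) {
  console.log(`${dep.requiredBy} is missing ${dep.wanted}`);
}
```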

orangejulius commented 8 years ago

Yeah, that is very interesting. Your pelias.json looks right, although JSON can hide a lot of mistakes when it comes to config files, so I'm not willing to say so with 100% certainty.
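
Two mistakes of that kind: `JSON.parse` does not reject duplicate keys (it silently keeps the last one), and a value like `"true"` (a string) still parses fine where a boolean was meant. Checking the parsed object, rather than reading the file by eye, catches both. A minimal sketch with hypothetical helpers `getPath` and `configWarnings`; the key paths mirror the config posted above, and which values the importer actually requires is an assumption.

```typescript
type Json = unknown;

// Walks a parsed JSON value along a key path; undefined if any step is missing.
function getPath(value: Json, path: string[]): Json {
  let current: Json = value;
  for (const key of path) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, Json>)[key];
  }
  return current;
}

// Hypothetical sanity checks; key paths mirror the pelias.json in this thread.
function configWarnings(config: Json): string[] {
  const warnings: string[] = [];
  const adminLookup = getPath(config, ["imports", "openstreetmap", "adminLookup"]);
  if (adminLookup !== true) {
    // Catches a missing key as well as the string "true" instead of a boolean.
    warnings.push(`imports.openstreetmap.adminLookup is ${JSON.stringify(adminLookup)}, expected true`);
  }
  const wofPath = getPath(config, ["imports", "whosonfirst", "datapath"]);
  if (typeof wofPath !== "string" || wofPath.length === 0) {
    warnings.push("imports.whosonfirst.datapath is missing or empty");
  }
  const level = getPath(config, ["logger", "level"]);
  if (level !== "debug") {
    warnings.push(`logger.level is ${JSON.stringify(level)}, "debug" gives more output`);
  }
  return warnings;
}

// Duplicate keys parse without error; the last one ("true", a string) wins.
const parsed = JSON.parse(`{
  "logger": { "level": "debug" },
  "imports": {
    "openstreetmap": { "adminLookup": true, "adminLookup": "true" },
    "whosonfirst": { "datapath": "/data/whosonfirst" }
  }
}`);
console.log(configWarnings(parsed));
```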

Here's the log output when I run that same import with admin lookup turned on:

import_log.txt

Note that there's a lot of output at the beginning about loading all the Who's on First data which is absent in your output.

Make sure you ran rm -rf node_modules && npm install. Like rebooting a Windows computer, it solves way more problems than it should and no one knows why.

The next thing I'm thinking of is the Who's on First data, which admin lookup requires. It should throw errors if the data is missing or can't be found, but I'm sure there are ways it can fail silently. You changed the whosonfirst datapath, so I assume something's there. Is it the whosonfirst git repo?
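
A quick way to rule out a wrong or empty datapath is to list its top level and compare that against what a checkout should contain. The expected names here (`data` and `meta`) are an assumption about the whosonfirst-data repository layout, not a documented requirement of admin lookup. `missingWofEntries` is hypothetical and takes the listing as a plain array, so it could be fed from Node's `fs.readdirSync(datapath)`; an empty directory reports both names as missing.

```typescript
// Hypothetical check: which expected top-level entries are absent from a
// listing of the whosonfirst datapath. "data" and "meta" are an assumption
// about the whosonfirst-data repo layout, not a documented requirement.
function missingWofEntries(
  topLevelEntries: string[],
  expected: string[] = ["data", "meta"],
): string[] {
  const present = new Set(topLevelEntries);
  return expected.filter((name) => !present.has(name));
}

// In Node the listing could come from fs.readdirSync("/data/whosonfirst").
console.log(missingWofEntries(["data", "meta", "README.md"]));
console.log(missingWofEntries([]));
```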

I'm happy to hop on a video chat via Google Hangouts or something similar if you want to debug this further over a better medium than GitHub comments. It's definitely worth our time here on the Pelias team to make sure no one runs into issues like this. We also have a Gitter chat room if that's better for you.

DylanFrese commented 8 years ago

I'd be happy to hop on Gitter/IRC/etc., though I'm unable to do a call at the moment.