pelias / openstreetmap

Import pipeline for OSM into Pelias
MIT License

errors while importing berlin_germany.osm.pbf (Mapzen extract) #215

Closed oliverbienert closed 6 years ago

oliverbienert commented 7 years ago

Sorry if this is a cross-post; I thought it would be better addressed here than in the whosonfirst repo.

I successfully downloaded and imported whosonfirst data by:

npm run download -- --admin-only
npm start

When importing OSM with "adminLookup": true, however, I see some errors:

2016-12-14T22:19:52.367Z - error: [wof-pip-service:loadJSON] exception occured parsing /home/pelias/data/whosonfirst//data/110/869/4417/1108694417.geojson: Error: ENOENT: no such file or directory, open '/home/pelias/data/whosonfirst//data/110/869/4417/1108694417.geojson'

This is just one example line; the error appears for a huge number of apparently wrong paths. What I observed is that the above path indeed does not exist, but an almost identical path does, with the last subdirectory 4417 split into two subdirectories. So the following path exists:

/home/pelias/data/whosonfirst//data/110/869/441/7/1108694417.geojson

So the file actually exists.

Nevertheless, the OSM data apparently gets imported. Should I ignore the error?
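
For illustration, the difference between the two paths above can be reproduced with a small sketch. It assumes the on-disk layout splits the numeric ID into groups of at most three digits, which matches the path that exists here. The helper names `wofPathParts` and `wofRelativePath` are hypothetical and not part of any Pelias or Who's On First package.

```javascript
const path = require('path');

// Hypothetical helper: split a numeric ID into groups of at most three digits,
// e.g. 1108694417 -> ['110', '869', '441', '7'].
function wofPathParts(id) {
  return String(id).match(/\d{1,3}/g) || [];
}

// Hypothetical helper: build the relative path of a record under the data root,
// assuming the three-digit layout described above.
function wofRelativePath(id) {
  return path.posix.join('data', ...wofPathParts(id), `${id}.geojson`);
}

console.log(wofRelativePath(1108694417));
// prints "data/110/869/441/7/1108694417.geojson"
```

The failing path in the log groups the last four digits together (4417) instead of splitting them into 441 and 7, which would explain why the lookup misses a file that is present.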

orangejulius commented 7 years ago

Hey @oliverbienert, thanks for the report. The issue loading the files is tracked in https://github.com/pelias/pelias/issues/477 and will be fixed soon. Until then it isn't much of a concern; it just means a little data is missing.

I suspect the other error, which was in your comment initially, is related to Node 7. We haven't tested it much, and only officially support Node 4 and 6. Since you removed it, I'm wondering if you fixed it, and would love to know how either way, even if it was "user error" (we're always looking to make things easier to use). If you haven't fixed it, try downgrading to Node 6 and let us know how it goes.

oliverbienert commented 7 years ago

Thank you for answering. I removed the second part of my comment because it happened only on my local 32-bit box. But yes, the difference may indeed be Node 7 vs. Node 6. Sorry for removing it. I initially planned to move it to a new issue but then forgot about it once I got the import running on the test instance. I'll let you know if downgrading Node improves things!

oliverbienert commented 7 years ago

Unfortunately, downgrading does not solve the issue. The following error appears immediately after firing npm start:

➜  openstreetmap git:(production) npm start

> pelias-openstreetmap@0.0.0-semantic-release start /home/oliver/pelias/openstreetmap
> node index.js

2016-12-15T16:26:44.065Z - info: [openstreetmap] Creating read stream for: /home/oliver/data/osm/berlin_germany.osm.pbf
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: spawn /home/oliver/pelias/openstreetmap/node_modules/pbf2json/build/pbf2json.linux-ia32 ENOENT
    at exports._errnoException (util.js:1022:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:193:32)
    at onErrorNT (internal/child_process.js:359:16)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)
    at Module.runMain (module.js:606:11)
    at run (bootstrap_node.js:394:7)
    at startup (bootstrap_node.js:149:9)
    at bootstrap_node.js:509:3

node 6.9.2 npm 3.10.9

So, this error is probably the reason for the following ones:

Error: channel closed

I will be home in a couple of hours and can look into it again then.

orangejulius commented 7 years ago

Interesting. This may be another 32-bit issue. Our OSM importer uses https://github.com/pelias/pbf2json internally, and I doubt anyone has tried that on 32-bit systems in a long time. Could you check out that project and see if the unit tests pass for you?
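
A quick way to check whether the binary is simply missing for the current architecture is sketched below. It assumes the binaries are named `pbf2json.<platform>-<arch>`, which is only inferred from the `pbf2json.linux-ia32` file name in the error message. The helper names are hypothetical and not part of pbf2json.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumption: binaries follow the "pbf2json.<platform>-<arch>" naming seen in
// the ENOENT message (e.g. "pbf2json.linux-ia32").
function expectedBinaryName(platform, arch) {
  return `pbf2json.${platform}-${arch}`;
}

// Hypothetical helper: report whether a binary for the given platform and
// architecture exists in the build directory.
function findBinary(buildDir, platform = os.platform(), arch = os.arch()) {
  const file = path.join(buildDir, expectedBinaryName(platform, arch));
  return { file, exists: fs.existsSync(file) };
}

// Run from the openstreetmap checkout; the relative path is an assumption
// based on the directory shown in the stack trace.
console.log(findBinary(path.join('node_modules', 'pbf2json', 'build')));
```

If `exists` is false on the 32-bit machine while other `pbf2json.*` files are present in the same directory, the ENOENT most likely means no prebuilt binary ships for that architecture.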

oliverbienert commented 7 years ago

Well, the tests don't pass:

  tag_mapper: errors - newline in value

    ✔ remove newlines
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: spawn /home/oliver/pelias/openstreetmap/node_modules/pbf2json/build/pbf2json.linux-ia32 ENOENT
    at exports._errnoException (util.js:1022:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:193:32)
    at onErrorNT (internal/child_process.js:359:16)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)

By the way, can you elaborate a bit on how importing works? The first time I read the docs, it was not clear to me that all the administrative hierarchy data comes from Whosonfirst. So do I have to import the whosonfirst data as well? Or is just downloading it enough to import OSM with adminLookup set to true? I ask because when I try to import the downloaded whosonfirst data, I get a JavaScript heap out of memory error. I tried node --max_old_space_size=2000 /usr/local/bin/npm start, but to no avail. Again, this is 32-bit, just 4GB RAM, Ubuntu 16.04. I don't have to run it here, so perhaps you shouldn't investigate too far into a dying breed.
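
One thing that may be worth checking: in the command above, the flag is passed to the Node process that runs npm itself, while npm runs the start script in a separate child process, so the limit may never reach the importer. The sketch below prints the heap limit of whichever process runs it. The file name `check-heap.js` is hypothetical.

```javascript
const v8 = require('v8');

// Returns the V8 heap size limit of the current process, in megabytes.
function heapLimitMB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`heap limit: ${heapLimitMB()} MB`);
```

Saved as check-heap.js, running it once with node check-heap.js and once with node --max_old_space_size=2000 check-heap.js shows whether the flag takes effect when passed directly to the script. On a 32-bit build the achievable limit may be lower than requested.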

orangejulius commented 7 years ago

Yeah, these are probably 32-bit issues that we could possibly fix, but they are honestly not a priority, so it would take a while.

As for how the importing works, that's a great question that we get relatively frequently, so we need to update the docs to reflect it. Each of our importers operates independently of the data that is already in Elasticsearch. So you can, for example, import OSM data without first having imported WOF data. However, when admin lookup is turned on, the WOF data must be on disk, as it's used during the import process to enrich the data with admin information. I hope that makes sense, and I'd love it if you could point us to places in the docs that were confusing in that regard, so we can make sure we fix them (I have some places in mind to change but want to make sure we get them all).
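
The rule described above (WOF data only needs to be on disk when admin lookup is enabled) can be sketched as a small pre-flight check. This is an illustration only: the config keys `imports.openstreetmap.adminLookup` and `imports.whosonfirst.datapath` are assumptions modelled on the "adminLookup" setting mentioned in this thread, and `checkAdminLookup` is a hypothetical helper, not part of Pelias.

```javascript
const fs = require('fs');

// Hypothetical pre-flight check. The key names are assumptions; the real
// pelias.json schema may differ.
function checkAdminLookup(config) {
  const imports = config.imports || {};
  const osm = imports.openstreetmap || {};
  const wof = imports.whosonfirst || {};

  // Without admin lookup, the OSM import does not need WOF data at all.
  if (!osm.adminLookup) {
    return { ok: true, reason: 'adminLookup is off; WOF data is not needed on disk' };
  }
  // With admin lookup, the downloaded WOF data must be present on disk.
  if (!wof.datapath || !fs.existsSync(wof.datapath)) {
    return { ok: false, reason: `WOF data not found at: ${wof.datapath}` };
  }
  return { ok: true, reason: 'adminLookup is on and the WOF data directory exists' };
}

// Example values taken from the paths quoted earlier in this thread.
const example = {
  imports: {
    openstreetmap: { adminLookup: true },
    whosonfirst: { datapath: '/home/pelias/data/whosonfirst' }
  }
};
console.log(checkAdminLookup(example));
```

Note that the check only looks at the file system, not at Elasticsearch, which mirrors the point that downloading the WOF data is sufficient for admin lookup and importing it is a separate, optional step.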

oliverbienert commented 7 years ago

Sorry for the late reply. Yes, that makes sense. As for the documentation, I am going to skim through the relevant parts (probably next week) and let you know where I would like some clarification.

missinglink commented 7 years ago

It's actually pretty easy to cross-compile pbf2json for 32-bit machines. More info here: https://github.com/pelias/pbf2json#compile-source-for-all-supported-architecture