neo4j-contrib / spatial

Neo4j Spatial is a library of utilities for Neo4j that facilitates spatial operations on data. In particular, you can add spatial indexes to already located data and perform spatial operations on it, such as searching for data within specified regions or within a specified distance of a point of interest. In addition, classes are provided to expose the data to geotools and thereby to geotools-enabled applications like geoserver and uDig.
http://neo4j-contrib.github.io/spatial

osmimporter not creating R-Tree index for points/linestrings for osm files downloaded from geofabrik (montenegro+scotland) #347

Closed 100beans closed 6 years ago

100beans commented 6 years ago

At the very end of the import I receive the below error after re-indexing occurs:

"Index[0]: info | Re-indexing elapsed time in seconds: 0.033 Geometry statistics for 0 geometry types:"

As a result I can't run any spatial procedures, because no geometries are indexed.

However, earlier on in the import it seems to recognise my geometry types as it prints "Geometry statistics for 3 geometry types: Point: 30819 LineString: 39776 Polygon: 111083"

Why does it forget all the geometry types when it re-indexes?

craigtaverner commented 6 years ago

Could you describe which version of Neo4j and Neo4j Spatial you are using, and perhaps attach or provide a URI for an example OSM file that has this problem, so I can try to reproduce it?

Kubera2017 commented 6 years ago

This problem is not limited to geofabrik. I get the following with https://download.bbbike.org/osm/bbbike/Cracow/:

    Jul 29 18:17:49 workstation neo4j[26471]: Sun Jul 29 18:17:49 SAMT 2018: Saving relation 312 (3.0384480542246113 relation/second)
    Jul 29 18:17:49 workstation neo4j[26471]: /data/alex/cra.osm[16592204]: Completed load in 103.012s
    Jul 29 18:17:49 workstation neo4j[26471]: /data/alex/cra.osm[16592204]:     Imported nodes: 47.566s
    Jul 29 18:17:49 workstation neo4j[26471]: /data/alex/cra.osm[16592204]:     Optimized index: 0.0s
    Jul 29 18:17:49 workstation neo4j[26471]: /data/alex/cra.osm[16592204]:     Imported ways: 55.033s
    Jul 29 18:17:49 workstation neo4j[26471]: /data/alex/cra.osm[16592204]:     Optimized index: 0.0s
    Jul 29 18:17:49 workstation neo4j[26471]: /data/alex/cra.osm[16592204]:     Imported rels: 0.413s
    Jul 29 18:17:49 workstation neo4j[26471]: /data/alex/cra.osm[16592204]: When processing the relations, there were 50257 missing members
    Jul 29 18:17:49 workstation neo4j[26471]: Thu Jan 01 04:00:00 SAMT 1970: Found 2151343 nodes during 0s way creation:
    Jul 29 18:17:49 workstation neo4j[26471]:     node-index: 2151343/55s (39092.582497456024 nodes/second)
    Jul 29 18:17:49 workstation neo4j[26471]: Loaded 1793127 nodes
    Jul 29 18:17:49 workstation neo4j[26471]: Loaded 255665 ways
    Jul 29 18:17:49 workstation neo4j[26471]: Loaded 3363 relations
    Jul 29 18:17:49 workstation neo4j[26471]: /data/alex/cra.osm[16592204]: info | Elapsed time in seconds: 103.012
    Jul 29 18:17:49 workstation neo4j[26471]: Geometry statistics for 2 geometry types:
    Jul 29 18:17:49 workstation neo4j[26471]:     Point: 129680
    Jul 29 18:17:49 workstation neo4j[26471]:     LineString: 255565
    Jul 29 18:17:49 workstation neo4j[26471]: Tag statistics for 4 types:
    Jul 29 18:17:49 workstation neo4j[26471]:     all: TagStats[all]: [...]
    Jul 29 18:17:49 workstation neo4j[26471]:     node: TagStats[node]: [...]
    Jul 29 18:17:49 workstation neo4j[26471]:     way: TagStats[way]: [...]
    Jul 29 18:17:49 workstation neo4j[26471]:     relation: TagStats[relation]: [...]
    Jul 29 18:17:49 workstation neo4j[26471]: /data/alex/cra.osm[16592204]: Re-indexing with GraphDatabaseService: ProcedureGraphDatabaseService [/data/neo4j/databases/osm.db] (class: class org.neo4j.kernel.impl.factory.GraphDatabaseFacade)

But in shell I see:

    neo4j> CALL spatial.addLayer('layerOSM', 'osm', '');
    ({index_class: "org.neo4j.gis.spatial.index.LayerRTreeIndex", ctime: 1532873732447, geomencoder: "org.neo4j.gis.spatial.osm.OSMGeometryEncoder", layer_class: "org.neo4j.gis.spatial.osm.OSMLayer", layer: "layerOSM"}) |
    1 row available after 35 ms, consumed after another 0 ms
    neo4j> CALL spatial.importOSMToLayer('layerOSM', '/data/alex/cra.osm');
    Failed to invoke procedure spatial.importOSMToLayer: Caused by: java.lang.NullPointerException
    neo4j>

With geofabrik files (I downloaded https://download.geofabrik.de/europe/poland/malopolskie.html and split it into smaller parts):

    neo4j> CALL spatial.addLayer('layerOSM2', 'osm', '');
    | ({index_class: "org.neo4j.gis.spatial.index.LayerRTreeIndex", ctime: 1532874485960, geomencoder: "org.neo4j.gis.spatial.osm.OSMGeometryEncoder", layer_class: "org.neo4j.gis.spatial.osm.OSMLayer", layer: "layerOSM2"}) |
    1 row available after 28 ms, consumed after another 0 ms
    neo4j> CALL spatial.importOSMToLayer('layerOSM2', '/data/alex/output12.osm');
    +-------+
    | count |
    +-------+
    | 0     |
    +-------+
    1 row available after 9289 ms, consumed after another 1 ms

And logs:

    Jul 29 18:28:32 workstation neo4j[26471]: Sun Jul 29 18:28:32 SAMT 2018: Saving node 273604 (31821.819027680856 node/second)
    Jul 29 18:28:32 workstation neo4j[26471]: /data/alex/output12.osm[993951]: Completed load in 9.268s
    Jul 29 18:28:32 workstation neo4j[26471]: /data/alex/output12.osm[993951]:     Imported nodes: -1.532874503519E9s
    Jul 29 18:28:32 workstation neo4j[26471]: /data/alex/output12.osm[993951]:     Optimized index: 0.0s
    Jul 29 18:28:32 workstation neo4j[26471]: /data/alex/output12.osm[993951]:     Imported ways: 0.0s
    Jul 29 18:28:32 workstation neo4j[26471]: /data/alex/output12.osm[993951]:     Optimized index: 0.0s
    Jul 29 18:28:32 workstation neo4j[26471]: /data/alex/output12.osm[993951]:     Imported rels: 1.532874512787E9s
    Jul 29 18:28:32 workstation neo4j[26471]: Thu Jan 01 04:00:00 SAMT 1970: Found 0 nodes during 0s way creation:
    Jul 29 18:28:32 workstation neo4j[26471]: Loaded 296710 nodes
    Jul 29 18:28:32 workstation neo4j[26471]: /data/alex/output12.osm[993951]: info | Elapsed time in seconds: 9.268
    Jul 29 18:28:32 workstation neo4j[26471]: Geometry statistics for 1 geometry types:
    Jul 29 18:28:32 workstation neo4j[26471]:     Point: 8510
    Jul 29 18:28:32 workstation neo4j[26471]: Tag statistics for 2 types:
    Jul 29 18:28:32 workstation neo4j[26471]:     all: TagStats[all]: [..]
    Jul 29 18:28:32 workstation neo4j[26471]:     node: TagStats[node]: [..]
    Jul 29 18:28:32 workstation neo4j[26471]: /data/alex/output12.osm[993951]: Re-indexing with GraphDatabaseService: ProcedureGraphDatabaseService [/data/neo4j/databases/osm.db] (class: class org.neo4j.kernel.impl.factory.GraphDatabaseFacade)
    Jul 29 18:28:32 workstation neo4j[26471]: Index[0]: info | Re-indexing elapsed time in seconds: 0.003
    Jul 29 18:28:32 workstation neo4j[26471]: Geometry statistics for 0 geometry types:

I use neo4j 3.4.1 with spatial 0.25.5.

Kubera2017 commented 6 years ago

@craigtaverner please disregard my previous logs. I didn't know that your procedure only adds nodes attached to ways; I had tried to import just a list of standalone nodes. But please look at @100beans' post, because I ran into the same problem. I downloaded the file from geofabrik https://download.geofabrik.de/europe/poland/malopolskie.html and used the maven build to make a db:

    java -cp target/classes:target/dependency/* org.neo4j.gis.spatial.osm.OSMImporter new-osm-db malopolskie-latest.osm

And got:

    malopolskie-latest.osm[98500842]: Completed load in 1592.819s
    malopolskie-latest.osm[98500842]:     Imported nodes: 756.139s
    malopolskie-latest.osm[98500842]:     Optimized index: 0.0s
    malopolskie-latest.osm[98500842]:     Imported ways: 829.179s
    malopolskie-latest.osm[98500842]:     Optimized index: 0.0s
    malopolskie-latest.osm[98500842]:     Imported rels: 7.501s
    malopolskie-latest.osm[98500842]: When processing the relations, there were 189606 missing members
    Thu Jan 01 04:00:00 SAMT 1970: Found 12937072 nodes during 0s way creation:
        changeset: 12937072/816s (15836.981723365447 nodes/second)
    Loaded 11296720 nodes
    Loaded 1411075 ways
    Loaded 10596 relations
    malopolskie-latest.osm[98500842]: info | Elapsed time in seconds: 1592.82
    Geometry statistics for 3 geometry types:
        Point: 605917
        LineString: 319703
        Polygon: 1091372
    Tag statistics for 4 types:
    ...
    malopolskie-latest.osm[98500842]: Re-indexing with GraphDatabaseService: community single [/data/alex/spatial/new-osm-db] (class: class org.neo4j.kernel.impl.factory.GraphDatabaseFacade)
    Index[0]: info | Re-indexing elapsed time in seconds: 0.056
    Geometry statistics for 0 geometry types:
    === Completed loading malopolskie-latest.osm in 1600.435 seconds ===
    ...

Spatial queries don't work.

I found that if I remove ' changeset="0" ' (replacing it with ''), re-indexing does start, but then I get a NullPointerException on line 225 of OSMImporter.java:

    for ( Node proxy : findNodes.traverse( way ).nodes() )
    {
        Node node = proxy.getSingleRelationship( OSMRelation.NODE, Direction.OUTGOING ).getEndNode();
        stats.addGeomStats( layer.addWay( node, true ) );
    }

Note that files from geofabrik come with changeset="0" on every element.
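For anyone who wants to repeat the experiment, the attribute removal can be sketched roughly as below. This is only an illustration: `ChangesetStripper` and its methods are hypothetical names, not part of Neo4j Spatial, and the sketch assumes the attribute is always written with double quotes and a leading space, as in the geofabrik extracts. As described above, with spatial 0.25.5 this changes which code path the importer takes and ran into an NPE, so it is a way to reproduce the behaviour, not a fix.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper for the experiment; not part of Neo4j Spatial.
final class ChangesetStripper {
    // A space followed by changeset="<digits>" (assumed attribute layout).
    private static final Pattern CHANGESET = Pattern.compile(" changeset=\"\\d+\"");

    /** Returns the line with every changeset attribute removed. */
    static String stripChangeset(String xmlLine) {
        return CHANGESET.matcher(xmlLine).replaceAll("");
    }

    /** Copies an OSM XML file line by line, so large extracts are never held in memory. */
    static void stripFile(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(stripChangeset(line));
                writer.newLine();
            }
        }
    }
}
```

A plain text replacement like this does not parse the XML, so it would also alter any tag value that happens to contain the same text.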

craigtaverner commented 6 years ago

I noticed that we fixed a bug a few months ago related to OSM files without changeset attributes. And one of the parts of that bug-fix is that previously we relied on a full changeset->way->node path to find things to index, but if the changesets were missing, we needed to look instead for way->node patterns to index. The bug you are reporting here seems to relate to that second option. So my suspicion is that this new bug is strongly related to the old one. Either the previous bug-fix did not cover sufficient cases, or it in fact uncovered a new bug. I need to find some time to test this out on the data files you are testing with. Hopefully in the next two weeks.

You mention above that you get a NPE on line 225, but the code you show looks more like line 275? See https://github.com/neo4j-contrib/spatial/blob/master/src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java#L275

Could you confirm? I noticed that this code is only activated if two conditions are true:

craigtaverner commented 6 years ago

One possible NPE on line 275 would be that getSingleRelationship can return null, and then calling getEndNode() will cause the NPE. However the OSM data model requires all proxy nodes to have these relationships, so either the data model is broken (missing NODE relationships) or the findNodes traversal is finding nodes that are not proxies. Looking at the code I see nothing obviously wrong with it, so again I would need to find the time to actually debug this (cannot do it now, because I'm on a computer without the dev-environment necessary). If I have given you enough hints to fix it yourself, feel free to provide a PR and we can get it merged. Otherwise you'll need to wait for me to get the chance to fix it.
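To make the suspected failure mode concrete, here is a minimal sketch of a null-guarded version of that loop. It is not the actual fix. `Proxy` and `Rel` are hypothetical stand-ins for Neo4j's `Node` and `Relationship`, reduced to the one call each that matters here, so the example runs without a database. The only behaviour taken from Neo4j is that `getSingleRelationship` returns null when no matching relationship exists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the nested interfaces only imitate the Neo4j API shape.
final class ProxyResolver {
    /** Stand-in for Relationship: only getEndNode() is needed here. */
    interface Rel<N> { N getEndNode(); }

    /** Stand-in for a proxy Node: returns null when it has no NODE relationship. */
    interface Proxy<N> { Rel<N> getSingleRelationship(); }

    /**
     * Resolves each proxy to its real node. Proxies without a NODE relationship
     * are collected in 'skipped' instead of triggering a NullPointerException.
     */
    static <N> List<N> resolveWayNodes(Iterable<? extends Proxy<N>> proxies,
                                       List<Proxy<N>> skipped) {
        List<N> resolved = new ArrayList<>();
        for (Proxy<N> proxy : proxies) {
            Rel<N> rel = proxy.getSingleRelationship();
            if (rel == null) {
                skipped.add(proxy); // broken data model, or not a proxy at all
                continue;
            }
            resolved.add(rel.getEndNode());
        }
        return resolved;
    }
}
```

Collecting the skipped proxies rather than silently dropping them makes it possible to tell the two explanations apart: inspecting them shows whether they are proxies with missing NODE relationships or nodes the traversal should not have returned.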

Kubera2017 commented 6 years ago

Thanks for the answer. But all files from geofabrik have changeset="0" on every node, way and relation. And when such a file was imported via spatial.importOSMToLayer, it returned "count"=0 and spatial queries don't work (as @100beans describes). Removing the changeset attribute was only my own experiment on these files.

Kubera2017 commented 6 years ago

"One possible NPE on line 275 would be that getSingleRelationship can return null, and then calling getEndNode() will cause the NPE. However the OSM data model requires all proxy nodes to have these relationships, so either the data model is broken (missing NODE relationships) or the findNodes traversal is finding nodes that are not proxies."

Maybe it is related to this line: "malopolskie-latest.osm[98500842]: When processing the relations, there were 189606 missing members". But it also happens with files from https://download.bbbike.org/osm/bbbike/Cracow/ (NPE, see logs above). So this breaks the usefulness of the plugin; we need to find a solution.

craigtaverner commented 6 years ago

@Kubera2017 - I've tested with the cracow data, and notice two issues:

What I notice here is that files with changeset attributes follow a different code path than files without. And the former code path does not support includePoints while the latter does. This is certainly a bug. To reduce the scope of the bug I've made two changes:

I will try to make a new release with this fix in it soon, for 3.4.x only. Hopefully everyone has upgraded to that version.

craigtaverner commented 6 years ago

I've made a bugfix release at https://github.com/neo4j-contrib/spatial/releases/tag/0.25-neo4j-3.4.5

This fixes the main issue, I believe. Can you test it and see if it is sufficient for your solution? If not, re-open this issue with new details, or make a new issue.