neo4j-contrib / spatial

Neo4j Spatial is a library of utilities for Neo4j that facilitates spatial operations on data. In particular, you can add spatial indexes to already located data, and perform spatial operations on the data, like searching for data within specified regions or within a specified distance of a point of interest. In addition, classes are provided to expose the data to GeoTools and thereby to GeoTools-enabled applications like GeoServer and uDig.
http://neo4j-contrib.github.io/spatial

Deadlock on parallel call to spatial.addNodes #355

Open cormander opened 5 years ago

cormander commented 5 years ago

If two separate queries each end up making a spatial.addNodes call at the same time, I get this:

Neo4jError: Failed to invoke procedure spatial.addNodes: Caused by: org.neo4j.kernel.DeadlockDetectedException: ForsetiClient[1] can't acquire ExclusiveLock{owner=ForsetiClient[2]} on NODE(200100), because holders of that lock are waiting for ForsetiClient[1]. Wait list:ExclusiveLock[Client[2] waits for [1]]

That's the node that has the RTREE_METADATA relationship to the layer node:

neo4j> match (a) where ID(a) = 200100 return a;
+----------------------------------------------------------+
| a                                                        |
+----------------------------------------------------------+
| ({maxNodeReferences: 100, totalGeometryCount: 47045060}) |
+----------------------------------------------------------+

I altered the query to hold a lock on the spatial_root ReferenceNode:

match (a:ReferenceNode {name:"spatial_root"}) with collect(a) as lock call apoc.lock.nodes(lock) ... call spatial.addNodes ...

I see the deadlock exception much less often as a result, but it still happens sometimes. Parallel processing is important for doing very large imports into the graph.
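Since a DeadlockDetectedException is a transient error that Neo4j expects clients to retry, one stopgap is to wrap each import call in a retry loop. The sketch below is only an illustration: `DeadlockDetected` is a hypothetical stand-in for whatever error the driver surfaces, and `work` is assumed to be a callable that runs the whole transaction containing the spatial.addNodes call. Retrying deals with the exception only; it does not make concurrent writes to the index safe.

```python
import random
import time


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the driver error wrapping DeadlockDetectedException."""


def run_with_deadlock_retry(work, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call work() and retry with jittered exponential backoff on DeadlockDetected."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last deadlock to the caller
            # Nominal delays are 0.1s, 0.2s, 0.4s, ... scaled by random jitter
            # so that two colliding clients do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            sleep(delay)
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting; in normal use the default `time.sleep` applies.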

Any thoughts? Thanks,

craigtaverner commented 5 years ago

The original spatial library was written in the context of low-concurrency embedded applications. This means that several parts, including the RTree, are not thread-safe. Running parallel bulk imports into the RTree is not recommended.

The particular issue you are seeing is likely related to the way the total counts are maintained, which is not a good design and something we would like to fix. Even once that is fixed, though, the RTree as a whole will still not be thread-safe, so the risks of parallel imports will remain and would need to be addressed separately.
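One way to keep some parallelism without concurrent writes to the RTree is to prepare batches in parallel and funnel every index write through a single thread. The following is a minimal sketch under assumptions, not part of the library: `prepare` and `write_batch` are hypothetical callables, where `write_batch` would run the transaction that calls spatial.addNodes.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def import_with_single_writer(raw_chunks, prepare, write_batch, workers=4):
    """Prepare chunks in parallel, but call write_batch from exactly one thread."""
    batches = queue.Queue(maxsize=workers * 2)  # bounded, so producers cannot run far ahead
    done = object()  # sentinel telling the writer thread to stop
    errors = []

    def writer():
        while True:
            batch = batches.get()
            if batch is done:
                return
            try:
                write_batch(batch)  # the only place index writes happen
            except Exception as exc:
                # Record the failure but keep draining so producers never block.
                errors.append(exc)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map yields prepared batches in input order.
            for batch in pool.map(prepare, raw_chunks):
                batches.put(batch)
    finally:
        batches.put(done)
        writer_thread.join()
    if errors:
        raise errors[0]
```

The trade-off is that index writes become the serial bottleneck, so this only helps when preparing the data (parsing, geometry conversion, creating the nodes themselves) is a significant share of the import time.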

If you are only importing Point data, you could use a different index, such as Hilbert curve or geohash over Lucene. However, Lucene is known to perform badly for concurrent reads and writes, so you could face a different set of performance problems, depending on your usage scenario. If you work with points, the best option by far would be to use only Neo4j's built-in spatial index, and avoid this library's indexing.

If you have points in one layer and complex geometries like polygons in another, you could actually use the native Neo4j point index for the points, and the spatial library for the polygons. The main consequence would be that you would have two quite different spatial models in place, but it could be an option to avoid the concurrency problems if the high volume data are the points.
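As a rough illustration of that split, an importer could partition its input before writing, sending points down the native-index path and everything else to the spatial library. The record shape (a dict with a `wkt` string) and the function name are hypothetical, chosen only for this sketch.

```python
def split_by_geometry(records):
    """Separate point records from records holding any other geometry type.

    Each record is assumed to be a dict with a "wkt" key containing
    well-known text, e.g. "POINT (1 2)" or "POLYGON ((...))".
    """
    points, shapes = [], []
    for record in records:
        wkt = record["wkt"].lstrip().upper()
        # "MULTIPOINT ..." does not start with "POINT", so it lands in shapes.
        if wkt.startswith("POINT"):
            points.append(record)
        else:
            shapes.append(record)
    return points, shapes
```

The `points` list can then be imported in parallel as plain nodes with point properties, while `shapes` goes through the spatial library's layer from a single writer.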

cormander commented 5 years ago

Hi Craig!

I do already use the native point index, and also the brand new NativePointEncoder to reference them with complex polygons.

So what I’m hearing is, parallel execution is unsupported and can’t be?

Any thoughts on why things didn’t get completely solved by holding the apoc lock on the spatial root? Perhaps that doesn’t behave quite like I expect?

Perhaps related:

When I moved my Neo4j storage from a traditional HDD to an M.2 SSD (so the write speed increased by a factor of about 10), I noticed that the startup of my app on a fresh database started to sometimes fail on the second call to spatial.addLayerWithEncoder. The calls happen one after the other, not at the same time, and my “solution” was to add a sleep of one second in between them. After the error, there was more than one ReferenceNode with name “spatial_root”.

Perhaps there’s something not holding a file system lock properly?