neo4j-contrib / spatial

Neo4j Spatial is a library of utilities for Neo4j that facilitates spatial operations on data. In particular, you can add spatial indexes to data that already has location information, and perform spatial operations on it, such as searching for data within specified regions or within a specified distance of a point of interest. In addition, classes are provided that expose the data to GeoTools and thereby to GeoTools-enabled applications such as GeoServer and uDig.
http://neo4j-contrib.github.io/spatial

Query: SpatialDatabaseService thread-safety. #219

Open. fragmadata opened this issue 8 years ago.

fragmadata commented 8 years ago

Hi,

This is a question. In an embedded Neo4j setup inside a Java server, can I share a single instance of SpatialDatabaseService across multiple threads? Specifically, can I define SpatialDatabaseService as a Spring bean autowired into other components, rather than calling new SpatialDatabaseService(db) in method scope every time I need one?

Regards.

craigtaverner commented 8 years ago

Not all of the spatial library is designed to be thread safe, and the library is not tested for thread safety. In addition, we suspect there may be concurrency issues in the RTree. This means that even if you create multiple instances of SpatialDatabaseService, concurrent updates could still cause problems if those instances refer to the same layer (index). However, since each layer maintains a separate RTree, you can access multiple layers concurrently.
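One way an application could follow this advice without changing the library is to serialize updates per layer in its own code, while still letting different layers be updated in parallel. The sketch below assumes a single JVM (embedded Neo4j) and identifies layers by name; LayerWriteLocks is a hypothetical helper, not part of Neo4j Spatial. Because the conflicts described here happen between transactions, the Runnable passed to updateLayer would need to contain the whole write transaction, including the commit, for the lock to help.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical helper (not part of Neo4j Spatial): one lock per layer name.
 * Updates to the same layer, and therefore the same RTree, run one at a time;
 * updates to different layers do not block each other.
 */
final class LayerWriteLocks {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    ReentrantLock lockFor(String layerName) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so every thread
        // gets the same lock instance for a given layer name.
        return locks.computeIfAbsent(layerName, name -> new ReentrantLock());
    }

    /** Runs the update (assumed to be a complete write transaction) under its layer's lock. */
    void updateLayer(String layerName, Runnable update) {
        ReentrantLock lock = lockFor(layerName);
        lock.lock();
        try {
            update.run();
        } finally {
            lock.unlock();
        }
    }
}
```

Reads are not covered by this sketch; it only keeps two writers off the same RTree at the same time.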

Whether or not you will experience issues depends on the nature of your application, and the level of concurrency likely to be experienced. Since the library was originally designed for GIS modelling use in the original embedded Neo4j, concurrency was not considered much in the design. Feel free to try it out, and report any concurrency issues you experience. While we are not currently planning work in that area, if we have failing test cases we might be able to take the time to fix them.

Alternatively, we are working on a new spatial feature within Neo4j itself, which will support concurrency and scalability. It will only support a very small subset of the features of the original library, but with higher reliability and stability. If you do run into concurrency issues, one option is to build against the current library now and port to the new system when it becomes available.

phil20686 commented 8 years ago

I can confirm that the RTree is not thread safe. It is possible to trigger two rebalancing operations simultaneously, which then conflict as they propagate up the tree. It is thread safe for multiple reads, though, as long as you do not modify the tree. For read-only use, it is easiest just to have multiple spatial database services.

craigtaverner commented 8 years ago

Thanks Phil. I think it was your earlier discovery I was remembering when I suggested that the RTree was not concurrency safe. We did discuss locking the RTree root node as a way to make it safe, but that work was never done (along with other interesting work on the RTree).

I do not believe, however, that the issue would be solved by having multiple instances of the SpatialDatabaseService. If they still refer to the same RTree, there will still be a concurrency issue. To fix this we need:

1. A good failing test case, and good test coverage of concurrency.
2. Thread-safe RTree code and SpatialDatabaseService code (this might be straightforward, or the code may already be thread safe).
3. Thread-safe rebalancing, either by locking the RTree root node or through some clever changes to the algorithm.
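For items 1 and 3, a deterministic failing test usually forces the bad interleaving rather than hoping a stress test happens to hit it. The sketch below does that on a toy node, not on the library's RTree: ToyRootNode and runTwoWriters are hypothetical names, and the read-copy-write update only stands in for "read a node, decide on a split, write it back". A CyclicBarrier makes both writers read before either writes, so one insert is always lost; holding the root's monitor for the whole read-modify-write, the toy equivalent of locking the RTree root node, lets both inserts survive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

/** Toy model (hypothetical, not the Neo4j Spatial RTree) of a node updated by read-modify-write. */
final class ToyRootNode {
    private volatile List<String> entries = new ArrayList<>(List.of("existing"));

    List<String> entries() { return entries; }

    /** Unsafe: the barrier guarantees both writers read the same state before either writes. */
    void insertUnsafe(String item, CyclicBarrier bothHaveRead)
            throws InterruptedException, BrokenBarrierException {
        List<String> snapshot = new ArrayList<>(entries); // read
        bothHaveRead.await();                              // force the conflicting interleaving
        snapshot.add(item);                                // modify a private copy
        entries = snapshot;                                // write back: last writer wins
    }

    /** Locked: the whole read-modify-write holds the root's monitor. */
    void insertLocked(String item, CyclicBarrier bothReady)
            throws InterruptedException, BrokenBarrierException {
        bothReady.await();                                 // start together, then compete for the lock
        synchronized (this) {
            List<String> snapshot = new ArrayList<>(entries);
            snapshot.add(item);
            entries = snapshot;
        }
    }

    /** Runs two concurrent inserts and returns the final number of entries. */
    static int runTwoWriters(boolean locked) {
        ToyRootNode root = new ToyRootNode();
        CyclicBarrier barrier = new CyclicBarrier(2);
        List<Thread> threads = new ArrayList<>();
        for (String item : List.of("a", "b")) {
            Thread t = new Thread(() -> {
                try {
                    if (locked) root.insertLocked(item, barrier);
                    else root.insertUnsafe(item, barrier);
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return root.entries().size();
    }
}
```

Without the lock the result is 2 entries (one insert lost); with it, 3. A real test for the library would presumably drive two concurrent inserts into the same layer in a similar way and then check the index contents.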

fragmadata commented 8 years ago

@craigtaverner and @phil20686, thanks a lot for responding. This is quite helpful. Regards.

phil20686 commented 8 years ago

Yes. But if you are not modifying the tree, and only reading from it, you are fine to use one Spatial Database Service per thread. The RTree is read-safe if you do not modify it, and that is much easier than trying to pass one SDS among lots of threads, at least in embedded.
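A minimal sketch of the one-service-per-thread approach for read-only use, assuming the service object is cheap to create and is only used by the thread that created it. PerThreadService is a hypothetical helper; in the setup from the original question its factory would be something like () -> new SpatialDatabaseService(db), with the same db shared by all threads.

```java
import java.util.function.Supplier;

/** Hypothetical helper: lazily creates and caches one service instance per thread. */
final class PerThreadService<S> {
    private final ThreadLocal<S> perThread;

    PerThreadService(Supplier<? extends S> factory) {
        // The factory runs once per thread, on that thread's first get().
        this.perThread = ThreadLocal.withInitial(factory);
    }

    /** Returns this thread's instance, creating it on first use. */
    S get() {
        return perThread.get();
    }
}
```

This only covers reads: per the comments above, concurrent writes to the same layer still need coordinating however many service instances exist. In a thread pool, each worker keeps its instance for as long as the thread lives.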

I did use a singleton bean to control access to the RTree in a web application, with a single SDS inside an EJB framework, but it takes special care to route all your spatial queries through it without hurting read performance.
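A sketch of such a gatekeeper in plain Java, assuming a single JVM: a read/write lock lets any number of spatial queries run together, while an update waits for running queries to finish and then runs alone. That matches the point made earlier in the thread that the RTree is safe for concurrent reads but not for concurrent modification. SpatialAccessGate is hypothetical; in Spring it would more naturally be a singleton-scoped bean than a static instance, and an EJB @Singleton bean can express the same split with container-managed @Lock(LockType.READ) and @Lock(LockType.WRITE) methods. Each Supplier or Runnable is assumed to wrap an entire transaction.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical singleton that every spatial query and update is routed through. */
final class SpatialAccessGate {
    private static final SpatialAccessGate INSTANCE = new SpatialAccessGate();

    // Fair mode grants the lock in roughly arrival order, so a waiting update
    // is not starved by a continuous stream of queries.
    private final ReadWriteLock lock = new ReentrantReadWriteLock(true);

    private SpatialAccessGate() { }

    static SpatialAccessGate instance() { return INSTANCE; }

    /** Shared access: many queries may hold the read lock at once. */
    <T> T query(Supplier<T> spatialQuery) {
        lock.readLock().lock();
        try {
            return spatialQuery.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Exclusive access: an update runs only when no query or other update is active. */
    void update(Runnable spatialUpdate) {
        lock.writeLock().lock();
        try {
            spatialUpdate.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```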

ehx-v1 commented 8 years ago

By the way, I have the impression the problem could most easily be solved by handling the rebalancing conflict directly where it occurs. I can't be sure, since I haven't tried it yet, but from how you described the problem, I think this could be the best approach.
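One possible reading of this suggestion is optimistic concurrency: let writers proceed without a lock, detect at write time that the node changed since it was read, and redo the update on the fresh state. The sketch below shows that pattern on a toy in-memory node using AtomicReference.compareAndSet; OptimisticToyNode is hypothetical, and it says nothing about how Neo4j Spatial or Neo4j transactions behave. Applying the idea to the real RTree would mean detecting the conflict between two rebalancing transactions and retrying one of them, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Toy model (hypothetical, not the Neo4j Spatial RTree) with optimistic, retry-on-conflict updates. */
final class OptimisticToyNode {
    private final AtomicReference<List<String>> entries = new AtomicReference<>(List.of());

    /** Inserts the item and returns how many attempts it needed. */
    int insert(String item) {
        int attempts = 0;
        while (true) {
            attempts++;
            List<String> current = entries.get();            // read
            List<String> updated = new ArrayList<>(current); // modify a private copy
            updated.add(item);
            // Write only if no other writer replaced the node since we read it.
            if (entries.compareAndSet(current, List.copyOf(updated))) {
                return attempts;
            }
            // Conflict detected where it occurs: loop and retry against the new state.
        }
    }

    List<String> entries() { return entries.get(); }
}
```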