Support for neo4j's new storable.PointValue

neo4j-contrib / spatial

Neo4j Spatial is a library of utilities for Neo4j that facilitates spatial operations on data. In particular, you can add spatial indexes to already located data, and perform spatial operations on the data like searching for data within specified regions or within a specified distance of a point of interest. In addition, classes are provided to expose the data to geotools and thereby to geotools-enabled applications like geoserver and uDig.

http://neo4j-contrib.github.io/spatial

Support for neo4j's new storable.PointValue #353

Closed cormander closed 5 years ago

cormander commented 5 years ago

I use the point({longitude: X, latitude: Y}) on nodes in my database for the native neo4j distance function. However, in order to also use spatial for the RTREE indexing, I also have to store the same info in the point property as a string property. Additionally, the SimplePointEncoder stores the same info yet again, redundantly, as a bbox property on the node. So, I end up with the same point info 4x on the node (the bbox has it twice).

I can use a point() property as the second argument to a call to spatial.intersects(), but can't seem to use it for the spatial layer itself. When I try to addNode to the layer, I get:

Failed to invoke procedure spatial.addNode: Caused by: java.lang.ClassCastException: org.neo4j.values.storable.PointValue cannot be cast to java.lang.String

Supporting storable.PointValue in this spatial plugin will reduce the property size requirements of the graph, and perhaps increase performance as well.

craigtaverner commented 5 years ago

I think it would be a good idea to have a new encoder that understands the new point types. However, I also do not see why you need to store the point more than once. If you store it as a point type, then use the built in spatial index (not the RTree) to index it. That will perform much better than the RTree.

If you want WKT or two double values to store the point, then only use one of them, and don't use the native points (or the native index).

To summarize:

Native points should not use this library at all; they should use the built-in spatial index
WKT points (strings) can use this library with the RTree using the default encoder (and default layer)
SimplePoint layer will use two double values for lat:lon

Just use one approach, not all three.

If you really want to use an in-graph RTree index with the new native points, it would be easy to create a new GeometryEncoder that can do that for you (map Neo4j point types to JTS Point types and back). But I suspect you do not actually need this.
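To make the encoder idea above more concrete, here is a minimal Java sketch of the mapping such an encoder would need: one point property in, a geometry out, and a bounding box derived on demand instead of stored. It is only an illustration under stated assumptions. `NativePoint` and `SimplePoint` are hypothetical stand-ins for Neo4j's PointValue and the JTS Point, a plain `Map` stands in for a node's properties, and the "wgs-84" label and the method names are made up for this example. It does not implement the library's actual GeometryEncoder interface.

```java
import java.util.Map;

/** Illustrative sketch only: none of these types are the real Neo4j or JTS classes. */
final class NativePointMappingSketch {

    /** Hypothetical stand-in for a native point value (CRS label plus coordinates). */
    static final class NativePoint {
        final String crs;
        final double x;
        final double y;

        NativePoint(String crs, double x, double y) {
            this.crs = crs;
            this.x = x;
            this.y = y;
        }
    }

    /** Hypothetical stand-in for a JTS-style point geometry. */
    static final class SimplePoint {
        final double x;
        final double y;

        SimplePoint(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Geometry to node: writes a single point property and nothing else. */
    static void encode(SimplePoint geometry, Map<String, Object> nodeProperties, String property) {
        // "wgs-84" is an assumed CRS label for this sketch.
        nodeProperties.put(property, new NativePoint("wgs-84", geometry.x, geometry.y));
    }

    /** Node to geometry: reads the same single property back. */
    static SimplePoint decode(Map<String, Object> nodeProperties, String property) {
        NativePoint stored = (NativePoint) nodeProperties.get(property);
        return new SimplePoint(stored.x, stored.y);
    }

    /** Bounding box as {minX, minY, maxX, maxY}; for a point it is degenerate, so it is computed, not stored. */
    static double[] envelope(Map<String, Object> nodeProperties, String property) {
        SimplePoint p = decode(nodeProperties, property);
        return new double[] {p.x, p.y, p.x, p.y};
    }
}
```

The design point is that the node carries one property: the string and bbox copies become unnecessary because both the geometry and its envelope can be recomputed from the stored point.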

cormander commented 5 years ago

Hey @craigtaverner I appreciate the fast response!

I think I might actually need this, unless I'm missing something. While I use the native point for distance queries, I also have WKT polygons I need to see which points are in, so I use call spatial.intersects("address", wkt) as well. It is my understanding that RTREE is needed for good query performance on a polygon -- unless there's a native function that can do this? I didn't see one.

I have a graph of ~ 200 million addresses. Here are my current spatial layers:

neo4j> call spatial.layers();
+---------------------------------------------------------------------------------------------------------------------+
| name        | signature                                                                                             |
+---------------------------------------------------------------------------------------------------------------------+
| "address"   | "EditableLayer(name='address', encoder=SimplePointEncoder(x='longitude', y='latitude', bbox='bbox'))" |
| "turf"      | "EditableLayer(name='turf', encoder=WKTGeometryEncoder(geom='wkt', bbox='bbox'))"                     |
+---------------------------------------------------------------------------------------------------------------------+

I suppose I could store the address layer as a WKT layer, but it'll still need a string property (the geom) and it'll create a bbox property. My goal is to only have the native point() property to save space, especially on such a large graph, so I like the idea of a new GeometryEncoder.

I have the "turf" as a separate layer due to it having vastly fewer nodes and I use it for other operations unrelated to the address nodes, and it's much faster if it's separate. But I do store the polygons I use for the "address" layer there too.

By the way, the open source project I'm using neo4j-spatial for is here: https://github.com/OurVoiceUSA/

craigtaverner commented 5 years ago

I see what you mean. The native functionality does not really support polygons. If you want to index the polygons, then you are right that the best way is to use the RTree. However, if you want to use the polygons for searching for points (point-in-polygon searches), then you do not need to index the polygons using a library as extensive as this one, but could do with a simple point-in-polygon function on top of the native spatial index in Neo4j. I describe this scenario in my GraphConnect talk at https://neo4j.com/graphconnect-2018/session/neo4j-spatial-mapping, or the meetup talk I gave in November at https://www.youtube.com/watch?v=NS4NfkRql40.

Regarding your current model, it looks like the addresses are always points, so could either be stored in a SimplePointLayer indexed with an RTree, GeoHash or Hilbert Curve, or they could be native points indexed using the built-in spatial index in Neo4j (which uses the same hilbert curve logic as the one in this library, but is better optimized so should perform faster). The turf layer could be WKT in a DefaultLayer (indexed with RTree only, since non-points are hard to index in geohash or hilbert curves). However, if the polygons are simple polygons with no holes, you could also store them natively in neo4j as Point[] types, but these are only indexed for equality, not range, so there is no point in indexing them natively (who searches for polygons by equality, anyway). The videos above (especially the November one) show how to use simple polygons for point-in-polygon searches natively in Neo4j. If you only need them for point-in-polygon searches, that is fine. If you need the index to find the polygons, then you are back to the RTree index from the library.

But I do think you will not need to store the same geometry twice in any node, and you do not need the points in the same layer as the polygons.
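As a rough illustration of the "simple point-in-polygon function on top of an index" idea discussed above, the Java sketch below first narrows candidates with the polygon's bounding box (the part an index range query would normally do) and then applies an exact even-odd ray-casting test. This is a generic textbook technique under assumptions, not the code from the talks or from Neo4j. The class and method names are hypothetical, points are plain `{x, y}` arrays, and the polygon is a single ring with no holes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: bounding-box prefilter followed by an exact ray-casting test. */
final class PointInPolygonSketch {

    /** Bounding box of a ring as {minX, minY, maxX, maxY}. */
    static double[] boundingBox(double[][] ring) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] vertex : ring) {
            minX = Math.min(minX, vertex[0]);
            minY = Math.min(minY, vertex[1]);
            maxX = Math.max(maxX, vertex[0]);
            maxY = Math.max(maxY, vertex[1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    /** Even-odd ray casting; the ring is a list of {x, y} vertices with the closing edge implied. */
    static boolean contains(double[][] ring, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            // The edge counts if it straddles the horizontal line through y
            // and crosses that line to the right of x.
            boolean crosses = (yi > y) != (yj > y)
                    && x < (xj - xi) * (y - yi) / (yj - yi) + xi;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    /** Keeps only the candidates that pass the cheap box check and then the exact test. */
    static List<double[]> filter(double[][] ring, List<double[]> candidates) {
        double[] box = boundingBox(ring);
        List<double[]> hits = new ArrayList<>();
        for (double[] p : candidates) {
            // In a database, this box check is what the spatial index would answer.
            boolean inBox = p[0] >= box[0] && p[0] <= box[2] && p[1] >= box[1] && p[1] <= box[3];
            if (inBox && contains(ring, p[0], p[1])) {
                hits.add(p);
            }
        }
        return hits;
    }
}
```

Points that lie exactly on an edge or vertex may be classified either way by this test, so a real implementation would need to decide how boundaries are treated.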

cormander commented 5 years ago

Unfortunately, enough of the polygons aren't "simple": they can contain enclaves. So while your demonstrated implementation is very impressive, and I'll certainly end up using the routing portion of it (with credit to @craigtaverner of course), it looks like I still need to RTREE the point layer for this.

So we're back to a new GeometryEncoder, so that we can store just the native points without needing the string (lon:lat) and map (bbox) properties.

I appreciate the discussion! I'm not a Java developer but am happy to help where I can.

craigtaverner commented 5 years ago

OK. Sounds like we have two directions we can move in:

1. Create a new GeometryEncoder that understands the new native type and acts like a SimplePointLayer, but with the native type. This should be relatively easy.
2. Add support for complex polygons in the point-in-polygon library.

For the latter option, I've thought of one way to compromise between Neo4j's limitation of storing only Point[] and the need for multiple Point[] for MultiPolygons with holes, and that is to use multiple properties. The node becomes the geometry, and has properties like 'shell1', 'shell2', 'hole1', 'hole2', etc. and the point-in-polygon library can construct a complex polygon from this for the analysis.
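The multi-property idea above could be sketched in Java roughly as follows. This is only one possible reading of the proposal, not an existing library feature. A `Map` stands in for the node's properties, rings are plain `{x, y}` arrays instead of Point[], and the class name and the rule "inside more shells than holes" are assumptions made for this example. The rule relies on the rings being properly nested and not overlapping.

```java
import java.util.Map;

/** Illustrative sketch only: point-in-polygon over rings stored as 'shellN' / 'holeN' properties. */
final class ComplexPolygonSketch {

    /** Even-odd ray casting against one ring of {x, y} vertices (closing edge implied). */
    static boolean ringContains(double[][] ring, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            boolean crosses = (yi > y) != (yj > y)
                    && x < (xj - xi) * (y - yi) / (yj - yi) + xi;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    /**
     * Treats the property map as one complex polygon. A point is inside when it
     * lies in more shells than holes, which also covers an island shell sitting
     * inside a hole. Properties with other name prefixes are ignored.
     */
    static boolean contains(Map<String, double[][]> properties, double x, double y) {
        int shells = 0;
        int holes = 0;
        for (Map.Entry<String, double[][]> entry : properties.entrySet()) {
            if (!ringContains(entry.getValue(), x, y)) {
                continue;
            }
            if (entry.getKey().startsWith("shell")) {
                shells++;
            } else if (entry.getKey().startsWith("hole")) {
                holes++;
            }
        }
        return shells > holes;
    }
}
```

For example, with a 'shell1' square from (0, 0) to (10, 10) and a 'hole1' square from (4, 4) to (6, 6), a point at (1, 1) counts as inside while a point at (5, 5) falls in the hole and counts as outside.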

cormander commented 5 years ago

Hey @craigtaverner -- does this issue get re-opened (option 1), or are we moving the discussion somewhere else (option 2)?

Appreciate the input so far. Very informative.

Thanks,

craigtaverner commented 5 years ago

We should probably continue in another channel. I suggest starting with direct slack messages on the neo4j-users.slack.com channel. We could leave this issue open or closed.

craigtaverner commented 5 years ago

I made a new release for Neo4j 3.4 at https://github.com/neo4j-contrib/spatial/releases/tag/0.26.1-neo4j-3.4.9 with support for native neo4j Point types.

cormander commented 5 years ago

You sir are amazing! Swapped out the encoder and removed the longitude/latitude properties -- everything is looking great so far.

For reference, here is my code change to use this new release: https://github.com/OurVoiceUSA/HelloVoterAPI/commit/31ae06329220dd4015a45b3fd7f63f9686ddb7f7

craigtaverner commented 5 years ago

Glad to be of help!

I hope it works well for you. It had not crossed my mind to add native Point support to the spatial library because I assumed people would use native Points with the built-in spatial index in Neo4j. This is a novel approach, using native points with the old in-graph RTree index (or the alternative geohash or hilbert mappings onto explicit Lucene string indexes, which are also supported).