locationtech / jts

The JTS Topology Suite is a Java library for creating and manipulating vector geometry.
Other
1.94k stars 442 forks source link

Improve performance of HPRtree #1012

Closed msbarry closed 8 months ago

msbarry commented 10 months ago

While profiling performance of planetiler I noticed that STRtree in MCIndexNoder was one of the bottlenecks. I looked into HPRtree and saw there were a few opportunities to improve performance using some ideas from the flatbush library:

This version performs faster in some of the microbenchmarks (especially when result sets are large) but when I swap out STRtree with this improved HPRtree planetiler spends about 30% less time in MCIndexNoder.computeNodes, isValid checks, and about 10% less time in the most expensive operations planetiler uses: GeometryPrecisionReducer.reduce and bufferUnionUnbuffer.

dr-jts commented 10 months ago

The code is certainly nicer with generics. Do you think the compile warnings are going to be obnoxious for downstream users?

Generics have been asked for for a while for index containers, so maybe now is the time to add them.

msbarry commented 10 months ago

I don't think they would be too bad for consumers if it means they can get rid of instanceof checks and casting and don't cause any compile failures.

I could go and update the other spatial index classes too. Think I should do that in this PR or a separate one? It might touch a lot of files.

dr-jts commented 10 months ago

I could go and update the other spatial index classes too. Think I should do that in this PR or a separate one? It might touch a lot of files.

No, that should be done in a separate PR. Actually I'm still wondering if SpatialIndex generics are too much for this PR. I hate to lose all the work you've done - but perhaps you can move it to a different PR?

msbarry commented 10 months ago

No worries. Sounds good to keep this one focused on the performance improvements. I'll move generics out into another pr

msbarry commented 10 months ago

For the generics PR, there are a bunch of things that it could include but it starts to get big pretty fast:

parameterizing the external API of index classes

parameterizing the internal implementation of index classes

updating places in the code that use these indexes to benefit from the parameterized versions:

What do you think the scope of an initial PR should be?

dr-jts commented 10 months ago

That's a great list, @msbarry.

What do you think the scope of an initial PR should be?

Ideally - all of it! Or, at least SpatialIndex implementors and usage.

dr-jts commented 9 months ago

Sorry, @msbarry , I've been delayed on getting on to merging this PR.

Why does switching to using two arrays instead of ArrayList<Item> improve performance?

msbarry commented 9 months ago

@dr-jts thanks for getting back, I think it's because ArrayList<Item> does more memory reads scattered across the heap when processing the items in one node: item -> envelope then all of the doubles in the envelope are nearby, and item -> data for any match.

The biggest gain comes from double[] itemBounds because it can do one hop to find the doubles to check against, and they are nearby in ram for a given node.

Moving to Object[] itemValues is a smaller improvement since it only saves one memory hop. It makes HPRTree run roughly 10x faster than flatbush js library on their posted benchmarks but that's probably because the benchmarks match a lot of items and don't do anything with them so the JVM doesn't even need to dereference the item value. In a more real-world scenario where you're looking for a needle in a haystack and then doing something with I'd imagine the difference would be smaller.

That being said JVM performance often defies my intuition and this represents the fastest I could get the benchmarks and planetiler code that uses JTS a lot to run through mostly trial and error 😄

dr-jts commented 9 months ago

Moving to Object[] itemValues is a smaller improvement since it only saves one memory hop. It makes HPRTree run roughly 10x faster than flatbush js library on their posted benchmarks

@msbarry did you port the Flatbush benchmark code? If so it would be nice to include that in this PR. Performance tests go in this package, and can use the performance test harness framework via PerformanceTestCase.

msbarry commented 9 months ago

@msbarry did you port the Flatbush benchmark code? If so it would be nice to include that in this PR. Performance tests go in this package, and can use the performance test harness framework via PerformanceTestCase.

Sure thing - I ported the tests over and have them run against STRTree and HPRtree. Here are the results from my 2021 m1 macbook pro in comparison to flatbush on node.js 20.3.0:

  flatbush js strtree hprtree before hprtree after
build 1m rectangles 180ms 1001ms 1103ms 241ms
query 0.01% 4.5ms 14ms 11ms 4ms
query 1% 37ms 165ms 82ms 12ms
query 10% 295ms 1227ms 510ms 61ms
Raw hprtree/strtree benchmark output
master d26cef981decaf12f10e4971106f2525ee9a66ce ``` HPRTree Build time = 1103 ms STRTree Build time = 1017 ms ----- Query size: 1 HPRTree query result items = 101786 runQueriesHPR : 11 ms STRTree query result items = 101786 runQueriesSTR : 15 ms ----- Query size: 10 HPRTree query result items = 3046824 runQueriesHPR : 82 ms STRTree query result items = 3046824 runQueriesSTR : 173 ms ----- Query size: 31 HPRTree query result items = 25770594 runQueriesHPR : 510 ms STRTree query result items = 25770594 runQueriesSTR : 1247 ms ``` this branch 0001c1634b35845064cd22c2b350e4259c36f327 ``` HPRTree Build time = 241 ms STRTree Build time = 1001 ms ----- Query size: 1 HPRTree query result items = 101786 runQueriesHPR : 4 ms STRTree query result items = 101786 runQueriesSTR : 14 ms ----- Query size: 10 HPRTree query result items = 3046824 runQueriesHPR : 12 ms STRTree query result items = 3046824 runQueriesSTR : 165 ms ----- Query size: 31 HPRTree query result items = 25770594 runQueriesHPR : 61 ms STRTree query result items = 25770594 runQueriesSTR : 1227 ms ```
Raw flatbush benchmark output ```diff ❯ node bench 1000000 rectangles node size: 16 + flatbush: 178.922ms index size: 38,400,092 + 1000 searches 10%: 294.68ms + 1000 searches 1%: 37.187ms + 1000 searches 0.01%: 4.476ms 1000 searches of 100 neighbors: 18.683ms 1 searches of 1000000 neighbors: 111.913ms 100000 searches of 1 neighbors: 482.469ms rbush: 843.683ms 1000 searches 10%: 640.872ms 1000 searches 1%: 155.409ms 1000 searches 0.01%: 17.523ms 1000 searches of 100 neighbors: 47.962ms 1 searches of 1000000 neighbors: 271.478ms 100000 searches of 1 neighbors: 1.212s ```

Since Flatbush builds in ~180ms but HPRtree takes ~250ms there may be more room for improvement when building... It looks like of those 250ms, it's:

msbarry commented 8 months ago

Hello, just checking in here, are there any other tests to run or changes to make before merging this?

dr-jts commented 8 months ago

Hello, just checking in here, are there any other tests to run or changes to make before merging this?

No, I think this looks good. I'll merge soon.