Closed nrabinowitz closed 10 months ago
With the various optimizations I've added so far, I've closed the gaps in the original benchmarks, and I think the new algorithm is likely to be faster most of the time, though still slower as the ratio of vertexes to cells increases (numbers shown below are ratio of old / new):
Res 9
Original:
Simple: 0.64x
Complex: 0.53x
Large: 0.92x
Now:
Simple: 1.00x
Complex: 0.81x
Large: 1.3x
Res 10
Original:
Simple: 1.07x
Complex: 0.58x
Large: 1.03x
Now:
Simple: 1.56x
Complex: 0.83x
Large: 1.36x
Some more benchmarking data - when transpiled into h3-js
, the performance looks almost exactly the same, so close that for a while I was worried I wasn't testing correctly:
polygonToCells_all_countries_0 x 3.98 ops/sec ±9.40% (30 runs sampled)
polygonToCells2_all_countries_0 x 3.78 ops/sec ±11.69% (30 runs sampled)
polygonToCells_all_countries_1 x 0.74 ops/sec ±3.42% (21 runs sampled)
polygonToCells2_all_countries_1 x 0.75 ops/sec ±1.33% (21 runs sampled)
polygonToCells_all_countries_2 x 0.53 ops/sec ±2.04% (21 runs sampled)
polygonToCells2_all_countries_2 x 0.55 ops/sec ±1.08% (21 runs sampled)
polygonToCells_all_countries_3 x 0.20 ops/sec ±10.64% (20 runs sampled)
polygonToCells2_all_countries_3 x 0.20 ops/sec ±2.62% (20 runs sampled)
polygonToCells_all_countries_4 x 0.04 ops/sec ±4.17% (20 runs sampled)
polygonToCells2_all_countries_4 x 0.04 ops/sec ±3.42% (20 runs sampled)
polygonToCells_FRANCE_0 x 36.45 ops/sec ±140.62% (28 runs sampled)
polygonToCells2_FRANCE_0 x 30.89 ops/sec ±151.18% (30 runs sampled)
polygonToCells_FRANCE_1 x 26.86 ops/sec ±133.33% (74 runs sampled)
polygonToCells2_FRANCE_1 x 60.46 ops/sec ±61.84% (73 runs sampled)
polygonToCells_FRANCE_2 x 38.19 ops/sec ±78.95% (55 runs sampled)
polygonToCells2_FRANCE_2 x 73.25 ops/sec ±0.60% (56 runs sampled)
polygonToCells_FRANCE_3 x 21.73 ops/sec ±112.03% (43 runs sampled)
polygonToCells2_FRANCE_3 x 27.79 ops/sec ±82.78% (51 runs sampled)
polygonToCells_FRANCE_4 x 21.06 ops/sec ±1.18% (54 runs sampled)
polygonToCells2_FRANCE_4 x 22.21 ops/sec ±0.77% (56 runs sampled)
The all_countries
suite uses multipolygons for the 258 countries in the Natural Earth 1:110m vector data, which I figured should offer a wide range of shapes and sizes. I wasn't able to benchmark past res 4, but performance looks almost identical for this dataset. Looking at an individual country like France highlights some differences (in this case, the new version performs slightly better), but overall it's a wash.
Given the memory benefits, I'm considering this a win, though I was hoping it would actually be faster 🤷.
Even more benchmarking here (now in C) suggests that the resolution of the output cells also makes a difference, which makes sense given the hierarchical search involved in the new algo. This means that the performance of the new algo decreases compared to the old algo as the res gets finer. The following benchmark is for Djibouti:
res 5
-- polygonToCells_1: 15230.800000 microseconds per iteration (5 iterations)
-- polygonToCells_2: 14093.200000 microseconds per iteration (5 iterations)
res 6
-- polygonToCells_1: 46217.600000 microseconds per iteration (5 iterations)
-- polygonToCells_2: 87146.400000 microseconds per iteration (5 iterations)
res 7
-- polygonToCells_1: 222787.600000 microseconds per iteration (5 iterations)
-- polygonToCells_2: 552836.200000 microseconds per iteration (5 iterations)
res 8
-- polygonToCells_1: 1393634.200000 microseconds per iteration (5 iterations)
-- polygonToCells_2: 3721000.000000 microseconds per iteration (5 iterations)
res 9
-- polygonToCells_1: 8811737.600000 microseconds per iteration (5 iterations)
-- polygonToCells_2: 28252274.800000 microseconds per iteration (5 iterations)
By res 9 it's 3x worse than the current implementation 😢
Further investigation shows that there's something else going on, rather than just resolution, but I'm not sure what.
The benchmark for all countries has the old algo better after res 5, but I can't run it for finer resolutions:
Res 0
-- polygonToCellsAllCountries_1: 1846956.400000 microseconds per iteration (5 iterations)
-- polygonToCellsAllCountries_2: 147822.200000 microseconds per iteration (5 iterations)
Res 1
-- polygonToCellsAllCountries_1: 3200132.800000 microseconds per iteration (5 iterations)
-- polygonToCellsAllCountries_2: 1224828.200000 microseconds per iteration (5 iterations)
Res 2
-- polygonToCellsAllCountries_1: 2814667.400000 microseconds per iteration (5 iterations)
-- polygonToCellsAllCountries_2: 1650428.200000 microseconds per iteration (5 iterations)
Res 3
-- polygonToCellsAllCountries_1: 5786302.800000 microseconds per iteration (5 iterations)
-- polygonToCellsAllCountries_2: 3390271.200000 microseconds per iteration (5 iterations)
Res 4
-- polygonToCellsAllCountries_1: 14079108.200000 microseconds per iteration (5 iterations)
-- polygonToCellsAllCountries_2: 13361490.400000 microseconds per iteration (5 iterations)
Res 5
-- polygonToCellsAllCountries_1: 63288210.400000 microseconds per iteration (5 iterations)
-- polygonToCellsAllCountries_2: 73714663.800000 microseconds per iteration (5 iterations)
Testing Indonesia shows the same pattern, old algo is almost twice as fast by res 7. But testing a smaller set of small Indonesian islands at finer resolutions shows the opposite pattern from res 10 onward:
Res 8
-- polygonToCellsIndonesia_1: 7767.200000 microseconds per iteration (5 iterations)
-- polygonToCellsIndonesia_2: 10843.000000 microseconds per iteration (5 iterations)
Res 9
-- polygonToCellsIndonesia_1: 22478.400000 microseconds per iteration (5 iterations)
-- polygonToCellsIndonesia_2: 25132.200000 microseconds per iteration (5 iterations)
Res 10
-- polygonToCellsIndonesia_1: 109577.000000 microseconds per iteration (5 iterations)
-- polygonToCellsIndonesia_2: 94577.200000 microseconds per iteration (5 iterations)
Res 11
-- polygonToCellsIndonesia_1: 754375.400000 microseconds per iteration (5 iterations)
-- polygonToCellsIndonesia_2: 472660.000000 microseconds per iteration (5 iterations)
Res 12
-- polygonToCellsIndonesia_1: 4869990.800000 microseconds per iteration (5 iterations)
-- polygonToCellsIndonesia_2: 2903927.600000 microseconds per iteration (5 iterations)
So again, it may be a wash 🤷♂️. As far as I can tell, simpler shapes with better compaction yield faster times for the new algo, complex shapes and poor compaction are faster with the old.
I put together a little animation of this process (code: https://github.com/isaacbrodsky/h3/tree/updated-polyfill-animation), to aid in visualizing what it's doing. I intentionally cropped out the first and last iterations were it's going over the base cells because the scale is so different compared to where most of the work happens.
30fps (takes a little more than a minute I think):
15fps:
I put together a little animation of this process (code: https://github.com/isaacbrodsky/h3/tree/updated-polyfill-animation), to aid in visualizing what it's doing. I intentionally cropped out the first and last iterations were it's going over the base cells because the scale is so different compared to where most of the work happens.
These are great! I was thinking about making an Observable notebook for this, but it's a little tough without reimplementing the algo, or just animating a log of the output
On my end, I did some more benchmarking, using all countries with simplified boundaries. The new algo is overall slower for the majority (81%) of the polygons:
This shows that the performance difference tends to be significantly worse as the number of cells decreases, with the worst comparisons happening when no cells at all are found within the polygon. I didn't find a correlation between false positives in the bbox checks and bad performance, though I may be looking at it wrong. I have at least one more optimization to try, which would increase false positives but avoid the most expensive parts of cellToBBox
.
Latest commit speeds up the bbox calculation, gaining maybe 20% perf, but still slower for 75% of countries in the benchmark :(.
Added a lookup table for the res 0 cell bounding boxes, since we check all of them every time, and this seems to have done it! Now in my benchmark (all countries, res 1-5) the new polyfill algo is faster 80% of the time! The main issue here is that the pre-calculated res 0 bboxes seem to wipe out the advantage of the old algorithm for smaller, low-vertex shapes:
One additional note here: My new approach to NORMALIZE_LNG
seems to result in a lot of cases where we lose branch coverage (and possibly no branch coverage is possible, e.g. if we set the normalization
argument explicitly we'll never hit most branches). Open to better options here - I'm also not thrilled that my current implementation hits the most common case, NORMALIZE_NONE
, last. In other languages I'd probably pass around a function here, but that's harder and more costly in C. Let me know if you have a better option.
Thanks everyone for the reviews and support on this!
This PR implements an alternate algorithm for
polygonToCells
, which providespolygonToCompactCells
as a byproduct.Goals
compactCells
steppolygonToCells
contains
andintersects
(#775, #275, https://github.com/uber/h3-java/issues/41)Algorithm Overview
Results
polygonToCellsCompact
effectively for free.cellBoundaryInsidePolygon
andcellBoundaryCrossesGeoLoop
).For Discussion
polygonToCells
algo, even if performance degrades in some cases?cellToChildren
iterator public as well?Benchmarks
In the following benchmakrks,
SF
is a 6-vertex polygon covering San Francisco,Alameda
is a 49-vertex polygon covering a similar area, andSouthernExpansion
is a 22-vertex polygon covering an area 2x-3x larger than SF.polygonToCells
performs better when the ratio of output cells to vertexes is low (few cells, complex polygons)polygonToCells2
performs better when the ratio of output cells to vertexes is high (many cells, simple polygons)polygonToCellsCompact
function, which is justpolygonToCells2
without the uncompact step, performs about the same aspolygonToCells2
🤔. I went over this several times, but it seems like the benchmark is correct?Res 9
Res 10
Res 11