Alternate polygonToCells algorithm

nrabinowitz commented 11 months ago

This PR implements an alternate algorithm for polygonToCells, which provides polygonToCompactCells as a byproduct.

Goals

[x] Offer a stream/iterator-based algorithm to avoid memory issues when converting polygons to resolutions with high-cardinality output (#521, https://github.com/uber/h3-java/issues/21)
[x] Provide compact output without a separate compactCells step
[x] Improve performance of polygonToCells
[ ] Add new polyfill modes for contains and intersects (#775, #275, https://github.com/uber/h3-java/issues/41)

Algorithm Overview

let cell = base cell 0
while cell:
  res = getResolution(cell)
  if res == target res:
     if cell is in polygon:
        set output to cell and return
  if res < target res:
     let bbox = boundingBox(cell) // where bbox is padded to contain all descendants
     if bbox intersects polygon:
        if bbox is contained by polygon:
          set output to cell and return
        cell = first child of cell
        continue
  cell = next cell // where "next" is the next sibling, or next sibling of parent, etc
when cell is null we're done

Results

This PR includes an iterator-based algo with full test parity to the current algorithm. In its iterator form, it takes very little memory and that memory does not scale with the size of the output - so we can stream cells out of a very large polyfill process without worrying about memory issues.
- The new algo gives compact (or nearly compact) output by default, so we get polygonToCellsCompact effectively for free.
- The performance isn't exactly better or worse than the current algorithm - it just has different characteristics. In benchmarks, it's faster than the current algorithm for large numbers of cells (because we can use a small number of compact parent cells) but slower for large numbers of polygon vertices (because the containment check scales with the number of vertices). See benchmarks below.
- The new polyfill modes are not included in this PR, but should be simple to add based on the work done here (particularly cellBoundaryInsidePolygon and cellBoundaryCrossesGeoLoop).

For Discussion

Are the benefits sufficient to replace the existing polygonToCells algo, even if performance degrades in some cases?
In order to get the memory benefits, consumers need to use the iterator directly, or we need to implement some sort of batch mode that can fill chunks of memory provided by the caller (might be worthwhile to avoid costs in bindings associated with an FFI per output cell). Which option is better? What if any clean up do we need in the iterator interface if we want to make it part of the public API? Should we make the cellToChildren iterator public as well?

Benchmarks

In the following benchmakrks, SF is a 6-vertex polygon covering San Francisco, Alameda is a 49-vertex polygon covering a similar area, and SouthernExpansion is a 22-vertex polygon covering an area 2x-3x larger than SF.

The old polygonToCells performs better when the ratio of output cells to vertexes is low (few cells, complex polygons)
The new polygonToCells2 performs better when the ratio of output cells to vertexes is high (many cells, simple polygons)
Weirdly, the polygonToCellsCompact function, which is just polygonToCells2 without the uncompact step, performs about the same as polygonToCells2 🤔. I went over this several times, but it seems like the benchmark is correct?

Res 9

Built target benchmarkPolygon
    -- polygonToCellsSF: 2264.810000 microseconds per iteration (500 iterations)
    -- polygonToCellsSF2: 3530.106000 microseconds per iteration (500 iterations)
    -- polygonToCellsSFCompact: 3530.478000 microseconds per iteration (500 iterations)
    -- polygonToCellsAlameda: 3182.120000 microseconds per iteration (500 iterations)
    -- polygonToCellsAlameda2: 5960.740000 microseconds per iteration (500 iterations)
    -- polygonToCellsAlamedaCompact: 6027.128000 microseconds per iteration (500 iterations)
    -- polygonToCellsSouthernExpansion: 103521.500000 microseconds per iteration (10 iterations)
    -- polygonToCellsSouthernExpansion2: 112462.900000 microseconds per iteration (10 iterations)
    -- polygonToCellsSouthernExpansionCompact: 104374.600000 microseconds per iteration (10 iterations)

Res 10

Built target benchmarkPolygon
    -- polygonToCellsSF: 12268.826000 microseconds per iteration (500 iterations)
    -- polygonToCellsSF2: 11435.998000 microseconds per iteration (500 iterations)
    -- polygonToCellsSFCompact: 11363.694000 microseconds per iteration (500 iterations)
    -- polygonToCellsAlameda: 15037.306000 microseconds per iteration (500 iterations)
    -- polygonToCellsAlameda2: 25583.798000 microseconds per iteration (500 iterations)
    -- polygonToCellsAlamedaCompact: 25478.888000 microseconds per iteration (500 iterations)
    -- polygonToCellsSouthernExpansion: 693673.900000 microseconds per iteration (10 iterations)
    -- polygonToCellsSouthernExpansion2: 672981.900000 microseconds per iteration (10 iterations)
    -- polygonToCellsSouthernExpansionCompact: 659994.200000 microseconds per iteration (10 iterations)

Res 11

Built target benchmarkPolygon
    -- polygonToCellsSF: 81978.584000 microseconds per iteration (500 iterations)
    -- polygonToCellsSF2: 50297.930000 microseconds per iteration (500 iterations)
    -- polygonToCellsSFCompact: 50223.350000 microseconds per iteration (500 iterations)
    -- polygonToCellsAlameda: 96910.940000 microseconds per iteration (500 iterations)
    -- polygonToCellsAlameda2: 140856.702000 microseconds per iteration (500 iterations)
    -- polygonToCellsAlamedaCompact: 140873.162000 microseconds per iteration (500 iterations)
    -- polygonToCellsSouthernExpansion: 5245777.500000 microseconds per iteration (10 iterations)
    -- polygonToCellsSouthernExpansion2: 4613408.400000 microseconds per iteration (10 iterations)
    -- polygonToCellsSouthernExpansionCompact: 4607957.500000 microseconds per iteration (10 iterations)

nrabinowitz commented 11 months ago

With the various optimizations I've added so far, I've closed the gaps in the original benchmarks, and I think the new algorithm is likely to be faster most of the time, though still slower as the ratio of vertexes to cells increases (numbers shown below are ratio of old / new):

Res 9

Original:
  Simple: 0.64x
  Complex: 0.53x
  Large: 0.92x

Now:
  Simple: 1.00x
  Complex: 0.81x
  Large: 1.3x

Res 10

Original:
  Simple: 1.07x
  Complex: 0.58x
  Large: 1.03x

Now:
  Simple: 1.56x
  Complex: 0.83x
  Large: 1.36x

nrabinowitz commented 11 months ago

Some more benchmarking data - when transpiled into h3-js, the performance looks almost exactly the same, so close that for a while I was worried I wasn't testing correctly:

polygonToCells_all_countries_0 x 3.98 ops/sec ±9.40% (30 runs sampled)
polygonToCells2_all_countries_0 x 3.78 ops/sec ±11.69% (30 runs sampled)
polygonToCells_all_countries_1 x 0.74 ops/sec ±3.42% (21 runs sampled)
polygonToCells2_all_countries_1 x 0.75 ops/sec ±1.33% (21 runs sampled)
polygonToCells_all_countries_2 x 0.53 ops/sec ±2.04% (21 runs sampled)
polygonToCells2_all_countries_2 x 0.55 ops/sec ±1.08% (21 runs sampled)
polygonToCells_all_countries_3 x 0.20 ops/sec ±10.64% (20 runs sampled)
polygonToCells2_all_countries_3 x 0.20 ops/sec ±2.62% (20 runs sampled)
polygonToCells_all_countries_4 x 0.04 ops/sec ±4.17% (20 runs sampled)
polygonToCells2_all_countries_4 x 0.04 ops/sec ±3.42% (20 runs sampled)
polygonToCells_FRANCE_0 x 36.45 ops/sec ±140.62% (28 runs sampled)
polygonToCells2_FRANCE_0 x 30.89 ops/sec ±151.18% (30 runs sampled)
polygonToCells_FRANCE_1 x 26.86 ops/sec ±133.33% (74 runs sampled)
polygonToCells2_FRANCE_1 x 60.46 ops/sec ±61.84% (73 runs sampled)
polygonToCells_FRANCE_2 x 38.19 ops/sec ±78.95% (55 runs sampled)
polygonToCells2_FRANCE_2 x 73.25 ops/sec ±0.60% (56 runs sampled)
polygonToCells_FRANCE_3 x 21.73 ops/sec ±112.03% (43 runs sampled)
polygonToCells2_FRANCE_3 x 27.79 ops/sec ±82.78% (51 runs sampled)
polygonToCells_FRANCE_4 x 21.06 ops/sec ±1.18% (54 runs sampled)
polygonToCells2_FRANCE_4 x 22.21 ops/sec ±0.77% (56 runs sampled)

The all_countries suite uses multipolygons for the 258 countries in the Natural Earth 1:110m vector data, which I figured should offer a wide range of shapes and sizes. I wasn't able to benchmark past res 4, but performance looks almost identical for this dataset. Looking at an individual country like France highlights some differences (in this case, the new version performs slightly better), but overall it's a wash.

Given the memory benefits, I'm considering this a win, though I was hoping it would actually be faster 🤷.

nrabinowitz commented 11 months ago

Even more benchmarking here (now in C) suggests that the resolution of the output cells also makes a difference, which makes sense given the hierarchical search involved in the new algo. This means that the performance of the new algo decreases compared to the old algo as the res gets finer. The following benchmark is for Djibouti:

res 5
  -- polygonToCells_1: 15230.800000 microseconds per iteration (5 iterations)
  -- polygonToCells_2: 14093.200000 microseconds per iteration (5 iterations)
res 6
  -- polygonToCells_1: 46217.600000 microseconds per iteration (5 iterations)
  -- polygonToCells_2: 87146.400000 microseconds per iteration (5 iterations)
res 7
  -- polygonToCells_1: 222787.600000 microseconds per iteration (5 iterations)
  -- polygonToCells_2: 552836.200000 microseconds per iteration (5 iterations)
res 8
  -- polygonToCells_1: 1393634.200000 microseconds per iteration (5 iterations)
  -- polygonToCells_2: 3721000.000000 microseconds per iteration (5 iterations)
res 9
  -- polygonToCells_1:  8811737.600000 microseconds per iteration (5 iterations)
  -- polygonToCells_2: 28252274.800000 microseconds per iteration (5 iterations)

By res 9 it's 3x worse than the current implementation 😢

nrabinowitz commented 11 months ago

Further investigation shows that there's something else going on, rather than just resolution, but I'm not sure what.

The benchmark for all countries has the old algo better after res 5, but I can't run it for finer resolutions:

Res 0
    -- polygonToCellsAllCountries_1: 1846956.400000 microseconds per iteration (5 iterations)
    -- polygonToCellsAllCountries_2: 147822.200000 microseconds per iteration (5 iterations)
Res 1
    -- polygonToCellsAllCountries_1: 3200132.800000 microseconds per iteration (5 iterations)
    -- polygonToCellsAllCountries_2: 1224828.200000 microseconds per iteration (5 iterations)
Res 2
    -- polygonToCellsAllCountries_1: 2814667.400000 microseconds per iteration (5 iterations)
    -- polygonToCellsAllCountries_2: 1650428.200000 microseconds per iteration (5 iterations)
Res 3
    -- polygonToCellsAllCountries_1: 5786302.800000 microseconds per iteration (5 iterations)
    -- polygonToCellsAllCountries_2: 3390271.200000 microseconds per iteration (5 iterations)
Res 4
    -- polygonToCellsAllCountries_1: 14079108.200000 microseconds per iteration (5 iterations)
    -- polygonToCellsAllCountries_2: 13361490.400000 microseconds per iteration (5 iterations)
Res 5
    -- polygonToCellsAllCountries_1: 63288210.400000 microseconds per iteration (5 iterations)
    -- polygonToCellsAllCountries_2: 73714663.800000 microseconds per iteration (5 iterations)

Testing Indonesia shows the same pattern, old algo is almost twice as fast by res 7. But testing a smaller set of small Indonesian islands at finer resolutions shows the opposite pattern from res 10 onward:

Res 8
    -- polygonToCellsIndonesia_1: 7767.200000 microseconds per iteration (5 iterations)
    -- polygonToCellsIndonesia_2: 10843.000000 microseconds per iteration (5 iterations)
Res 9
    -- polygonToCellsIndonesia_1: 22478.400000 microseconds per iteration (5 iterations)
    -- polygonToCellsIndonesia_2: 25132.200000 microseconds per iteration (5 iterations)
Res 10
    -- polygonToCellsIndonesia_1: 109577.000000 microseconds per iteration (5 iterations)
    -- polygonToCellsIndonesia_2: 94577.200000 microseconds per iteration (5 iterations)
Res 11
    -- polygonToCellsIndonesia_1: 754375.400000 microseconds per iteration (5 iterations)
    -- polygonToCellsIndonesia_2: 472660.000000 microseconds per iteration (5 iterations)
Res 12
    -- polygonToCellsIndonesia_1: 4869990.800000 microseconds per iteration (5 iterations)
    -- polygonToCellsIndonesia_2: 2903927.600000 microseconds per iteration (5 iterations)

So again, it may be a wash 🤷‍♂️. As far as I can tell, simpler shapes with better compaction yield faster times for the new algo, complex shapes and poor compaction are faster with the old.

isaacbrodsky commented 10 months ago

I put together a little animation of this process (code: https://github.com/isaacbrodsky/h3/tree/updated-polyfill-animation), to aid in visualizing what it's doing. I intentionally cropped out the first and last iterations were it's going over the base cells because the scale is so different compared to where most of the work happens.

30fps (takes a little more than a minute I think): sf_polyfill

15fps: sf_polyfill_15fps

nrabinowitz commented 10 months ago

I put together a little animation of this process (code: https://github.com/isaacbrodsky/h3/tree/updated-polyfill-animation), to aid in visualizing what it's doing. I intentionally cropped out the first and last iterations were it's going over the base cells because the scale is so different compared to where most of the work happens.

These are great! I was thinking about making an Observable notebook for this, but it's a little tough without reimplementing the algo, or just animating a log of the output

nrabinowitz commented 10 months ago

On my end, I did some more benchmarking, using all countries with simplified boundaries. The new algo is overall slower for the majority (81%) of the polygons:

This shows that the performance difference tends to be significantly worse as the number of cells decreases, with the worst comparisons happening when no cells at all are found within the polygon. I didn't find a correlation between false positives in the bbox checks and bad performance, though I may be looking at it wrong. I have at least one more optimization to try, which would increase false positives but avoid the most expensive parts of cellToBBox.

nrabinowitz commented 10 months ago

Latest commit speeds up the bbox calculation, gaining maybe 20% perf, but still slower for 75% of countries in the benchmark :(.

nrabinowitz commented 10 months ago

Added a lookup table for the res 0 cell bounding boxes, since we check all of them every time, and this seems to have done it! Now in my benchmark (all countries, res 1-5) the new polyfill algo is faster 80% of the time! The main issue here is that the pre-calculated res 0 bboxes seem to wipe out the advantage of the old algorithm for smaller, low-vertex shapes:

coveralls commented 10 months ago

coverage: 98.784% (+0.1%) from 98.657% when pulling 6f0a122ecad7aa3a58eeaa426ffedac646e2b7a6 on nrabinowitz:updated-polyfill into 5aeff3dbb2f5d5769dbb3aab7feb6b2aaaf02a0a on uber:master.

nrabinowitz commented 10 months ago

One additional note here: My new approach to NORMALIZE_LNG seems to result in a lot of cases where we lose branch coverage (and possibly no branch coverage is possible, e.g. if we set the normalization argument explicitly we'll never hit most branches). Open to better options here - I'm also not thrilled that my current implementation hits the most common case, NORMALIZE_NONE, last. In other languages I'd probably pass around a function here, but that's harder and more costly in C. Let me know if you have a better option.

nrabinowitz commented 10 months ago

Thanks everyone for the reviews and support on this!

uber / h3