uber / h3

Hexagonal hierarchical geospatial indexing system
https://h3geo.org
Apache License 2.0
4.95k stars 470 forks source link

Performance characteristics of index init #168

Open benstadin opened 5 years ago

benstadin commented 5 years ago

I'm thinking about using H3 as a storage for vector map cells for map rendering. A cell would contain geometry data for a given zoom level.

What are the runtime characteristics of H3 for this scenario?

For an example, consider a web map which needs to calculate which cells it would need to request from a server for the current view. Would it need to initialize an index in order to query for the cells for the current view bbox and given resolution? If so, what is the expected memory footprint of the index?

Or are the cell indices calculated mathematically on demand?

nrabinowitz commented 5 years ago

I'm not sure I entirely understand the use case, but let me try to respond:

To be honest, because of the mismatch between the H3 grid and the standard x,y,z tile-loading scheme, I'm not certain H3 is a good fit for a web map vector tile server - you'd need to re-engineer a lot of pieces that assume the standard tile pyramid.

benstadin commented 5 years ago

As background to the question: I've a custom 3D rendering engine for indoor / outdoor maps. I'm currently using a variation of the MODIS projection to create cell indices, the algorithm is adapted to have 1..20 zoom levels so that the cell lengths of the adapted MODIS cells are similar to the web mercator tile lengths (on Equator). The cell loading for showing the currently visible portion of the map involves dealing with related cells, that is a cell containing a feature which overlaps a cell currently in view. This is cell meta info (about related cells containing covering geometries for a cell) is done in a preprocessing step when the cells are generated.

The rationale behind this is to have a projection independent storage for vector map features (we usually use a fitting UTM projection for the indoor maps).

That works quite well, but I see potential for new use cases replacing this custom index with H3. On top of my head I would need to make sure H3 allows the following:

Will the polyfill function be fast enough already for the small number of expected cells per query and zoom level?

And can the zoom levels be adjusted in size in order to (roughly) fit tile area sizes of common tile pyramid zoom levels?

[1] https://wiki.openstreetmap.org/wiki/Zoom_levels

nrabinowitz commented 5 years ago

Adapt H3 zoom levels to roughly match web mercator zoom levels. This would be like H3 area at zoom level x == sqr(Web Meractor Tile Pyramid meter / pixel at zoom level x * 256) [1]

H3 resolutions are fixed - you can see area per cell in each resolution here, and also access it programmatically via the hexAreaKm2 and hexAreaM2 functions. The premise of the question is a little tricky to begin with, since Web Mercator tiles may be very different areas depending where they are on the earth, whereas H3 cells are equal area everywhere (within some small margin of error).

Ensure lookup is fast for all zoom levels and memory footprint is minimal, so that H3 can be used on client side for fast computation of required cells for the current view. The current view will not contain more than a dozen or two cells at once for a current map view.

I've used H3 in this way in the client, and it works fairly well up to approximately 10k hexagons per viewport (not considering rendering performance). There are currently some issues at very coarse resolutions (0 and 1) and calculation is slower if it encounters a pentagon, but if you're primarily concerned about finer resolutions this should work well.

benstadin commented 5 years ago

Thank you, I‘ll investigate about the H3 cell resolutions.

About Web Mercator tiles vs H3 vs HSG (adapted MODIS): Your are of course correct with Web Mercator tiles not being equal area. That‘s why I take the tile length at equator as a reference (where the projection is accurate) for the HSG cells. HSG/MODIS is also equal area, though the world would look like an onion in this projection. I only care about those cells for creating an evenly distributed (in terms of data size, not projection) spatial index, where assuming a fixed simplification ratio per zoom level is totally fine (e.g. it does not matter whether simplifying the geometries of Greenland or Australia when the simplification tolerance is properly chosen per zoom level).

sahrk commented 5 years ago

The premise of the question is a little tricky to begin with, since Web Mercator tiles may be very different areas depending where they are on the earth, whereas H3 cells are equal area everywhere (within some small margin of error).

@nrabinowitz I don't think you can characterize the H3 cells as equal area. The variation in area on an icosahedral gnomonic is almost 60% (see reference below). Though you are correct that when the resolution gets high enough the magnitude of the variation will be small, since the cell areas will be small. Maybe it would be worth generating a table of the range of areas by resolution?

reference: 1998. White D., A.J. Kimerling, K. Sahr, and L Song. Comparing area and shape distortion on polyhedral-based recursive partitions of the sphere. International Journal of Geographical Information Science, 12(8):805-827.

nrabinowitz commented 5 years ago

Thanks for the clarification @sahrk

Maybe it would be worth generating a table of the range of areas by resolution?

Agreed - I think @isaacbrodsky had a script to calculate this empirically. I also think it would be a good idea to provide a page in the documentation discussing the basic properties of the grid (shape/area distortion, etc), as these questions come up with some frequency.

isaacbrodsky commented 5 years ago

I've posted the code I used to sample the area of H3 cells here: https://github.com/isaacbrodsky/h3measurements/blob/master/README-SummaryAreaStats.md It's not very nicely formatted right now but does include the max, min, etc for each resolution.

billyauhk commented 5 years ago

Wanted to add my two-cents into the discussion. I have exhaustively listed H3 cells from resolution 1 to resolution 6 before I reached this issue...I would like to share the numbers that I get such that somebody could cross-validate them:

Column 1 -- Resolution Column 2 -- Average Area as on the Resolution Table Column 3 -- Average Area (using GeographicLib + WGS84 ellipsoid) Column 4 -- Area Ratio between Largest and Smallest Cell

1 607220.9782429 605778.65 2.2489
2  86745.8540347  86716.36 2.3573
3  12392.2648621  12391.66 2.3971
4   1770.3235517   1770.31 2.4123
5   252.9033645     252.90 2.4180
6    36.1290521      36.13 2.4202

For the distribution, the shape is something like this: image

sahrk commented 5 years ago

@billyauhk did you include the pentagons in your calculations? Note that the table in the H3 documentation gives the average area for hexagons only.

Maybe the table in the documentation should also list the area of a pentagon at each resolution (all 12 pentagons should have exactly the same area).

billyauhk commented 5 years ago

@sahrk I included all the valid indexes, so at resolution 4 there are 12 pentagons. For odd-numbered resolutions there are more non-hexagons (pentagon, heptagon, octagon, decagon), but the distribution of areas looks similar.

My guess is that the distribution relates to the fact that one project cells on the faces icodecahedron onto the sphere. So in small regions(say within a city) the cell sizes should be similar, but globally it has a larger variation.

Existence of non-hexagons other than pentagon is comfirmed at #208 and it seems somebody decided to work on writing more explicit documentations on these irregularities (see #209 ).

sahrk commented 5 years ago

@billyauhk When we refer to hexagons and pentagons, we are talking about the topology of the cells: all cells at all resolutions have either 5 or 6 neighbors, which makes them topological pentagons or hexagons respectively. But geometrically their edges are only straight lines on the plane. The additional points introduced along the cell edges only serve to precisely define the cell shape on the sphere. They don't alter the cell topology.

billyauhk commented 5 years ago

@sahrk So for those cell with more than 6 edges, there might be several edges connecting to the same neighbour cell? Okay...(I will have to verify that later on)

That's a big mix-up of words, as polygons (words like "triangle", "square", "hexagon", ...) should be describing the shape, but not (graph) degree. Maybe another TODO for document clean-up.

sahrk commented 5 years ago

@billyauhk This is standard terminology in global grid literature. An edge is the boundary between two cells. The additional points merely densify that edge in a particular space, they don't introduce new edges.

billyauhk commented 5 years ago

@sahrk More results after my message on Mar 22, on whether pentagon is included/excluded.

Probably due to the broken symmetry, maximum area always occurs at base cell 2 or 119, while minimum area always occurs at base cell 58 or 63 (a pentagon, as expected). I speculated that the minimum-sized hexagon will always be around the minimum-sized pentagon. The following gives a list of maximum and minimum area I've found, first column including the pentagon (the Column 4 in my Mar 22 message), and the second column exclude pentagons.

Resolution Max/Min Area Ratio Max/Min Hexagon Area Ratio
0 1.9665120 1.2270450
1 2.2489277 1.6499012
2 2.3572825 1.8651765
3 2.3971020 1.9596137
4 2.4123342 1.9952544
5 2.4179947 2.0094492
6 2.4201663 2.0146416
7 2.4209720 2.0166818
8 2.4212818 2.0174254

The numbers are observably moving towards a limit exponentially but I don't know the magic number. My wild-guess is, still, the magic number is related to the gnomonic projection.

mlyons-tcc commented 5 years ago

I computed area using LAEA projections centered at each hexagon using the same random sampling technique as @billyauhk and discarding pentagons. Below are my results where n=100,000.

Resolution Min Area (sq m) Max Area (sq m) Mean Area (sq m) Median Area (sq m) H3 Mean Reported Area (sq m) Max/Min Ratio Max/Mean Ratio Max/Median Ratio Max/H3 Mean Ratio
0 4.068E+12 4.987E+12 4.360E+12 4.120E+12 4.251E+12 1.226 1.144 1.210 1.173
1 445.469E+9 734.712E+9 618.909E+9 644.272E+9 607.221E+9 1.649 1.187 1.140 1.210
2 56.529E+9 105.431E+9 88.346E+9 89.427E+9 86.746E+9 1.865 1.193 1.179 1.215
3 7.691E+9 15.071E+9 12.620E+9 12.822E+9 12.392E+9 1.960 1.194 1.175 1.216
4 1.079E+9 2.153E+9 1.803E+9 1.832E+9 1.770E+9 1.995 1.194 1.176 1.216
5 153.082E+6 307.602E+6 257.539E+6 261.661E+6 252.903E+6 2.009 1.194 1.176 1.216
6 21.887E+6 43.943E+6 36.791E+6 37.377E+6 36.129E+6 2.008 1.194 1.176 1.216
7 3.126E+6 6.278E+6 5.256E+6 5.340E+6 5.161E+6 2.008 1.194 1.176 1.216
8 446.639E+3 896.797E+3 750.840E+3 762.816E+3 737.328E+3 2.008 1.194 1.176 1.216
9 63.801E+3 128.114E+3 107.263E+3 108.973E+3 105.333E+3 2.008 1.194 1.176 1.216
10 9.114E+3 18.302E+3 15.323E+3 15.568E+3 15.048E+3 2.008 1.194 1.176 1.216
11 1.302E+3 2.615E+3 2.189E+3 2.224E+3 2.150E+3 2.008 1.194 1.176 1.216
12 186.005E+0 373.509E+0 312.720E+0 317.703E+0 307.100E+0 2.008 1.194 1.176 1.216
13 26.572E+0 53.358E+0 44.674E+0 45.386E+0 43.900E+0 2.008 1.194 1.176 1.215
14 3.796E+0 7.623E+0 6.382E+0 6.484E+0 6.300E+0 2.008 1.194 1.176 1.210
15 542.288E-3 1.089E+0 911.720E-3 926.246E-3 900.000E-3 2.008 1.194 1.176 1.210
Fil commented 4 years ago

You might be interested by this other type of grid: https://observablehq.com/@fil/gray-fuller-grid

The performance in terms of speed and area distortion is quite good (in my biased opinion), see https://observablehq.com/d/93922be7718f42bf