onthegomap / planetiler

Flexible tool to build planet-scale vector tilesets from OpenStreetMap data fast
Apache License 2.0

Improvements to geoparquet geoarrow conversion #933

Closed: msbarry closed this pull request 5 days ago

msbarry commented 5 days ago

GeoParquet GeoArrow conversion previously built a map with x/y/z/m entries for every coordinate read from the Parquet file, then converted those maps to JTS coordinates when the geometry was requested. This change eliminates that wasteful intermediate step: the Parquet reader now deserializes x/y/z/m values directly into a JTS `CoordinateSequence`.
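For illustration, the direct approach can be sketched as a small, hypothetical JDK-only class. As the reader decodes each ordinate, it writes the value straight into one flat `double[]` laid out as x, y, then z and/or m for each coordinate in turn, so no per-coordinate map is allocated. `PackedCoordinateBuilder`, `setOrdinate` and `endCoordinate` are names invented for this sketch. They are not planetiler or parquet-java APIs, and the actual change may be structured differently.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: collects decoded ordinates directly into a flat array
 * (x0, y0, [z0], [m0], x1, y1, ...) instead of building a Map per coordinate
 * and converting it to a JTS coordinate later.
 */
public class PackedCoordinateBuilder {
  private final int dimension; // ordinates per coordinate: 2 = XY, 3 = XYZ or XYM, 4 = XYZM
  private double[] coords = new double[16];
  private int size = 0; // number of completed coordinates

  public PackedCoordinateBuilder(int dimension) {
    this.dimension = dimension;
  }

  /** Called once per decoded value; index 0 = x, 1 = y, then z and/or m in that order. */
  public void setOrdinate(int ordinateIndex, double value) {
    if (ordinateIndex < 0 || ordinateIndex >= dimension) {
      throw new IndexOutOfBoundsException("ordinate " + ordinateIndex + " for dimension " + dimension);
    }
    int offset = size * dimension + ordinateIndex;
    if (offset >= coords.length) {
      // Grow geometrically so appends stay amortized O(1).
      coords = Arrays.copyOf(coords, Math.max(coords.length * 2, offset + 1));
    }
    coords[offset] = value;
  }

  /** Marks the current coordinate as complete so the next values start a new one. */
  public void endCoordinate() {
    size++;
  }

  public int size() {
    return size;
  }

  /** Returns a right-sized copy of the packed ordinates. */
  public double[] toPackedArray() {
    return Arrays.copyOf(coords, size * dimension);
  }

  public static void main(String[] args) {
    PackedCoordinateBuilder builder = new PackedCoordinateBuilder(3); // XYZ
    double[][] decoded = {{1, 2, 3}, {4, 5, 6}}; // stand-in for values streamed by a reader
    for (double[] point : decoded) {
      for (int i = 0; i < point.length; i++) {
        builder.setOrdinate(i, point[i]);
      }
      builder.endCoordinate();
    }
    // prints "2 [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]"
    System.out.println(builder.size() + " " + Arrays.toString(builder.toPackedArray()));
  }
}
```

A flat ordinate array in this layout is what JTS's `PackedCoordinateSequence.Double(double[] coords, int dimension, int measures)` constructor takes, so a reader built along these lines could hand its buffer to JTS without creating an intermediate map or `Coordinate` object per point.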

This also changes the reader to fail fast when the Parquet schema doesn't match what's expected for the primary geometry type.
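The fail-fast check can be sketched roughly as follows. This hypothetical, JDK-only example assumes the reader has already walked the Parquet schema and knows how many nested list levels wrap the coordinate struct and what that struct's field names are. It compares them with the GeoArrow native encoding for the declared geometry type: a point has 0 list levels, a linestring or multipoint 1, a polygon or multilinestring 2, and a multipolygon 3, with coordinates stored as a struct of `x`, `y` and optional `z`/`m`. `GeoArrowSchemaCheck` and `validate` are illustrative names, not planetiler's code.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: reject a GeoArrow-encoded column whose shape doesn't match its declared type. */
public class GeoArrowSchemaCheck {
  /** Number of nested list levels above the coordinate struct for each GeoArrow geometry type. */
  enum GeometryType {
    POINT(0), LINESTRING(1), POLYGON(2), MULTIPOINT(1), MULTILINESTRING(2), MULTIPOLYGON(3);

    final int listDepth;

    GeometryType(int listDepth) {
      this.listDepth = listDepth;
    }
  }

  /** Allowed field layouts for the separated (struct) coordinate encoding. */
  private static final Set<List<String>> COORDINATE_LAYOUTS = Set.of(
      List.of("x", "y"), List.of("x", "y", "z"), List.of("x", "y", "m"), List.of("x", "y", "z", "m"));

  /**
   * Throws before any rows are read if the column does not look like the expected encoding.
   *
   * @param type declared geometry type (assumed to come from the GeoParquet metadata)
   * @param observedListDepth nested list levels found in the column's Parquet schema
   * @param coordinateFields field names of the innermost coordinate struct, in order
   */
  static void validate(GeometryType type, int observedListDepth, List<String> coordinateFields) {
    if (observedListDepth != type.listDepth) {
      throw new IllegalArgumentException(
          "Expected " + type.listDepth + " list levels for " + type + " but found " + observedListDepth);
    }
    if (!COORDINATE_LAYOUTS.contains(coordinateFields)) {
      throw new IllegalArgumentException("Unexpected coordinate fields " + coordinateFields);
    }
  }

  public static void main(String[] args) {
    validate(GeometryType.POLYGON, 2, List.of("x", "y")); // matches, so no exception
    try {
      validate(GeometryType.POLYGON, 1, List.of("x", "y")); // a linestring-shaped column
    } catch (IllegalArgumentException e) {
      // prints "rejected: Expected 2 list levels for POLYGON but found 1"
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Checking the shape once, up front, turns a mismatched file into an immediate, descriptive error instead of malformed geometries or a failure partway through the read.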

sonarcloud[bot] commented 5 days ago

Quality Gate passed

Issues
7 New issues
0 Accepted issues

Measures
0 Security Hotspots
75.4% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

github-actions[bot] commented 5 days ago
This branch: `6fde8d8ecc8f1ca5ffbd73592543d51c4095cbf1`; base: `024e387407c2d4de60603dc6007f1d0ee1eedc4e` (logs below: this branch first, then base).
```
0:01:10 DEB [archive] - Tile stats:
0:01:10 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:10 DEB [archive] - Max tile sizes
                       z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
boundary              154   374   443   583   938   339   433   548   773  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
water                7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   25k   15k   13k   17k   15k   12k   25k
place                   0     0   441   441   441   639   712    1k  1.5k  3.1k  5.6k  3.3k  1.7k   795   936  5.6k
landuse                 0     0     0     0   548   694  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
transportation          0     0     0     0   243   782  1.2k  5.9k    8k   24k   17k   19k   65k   48k   34k   65k
waterway                0     0     0     0   111   118     0     0     0  3.1k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
park                    0     0     0     0     0     0  1.2k    4k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name     0     0     0     0     0     0   369   464  1.2k  1.8k  5.4k  4.6k  3.9k  3.4k   18k   18k
landcover               0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
mountain_peak           0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
water_name              0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
aerodrome_label         0     0     0     0     0     0     0     0     0     0   664   327   273   220   220   664
aeroway                 0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
poi                     0     0     0     0     0     0     0     0     0     0     0     0   501   498   83k   83k
building                0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
housenumber             0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
full tile            7.9k    4k  9.5k  6.5k  3.7k    6k   20k   42k   85k  203k  185k  135k  114k  128k  244k  244k
gzipped              6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   60k  149k  138k   98k   83k   92k  154k  154k
0:01:10 DEB [archive] - Max tile: 244k (gzipped: 154k)
0:01:10 DEB [archive] - Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:10 DEB [archive] - # tiles: 4,115,036
0:01:10 DEB [archive] - # features: 5,487,099
0:01:10 INF [archive] - Finished in 19s cpu:1m9s avg:3.6
0:01:10 INF [archive] -   read 1x(3% 0.5s wait:17s done:1s)
0:01:10 INF [archive] -   encode 4x(54% 10s wait:2s done:1s)
0:01:10 INF [archive] -   write 1x(21% 4s wait:13s)
0:01:10 INF [archive] - Finished in 1m11s cpu:3m37s gc:1s avg:3.1
0:01:10 INF [archive] - FINISHED!
0:01:10 INF [archive] -
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - data errors:
0:01:10 INF [archive] -   render_snap_fix_input 16,667
0:01:10 INF [archive] -   osm_multipolygon_missing_way 360
0:01:10 INF [archive] -   osm_boundary_missing_way 73
0:01:10 INF [archive] -   merge_snap_fix_input 12
0:01:10 INF [archive] -   osm_boundary_duplicate_member 2
0:01:10 INF [archive] -   feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix 2
0:01:10 INF [archive] -   omt_fix_water_before_ne_intersect 1
0:01:10 INF [archive] -   feature_polygon_osm_invalid_multipolygon_empty_after_fix 1
0:01:10 INF [archive] -   feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix 1
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - overall 1m11s cpu:3m37s gc:1s avg:3.1
0:01:10 INF [archive] -   lake_centerlines 3s cpu:6s avg:2.1
0:01:10 INF [archive] -     read 1x(18% 0.5s done:2s)
0:01:10 INF [archive] -     process 4x(0% 0s done:2s)
0:01:10 INF [archive] -     write 1x(0% 0s done:2s)
0:01:10 INF [archive] -   water_polygons 15s cpu:42s avg:2.8
0:01:10 INF [archive] -     read 1x(41% 6s done:6s)
0:01:10 INF [archive] -     process 4x(30% 5s wait:2s done:5s)
0:01:10 INF [archive] -     write 1x(4% 0.6s wait:10s done:5s)
0:01:10 INF [archive] -   natural_earth 12s cpu:18s avg:1.5
0:01:10 INF [archive] -     read 1x(52% 6s done:5s)
0:01:10 INF [archive] -     process 4x(7% 0.8s wait:6s done:5s)
0:01:10 INF [archive] -     write 1x(0% 0s wait:6s done:5s)
0:01:10 INF [archive] -   osm_pass1 2s cpu:6s avg:3.1
0:01:10 INF [archive] -     read 1x(2% 0s wait:2s)
0:01:10 INF [archive] -     parse 4x(33% 0.6s)
0:01:10 INF [archive] -     process 1x(72% 1s)
0:01:10 INF [archive] -   osm_pass2 18s cpu:1m12s avg:3.9
0:01:10 INF [archive] -     read 1x(0% 0s wait:11s done:8s)
0:01:10 INF [archive] -     process 4x(76% 14s)
0:01:10 INF [archive] -     write 1x(2% 0.4s wait:18s)
0:01:10 INF [archive] -   ne_lakes 0s cpu:0s avg:0
0:01:10 INF [archive] -   boundaries 0s cpu:0s avg:2.9
0:01:10 INF [archive] -   agg_stop 0s cpu:0s avg:0
0:01:10 INF [archive] -   sort 1s cpu:4s avg:2.6
0:01:10 INF [archive] -     worker 1x(49% 0.7s)
0:01:10 INF [archive] -   archive 19s cpu:1m9s avg:3.6
0:01:10 INF [archive] -     read 1x(3% 0.5s wait:17s done:1s)
0:01:10 INF [archive] -     encode 4x(54% 10s wait:2s done:1s)
0:01:10 INF [archive] -     write 1x(21% 4s wait:13s)
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] -   archive 108MB
0:01:10 INF [archive] -   features 281MB
-rw-r--r-- 1 runner docker 84M Jun 27 11:14 run.jar
```

```
0:01:04 DEB [archive] - Tile stats:
0:01:04 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:04 DEB [archive] - Max tile sizes
                       z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
boundary              154   374   443   583   938   339   433   548   773  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
water                7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   25k   15k   13k   17k   15k   12k   25k
place                   0     0   441   441   441   639   712    1k  1.5k  3.1k  5.6k  3.3k  1.7k   795   936  5.6k
landuse                 0     0     0     0   548   694  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
transportation          0     0     0     0   243   782  1.2k  5.9k    8k   24k   17k   19k   65k   48k   34k   65k
waterway                0     0     0     0   111   118     0     0     0  3.1k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
park                    0     0     0     0     0     0  1.2k    4k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name     0     0     0     0     0     0   369   464  1.2k  1.8k  5.4k  4.6k  3.9k  3.4k   18k   18k
landcover               0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
mountain_peak           0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
water_name              0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
aerodrome_label         0     0     0     0     0     0     0     0     0     0   664   327   273   220   220   664
aeroway                 0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
poi                     0     0     0     0     0     0     0     0     0     0     0     0   501   498   83k   83k
building                0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
housenumber             0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
full tile            7.9k    4k  9.5k  6.5k  3.7k    6k   20k   42k   85k  203k  185k  135k  114k  128k  244k  244k
gzipped              6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   60k  149k  138k   98k   83k   92k  154k  154k
0:01:04 DEB [archive] - Max tile: 244k (gzipped: 154k)
0:01:04 DEB [archive] - Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:04 DEB [archive] - # tiles: 4,115,036
0:01:04 DEB [archive] - # features: 5,487,099
0:01:04 INF [archive] - Finished in 19s cpu:1m9s avg:3.6
0:01:04 INF [archive] -   read 1x(3% 0.5s wait:17s)
0:01:04 INF [archive] -   encode 4x(55% 10s wait:2s)
0:01:04 INF [archive] -   write 1x(21% 4s wait:13s)
0:01:04 INF [archive] - Finished in 1m4s cpu:3m31s gc:1s avg:3.3
0:01:04 INF [archive] - FINISHED!
0:01:04 INF [archive] -
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - data errors:
0:01:04 INF [archive] -   render_snap_fix_input 16,667
0:01:04 INF [archive] -   osm_multipolygon_missing_way 360
0:01:04 INF [archive] -   osm_boundary_missing_way 73
0:01:04 INF [archive] -   merge_snap_fix_input 12
0:01:04 INF [archive] -   osm_boundary_duplicate_member 2
0:01:04 INF [archive] -   feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix 2
0:01:04 INF [archive] -   omt_fix_water_before_ne_intersect 1
0:01:04 INF [archive] -   feature_polygon_osm_invalid_multipolygon_empty_after_fix 1
0:01:04 INF [archive] -   feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix 1
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - overall 1m4s cpu:3m31s gc:1s avg:3.3
0:01:04 INF [archive] -   lake_centerlines 2s cpu:5s avg:2.4
0:01:04 INF [archive] -     read 1x(22% 0.5s done:2s)
0:01:04 INF [archive] -     process 4x(0% 0s done:2s)
0:01:04 INF [archive] -     write 1x(0% 0s done:2s)
0:01:04 INF [archive] -   water_polygons 15s cpu:42s avg:2.8
0:01:04 INF [archive] -     read 1x(40% 6s done:6s)
0:01:04 INF [archive] -     process 4x(29% 4s wait:3s done:5s)
0:01:04 INF [archive] -     write 1x(4% 0.6s wait:10s done:5s)
0:01:04 INF [archive] -   natural_earth 6s cpu:12s avg:2
0:01:04 INF [archive] -     read 1x(96% 6s)
0:01:04 INF [archive] -     process 4x(13% 0.8s wait:6s)
0:01:04 INF [archive] -     write 1x(0% 0s wait:6s)
0:01:04 INF [archive] -   osm_pass1 2s cpu:6s avg:3.3
0:01:04 INF [archive] -     read 1x(2% 0s wait:2s)
0:01:04 INF [archive] -     parse 4x(34% 0.6s)
0:01:04 INF [archive] -     process 1x(68% 1s)
0:01:04 INF [archive] -   osm_pass2 18s cpu:1m11s avg:3.9
0:01:04 INF [archive] -     read 1x(0% 0s wait:11s done:8s)
0:01:04 INF [archive] -     process 4x(76% 14s)
0:01:04 INF [archive] -     write 1x(2% 0.4s wait:18s)
0:01:04 INF [archive] -   ne_lakes 0s cpu:0s avg:0
0:01:04 INF [archive] -   boundaries 0s cpu:0s avg:2.6
0:01:04 INF [archive] -   agg_stop 0s cpu:0s avg:0
0:01:04 INF [archive] -   sort 1s cpu:3s avg:2.6
0:01:04 INF [archive] -     worker 1x(49% 0.7s)
0:01:04 INF [archive] -   archive 19s cpu:1m9s avg:3.6
0:01:04 INF [archive] -     read 1x(3% 0.5s wait:17s)
0:01:04 INF [archive] -     encode 4x(55% 10s wait:2s)
0:01:04 INF [archive] -     write 1x(21% 4s wait:13s)
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] -   archive 108MB
0:01:04 INF [archive] -   features 281MB
-rw-r--r-- 1 runner docker 84M Jun 27 11:16 run.jar
```

Full logs: https://github.com/onthegomap/planetiler/actions/runs/9695530807