onthegomap / planetiler

Flexible tool to build planet-scale vector tilesets from OpenStreetMap data fast
Apache License 2.0

Avoid deserializing entire parquet geometry just to determine type #898

Closed msbarry closed 1 month ago

msbarry commented 1 month ago

Slight optimization to attempt to parse the geometry type from the beginning of a WKB or WKT-encoded geometry without deserializing the whole thing. If that fails, it falls back to the old behavior of deserializing and checking the type.
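
As a rough illustration of the idea (not the code from this PR), the geometry kind can often be read from the encoding's header alone. The sketch below assumes standard WKB (a byte-order byte followed by a 4-byte type code, with either ISO dimension offsets or EWKB flag bits) and WKT that starts with the geometry keyword. The class `GeometryTypeSniffer`, the enum `Kind`, and both method names are hypothetical. An empty result means the caller should fall back to full deserialization.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper, not Planetiler's actual code: guess a geometry's kind from its header. */
final class GeometryTypeSniffer {
  enum Kind { POINT, LINE, POLYGON }

  /** Inspects only the first 5 bytes of WKB: a byte-order marker and a 4-byte type code. */
  static Optional<Kind> fromWkb(byte[] wkb) {
    if (wkb == null || wkb.length < 5 || (wkb[0] != 0 && wkb[0] != 1)) {
      return Optional.empty();
    }
    ByteOrder order = wkb[0] == 0 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
    int code = ByteBuffer.wrap(wkb, 1, 4).order(order).getInt();
    // Clear the EWKB Z/M/SRID flag bits, then drop the ISO +1000/+2000/+3000 dimension offsets.
    int base = (code & 0x1FFFFFFF) % 1000;
    switch (base) {
      case 1: case 4: return Optional.of(Kind.POINT);   // Point, MultiPoint
      case 2: case 5: return Optional.of(Kind.LINE);    // LineString, MultiLineString
      case 3: case 6: return Optional.of(Kind.POLYGON); // Polygon, MultiPolygon
      default: return Optional.empty();                 // unknown or collection: parse fully
    }
  }

  /** Inspects only the leading keyword of WKT, ignoring leading whitespace and case. */
  static Optional<Kind> fromWkt(String wkt) {
    if (wkt == null) {
      return Optional.empty();
    }
    int start = 0;
    while (start < wkt.length() && Character.isWhitespace(wkt.charAt(start))) {
      start++;
    }
    int end = start;
    while (end < wkt.length() && Character.isLetter(wkt.charAt(end))) {
      end++;
    }
    switch (wkt.substring(start, end).toUpperCase(Locale.ROOT)) {
      case "POINT": case "MULTIPOINT": return Optional.of(Kind.POINT);
      case "LINESTRING": case "MULTILINESTRING": return Optional.of(Kind.LINE);
      case "POLYGON": case "MULTIPOLYGON": return Optional.of(Kind.POLYGON);
      default: return Optional.empty(); // unrecognized prefix: parse fully
    }
  }
}
```

A caller would try `fromWkb` or `fromWkt` first and only invoke a full parser (for example a JTS reader) when the result is empty, which keeps the fallback behavior described above.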

sonarcloud[bot] commented 1 month ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
58.5% Coverage on New Code
0.0% Duplication on New Code


github-actions[bot] commented 1 month ago
This Branch c9455b880c1f56642d88c0bb34f15059e3138363

```
0:01:08 DEB [archive] - Tile stats:
0:01:08 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:08 DEB [archive] - Max tile sizes
                       z0   z1   z2   z3   z4   z5   z6   z7   z8   z9  z10  z11  z12  z13  z14  all
boundary              154  374  443  583  938  339  433  548  773 1.6k 2.1k 7.2k 6.4k 5.8k 4.5k 7.2k
water                7.7k 3.7k 8.6k 5.5k 2.6k 5.1k  15k  18k  16k  26k  15k  13k  17k  15k  12k  26k
place                   0    0  441  441  441  639  712   1k 1.5k 3.1k 5.6k 3.3k 1.7k  795  936 5.6k
landuse                 0    0    0    0  548  694 1.6k 6.8k  17k  44k  59k  50k  38k  19k  12k  59k
transportation          0    0    0    0  243  782 1.2k 5.9k   8k  24k  17k  19k  65k  48k  33k  65k
waterway                0    0    0    0  111  118    0    0    0 3.1k 2.4k 2.1k 2.1k 4.9k 2.4k 4.9k
park                    0    0    0    0    0    0   1k 3.7k 9.7k  19k  13k 8.2k 4.3k 3.4k 4.4k  19k
transportation_name     0    0    0    0    0    0  369  464 1.2k 1.8k 5.4k 4.6k 3.9k 3.4k  18k  18k
landcover               0    0    0    0    0    0    0 9.5k  29k  85k  72k  81k  53k  30k  24k  85k
mountain_peak           0    0    0    0    0    0    0 1.1k 1.8k 3.4k 4.3k 2.8k 1.4k 1.4k  869 4.3k
water_name              0    0    0    0    0    0    0    0    0  486  461  433  452 1.2k 1.5k 1.5k
aerodrome_label         0    0    0    0    0    0    0    0    0    0  664  327  273  220  220  664
aeroway                 0    0    0    0    0    0    0    0    0    0 1.6k 2.1k   3k 3.4k 2.7k 3.4k
poi                     0    0    0    0    0    0    0    0    0    0    0    0  501  498  83k  83k
building                0    0    0    0    0    0    0    0    0    0    0    0    0  59k  92k  92k
housenumber             0    0    0    0    0    0    0    0    0    0    0    0    0    0  35k  35k
full tile            7.9k   4k 9.5k 6.5k 3.7k   6k  20k  42k  85k 203k 185k 135k 114k 128k 244k 244k
gzipped              6.2k 3.5k 7.1k 5.2k 3.1k 4.8k  14k  29k  60k 149k 138k  98k  83k  91k 154k 154k
0:01:08 DEB [archive] - Max tile: 244k (gzipped: 154k)
0:01:08 DEB [archive] - Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:08 DEB [archive] - # tiles: 4,115,012
0:01:08 DEB [archive] - # features: 5,484,250
0:01:08 INF [archive] - Finished in 18s cpu:1m6s avg:3.6
0:01:08 INF [archive] -   read 1x(3% 0.5s wait:16s)
0:01:08 INF [archive] -   encode 4x(56% 10s wait:2s)
0:01:08 INF [archive] -   write 1x(22% 4s wait:12s)
0:01:08 INF [archive] - Finished in 1m8s cpu:3m30s gc:1s avg:3.1
0:01:08 INF [archive] - FINISHED!
0:01:08 INF [archive] -
0:01:08 INF [archive] - ----------------------------------------
0:01:08 INF [archive] - data errors:
0:01:08 INF [archive] -   render_snap_fix_input 16,639
0:01:08 INF [archive] -   osm_multipolygon_missing_way 389
0:01:08 INF [archive] -   osm_boundary_missing_way 73
0:01:08 INF [archive] -   merge_snap_fix_input 12
0:01:08 INF [archive] -   osm_boundary_duplicate_member 2
0:01:08 INF [archive] -   feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix 2
0:01:08 INF [archive] -   feature_polygon_osm_invalid_multipolygon_empty_after_fix 2
0:01:08 INF [archive] -   omt_park_area_osm_invalid_multipolygon_empty_after_fix 1
0:01:08 INF [archive] -   omt_fix_water_before_ne_intersect 1
0:01:08 INF [archive] -   feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix 1
0:01:08 INF [archive] - ----------------------------------------
0:01:08 INF [archive] - overall 1m8s cpu:3m30s gc:1s avg:3.1
0:01:08 INF [archive] - lake_centerlines 3s cpu:5s avg:2
0:01:08 INF [archive] -   read 1x(18% 0.5s done:2s)
0:01:08 INF [archive] -   process 4x(0% 0s done:2s)
0:01:08 INF [archive] -   write 1x(0% 0s done:2s)
0:01:08 INF [archive] - water_polygons 15s cpu:42s avg:2.8
0:01:08 INF [archive] -   read 1x(43% 6s done:7s)
0:01:08 INF [archive] -   process 4x(27% 4s wait:4s done:5s)
0:01:08 INF [archive] -   write 1x(4% 0.5s wait:10s done:5s)
0:01:08 INF [archive] - natural_earth 12s cpu:18s avg:1.5
0:01:08 INF [archive] -   read 1x(52% 6s done:5s)
0:01:08 INF [archive] -   process 4x(7% 0.8s wait:6s done:5s)
0:01:08 INF [archive] -   write 1x(0% 0s wait:6s done:5s)
0:01:08 INF [archive] - osm_pass1 2s cpu:7s avg:3.4
0:01:08 INF [archive] -   read 1x(2% 0s wait:2s)
0:01:08 INF [archive] -   parse 4x(34% 0.7s)
0:01:08 INF [archive] -   process 1x(69% 1s)
0:01:08 INF [archive] - osm_pass2 17s cpu:1m7s avg:3.9
0:01:08 INF [archive] -   read 1x(0% 0s wait:10s done:8s)
0:01:08 INF [archive] -   process 4x(75% 13s)
0:01:08 INF [archive] -   write 1x(2% 0.4s wait:17s)
0:01:08 INF [archive] - ne_lakes 0s cpu:0s avg:0
0:01:08 INF [archive] - boundaries 0s cpu:0s avg:2.4
0:01:08 INF [archive] - agg_stop 0s cpu:0s avg:0
0:01:08 INF [archive] - sort 1s cpu:4s avg:2.7
0:01:08 INF [archive] -   worker 1x(48% 0.7s)
0:01:08 INF [archive] - archive 18s cpu:1m6s avg:3.6
0:01:08 INF [archive] -   read 1x(3% 0.5s wait:16s)
0:01:08 INF [archive] -   encode 4x(56% 10s wait:2s)
0:01:08 INF [archive] -   write 1x(22% 4s wait:12s)
0:01:08 INF [archive] - ----------------------------------------
0:01:08 INF [archive] - archive 108MB
0:01:08 INF [archive] - features 281MB
-rw-r--r-- 1 runner docker 84M May 26 10:55 run.jar
```

Base f8e64a4beba00fc881ed8ca01a9475f8dd14ad5a

```
0:01:02 DEB [archive] - Tile stats:
0:01:02 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:02 DEB [archive] - Max tile sizes
                       z0   z1   z2   z3   z4   z5   z6   z7   z8   z9  z10  z11  z12  z13  z14  all
boundary              154  374  443  583  938  339  433  548  773 1.6k 2.1k 7.2k 6.4k 5.8k 4.5k 7.2k
water                7.7k 3.7k 8.6k 5.5k 2.6k 5.1k  15k  18k  16k  26k  15k  13k  17k  15k  12k  26k
place                   0    0  441  441  441  639  712   1k 1.5k 3.1k 5.6k 3.3k 1.7k  795  936 5.6k
landuse                 0    0    0    0  548  694 1.6k 6.8k  17k  44k  59k  50k  38k  19k  12k  59k
transportation          0    0    0    0  243  782 1.2k 5.9k   8k  24k  17k  19k  65k  48k  33k  65k
waterway                0    0    0    0  111  118    0    0    0 3.1k 2.4k 2.1k 2.1k 4.9k 2.4k 4.9k
park                    0    0    0    0    0    0   1k 3.7k 9.7k  19k  13k 8.2k 4.3k 3.4k 4.4k  19k
transportation_name     0    0    0    0    0    0  369  464 1.2k 1.8k 5.4k 4.6k 3.9k 3.4k  18k  18k
landcover               0    0    0    0    0    0    0 9.5k  29k  85k  72k  81k  53k  30k  24k  85k
mountain_peak           0    0    0    0    0    0    0 1.1k 1.8k 3.4k 4.3k 2.8k 1.4k 1.4k  869 4.3k
water_name              0    0    0    0    0    0    0    0    0  486  461  433  452 1.2k 1.5k 1.5k
aerodrome_label         0    0    0    0    0    0    0    0    0    0  664  327  273  220  220  664
aeroway                 0    0    0    0    0    0    0    0    0    0 1.6k 2.1k   3k 3.4k 2.7k 3.4k
poi                     0    0    0    0    0    0    0    0    0    0    0    0  501  498  83k  83k
building                0    0    0    0    0    0    0    0    0    0    0    0    0  59k  92k  92k
housenumber             0    0    0    0    0    0    0    0    0    0    0    0    0    0  35k  35k
full tile            7.9k   4k 9.5k 6.5k 3.7k   6k  20k  42k  85k 203k 185k 135k 114k 128k 244k 244k
gzipped              6.2k 3.5k 7.1k 5.2k 3.1k 4.8k  14k  29k  60k 149k 138k  98k  83k  91k 154k 154k
0:01:02 DEB [archive] - Max tile: 244k (gzipped: 154k)
0:01:02 DEB [archive] - Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:02 DEB [archive] - # tiles: 4,115,012
0:01:02 DEB [archive] - # features: 5,484,250
0:01:02 INF [archive] - Finished in 18s cpu:1m6s avg:3.7
0:01:02 INF [archive] -   read 1x(3% 0.6s wait:16s done:1s)
0:01:02 INF [archive] -   encode 4x(55% 10s wait:2s done:1s)
0:01:02 INF [archive] -   write 1x(22% 4s wait:12s)
0:01:02 INF [archive] - Finished in 1m3s cpu:3m24s gc:1s avg:3.3
0:01:02 INF [archive] - FINISHED!
0:01:02 INF [archive] -
0:01:02 INF [archive] - ----------------------------------------
0:01:02 INF [archive] - data errors:
0:01:02 INF [archive] -   render_snap_fix_input 16,639
0:01:02 INF [archive] -   osm_multipolygon_missing_way 389
0:01:02 INF [archive] -   osm_boundary_missing_way 73
0:01:02 INF [archive] -   merge_snap_fix_input 12
0:01:02 INF [archive] -   osm_boundary_duplicate_member 2
0:01:02 INF [archive] -   feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix 2
0:01:02 INF [archive] -   feature_polygon_osm_invalid_multipolygon_empty_after_fix 2
0:01:02 INF [archive] -   omt_park_area_osm_invalid_multipolygon_empty_after_fix 1
0:01:02 INF [archive] -   omt_fix_water_before_ne_intersect 1
0:01:02 INF [archive] -   feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix 1
0:01:02 INF [archive] - ----------------------------------------
0:01:02 INF [archive] - overall 1m3s cpu:3m24s gc:1s avg:3.3
0:01:02 INF [archive] - lake_centerlines 2s cpu:5s avg:2.3
0:01:02 INF [archive] -   read 1x(21% 0.5s done:2s)
0:01:02 INF [archive] -   process 4x(0% 0s done:2s)
0:01:02 INF [archive] -   write 1x(0% 0s done:2s)
0:01:02 INF [archive] - water_polygons 15s cpu:41s avg:2.8
0:01:02 INF [archive] -   read 1x(43% 6s done:7s)
0:01:02 INF [archive] -   process 4x(26% 4s wait:4s done:5s)
0:01:02 INF [archive] -   write 1x(4% 0.5s wait:10s done:5s)
0:01:02 INF [archive] - natural_earth 6s cpu:11s avg:1.9
0:01:02 INF [archive] -   read 1x(96% 6s)
0:01:02 INF [archive] -   process 4x(13% 0.8s wait:6s)
0:01:02 INF [archive] -   write 1x(0% 0s wait:6s)
0:01:02 INF [archive] - osm_pass1 2s cpu:6s avg:3.3
0:01:02 INF [archive] -   read 1x(2% 0s wait:2s)
0:01:02 INF [archive] -   parse 4x(34% 0.6s)
0:01:02 INF [archive] -   process 1x(67% 1s)
0:01:02 INF [archive] - osm_pass2 18s cpu:1m10s avg:3.9
0:01:02 INF [archive] -   read 1x(0% 0s wait:10s done:8s)
0:01:02 INF [archive] -   process 4x(74% 13s)
0:01:02 INF [archive] -   write 1x(2% 0.4s wait:17s)
0:01:02 INF [archive] - ne_lakes 0s cpu:0s avg:0
0:01:02 INF [archive] - boundaries 0s cpu:0s avg:1.4
0:01:02 INF [archive] - agg_stop 0s cpu:0s avg:0
0:01:02 INF [archive] - sort 1s cpu:3s avg:2.6
0:01:02 INF [archive] -   worker 1x(52% 0.7s)
0:01:02 INF [archive] - archive 18s cpu:1m6s avg:3.7
0:01:02 INF [archive] -   read 1x(3% 0.6s wait:16s done:1s)
0:01:02 INF [archive] -   encode 4x(55% 10s wait:2s done:1s)
0:01:02 INF [archive] -   write 1x(22% 4s wait:12s)
0:01:02 INF [archive] - ----------------------------------------
0:01:02 INF [archive] - archive 108MB
0:01:02 INF [archive] - features 281MB
-rw-r--r-- 1 runner docker 84M May 26 10:56 run.jar
```

Full logs: https://github.com/onthegomap/planetiler/actions/runs/9242696066