onthegomap / planetiler

Flexible tool to build planet-scale vector tilesets from OpenStreetMap data fast
Apache License 2.0
1.46k stars 116 forks source link

Add loop line merger #1083

Closed wipfli closed 3 days ago

wipfli commented 3 weeks ago

This pull request adds a LoopLineMerger class which is a replacement for JTS LineMerger . The new class takes loops in the LineString graph into account. Loops that are shorter than a minLoopLength will be cut open and only the shortest edge will be kept. Here is a bit of motivation: https://oliverwipfli.ch/improving-linestring-merging-in-planetiler-2024-10-30/ Questions from my side are:

github-actions[bot] commented 3 weeks ago
This Branch d5a811d14abecdefcfa8f173c249363e6f07a4ea Base b59c63e5aa79c9d62d0ec24d00b03e8fd310872f
``` 0:01:13 DEB [archive] - Tile stats: 0:01:13 DEB [archive] - Biggest tiles (gzipped) 1. 14/4942/6092 (157k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:85k) 2. 9/154/190 (144k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k) 3. 10/308/380 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k) 4. 10/308/381 (135k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k) 5. 14/4941/6092 (112k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k) 6. 14/4941/6093 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k) 7. 14/4940/6092 (100k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k) 8. 11/616/762 (99k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k) 9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k) 10. 11/616/761 (95k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k) 0:01:13 DEB [archive] - Max tile sizes z0 z1 z2 z3 z4 z5 z6 z7 z8 z9 z10 z11 z12 z13 z14 all boundary 151 336 409 544 872 332 437 552 802 1.6k 2k 6.9k 6.2k 5.6k 4.5k 6.9k water 7.7k 3.7k 8.6k 5.5k 2.6k 5.1k 15k 18k 16k 26k 15k 13k 17k 15k 12k 26k place 0 0 441 441 441 640 714 1k 1.6k 3.1k 5.7k 3.3k 1.7k 803 948 5.7k landuse 0 0 0 0 549 695 1.6k 6.7k 17k 44k 59k 50k 38k 19k 12k 59k transportation 0 0 0 0 311 774 1.2k 4k 5.6k 17k 13k 17k 62k 47k 33k 62k waterway 0 0 0 0 112 119 0 0 0 3k 2.3k 2k 2.1k 4.9k 2.4k 4.9k park 0 0 0 0 0 0 1.3k 4.3k 9.7k 18k 13k 8.2k 3.7k 3.4k 4.4k 18k transportation_name 0 0 0 0 0 0 287 364 1.1k 1.9k 5.5k 4.7k 3.9k 3.4k 18k 18k landcover 0 0 0 0 0 0 0 9.6k 29k 85k 72k 81k 53k 30k 25k 85k mountain_peak 0 0 0 0 0 0 0 1.1k 1.8k 3.4k 4.3k 2.8k 1.4k 1.4k 869 4.3k water_name 0 0 0 0 0 0 0 0 0 486 461 433 452 1.2k 1.5k 1.5k aerodrome_label 0 0 0 0 0 0 0 0 0 0 666 328 273 221 221 666 aeroway 0 0 0 0 0 0 0 0 0 0 1.6k 2.1k 3k 3.4k 2.8k 3.4k poi 0 0 0 0 0 0 0 0 0 0 0 0 568 565 85k 85k building 0 0 0 0 0 0 0 0 0 0 0 0 0 59k 92k 92k housenumber 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35k 35k full tile 7.9k 4k 9.5k 6.4k 3.7k 6k 20k 40k 82k 195k 182k 135k 113k 128k 247k 247k gzipped 6.2k 3.5k 7.1k 5.2k 3.1k 4.8k 14k 28k 59k 144k 136k 99k 83k 92k 157k 157k 0:01:13 DEB [archive] - Max tile: 247k (gzipped: 157k) 0:01:13 DEB [archive] - Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic 0:01:13 DEB [archive] - # tiles: 4,115,031 0:01:13 DEB [archive] - # features: 5,518,001 0:01:13 INF [archive] - Finished in 19s cpu:1m10s avg:3.7 0:01:13 INF [archive] - read 1x(3% 0.5s wait:17s done:1s) 0:01:13 INF [archive] - encode 4x(56% 11s wait:2s done:1s) 0:01:13 INF [archive] - write 1x(23% 4s wait:13s done:1s) 0:01:13 INF [archive] - Finished in 1m14s cpu:3m39s gc:1s avg:3 0:01:13 INF [archive] - FINISHED! 0:01:13 INF [archive] - 0:01:13 INF [archive] - ---------------------------------------- 0:01:13 INF [archive] - data errors: 0:01:13 INF [archive] - render_snap_fix_input 16,675 0:01:13 INF [archive] - osm_multipolygon_missing_way 360 0:01:13 INF [archive] - osm_boundary_missing_way 66 0:01:13 INF [archive] - merge_snap_fix_input 12 0:01:13 INF [archive] - feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix 2 0:01:13 INF [archive] - render_snap_fix_input2 1 0:01:13 INF [archive] - omt_fix_water_before_ne_intersect 1 0:01:13 INF [archive] - feature_polygon_osm_invalid_multipolygon_empty_after_fix 1 0:01:13 INF [archive] - feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix 1 0:01:13 INF [archive] - ---------------------------------------- 0:01:13 INF [archive] - overall 1m14s cpu:3m39s gc:1s avg:3 0:01:13 INF [archive] - lake_centerlines 5s cpu:6s avg:1.2 0:01:13 INF [archive] - read 1x(9% 0.5s done:5s) 0:01:13 INF [archive] - process 4x(0% 0s done:4s) 0:01:13 INF [archive] - write 1x(0% 0s done:4s) 0:01:13 INF [archive] - water_polygons 15s cpu:40s avg:2.7 0:01:13 INF [archive] - read 1x(41% 6s done:7s) 0:01:13 INF [archive] - process 4x(26% 4s wait:4s done:5s) 0:01:13 INF [archive] - write 1x(4% 0.5s wait:9s done:5s) 0:01:13 INF [archive] - natural_earth 12s cpu:18s avg:1.6 0:01:13 INF [archive] - read 1x(52% 6s done:5s) 0:01:13 INF [archive] - process 4x(7% 0.8s wait:6s done:5s) 0:01:13 INF [archive] - write 1x(0% 0s wait:6s done:5s) 0:01:13 INF [archive] - osm_pass1 2s cpu:7s avg:3.4 0:01:13 INF [archive] - read 1x(2% 0s wait:2s) 0:01:13 INF [archive] - parse 4x(33% 0.7s) 0:01:13 INF [archive] - process 1x(72% 2s) 0:01:13 INF [archive] - osm_pass2 19s cpu:1m12s avg:3.9 0:01:13 INF [archive] - read 1x(0% 0s wait:11s done:8s) 0:01:13 INF [archive] - process 4x(75% 14s) 0:01:13 INF [archive] - write 1x(2% 0.4s wait:18s) 0:01:13 INF [archive] - ne_lakes 0s cpu:0s avg:0 0:01:13 INF [archive] - boundaries 0s cpu:0s avg:2.7 0:01:13 INF [archive] - agg_stop 0s cpu:0s avg:0 0:01:13 INF [archive] - sort 1s cpu:4s avg:2.7 0:01:13 INF [archive] - worker 1x(48% 0.7s) 0:01:13 INF [archive] - archive 19s cpu:1m10s avg:3.7 0:01:13 INF [archive] - read 1x(3% 0.5s wait:17s done:1s) 0:01:13 INF [archive] - encode 4x(56% 11s wait:2s done:1s) 0:01:13 INF [archive] - write 1x(23% 4s wait:13s done:1s) 0:01:13 INF [archive] - ---------------------------------------- 0:01:13 INF [archive] - archive 108MB 0:01:13 INF [archive] - features 284MB -rw-r--r-- 1 runner docker 87M Nov 25 01:49 run.jar ``` ``` 0:01:04 DEB [archive] - Tile stats: 0:01:04 DEB [archive] - Biggest tiles (gzipped) 1. 14/4942/6092 (159k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:85k) 2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k) 3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k) 4. 10/308/381 (137k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k) 5. 14/4941/6092 (113k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k) 6. 14/4941/6093 (112k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k) 7. 14/4940/6092 (100k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k) 8. 11/616/762 (99k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k) 9. 14/4942/6091 (97k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k) 10. 11/616/761 (95k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k) 0:01:04 DEB [archive] - Max tile sizes z0 z1 z2 z3 z4 z5 z6 z7 z8 z9 z10 z11 z12 z13 z14 all boundary 155 375 444 584 939 371 467 587 823 1.7k 2.1k 7.2k 6.4k 5.8k 4.5k 7.2k water 7.7k 3.7k 8.6k 5.5k 2.6k 5.1k 15k 18k 16k 26k 15k 13k 17k 15k 12k 26k place 0 0 441 441 441 640 714 1k 1.6k 3.1k 5.7k 3.3k 1.7k 803 948 5.7k landuse 0 0 0 0 549 695 1.6k 6.7k 17k 44k 59k 50k 38k 19k 12k 59k transportation 0 0 0 0 370 905 1.3k 6k 8.1k 25k 17k 20k 65k 49k 36k 65k waterway 0 0 0 0 112 119 0 0 0 3.2k 2.3k 2.1k 2.1k 4.9k 2.4k 4.9k park 0 0 0 0 0 0 1.3k 4.3k 9.7k 18k 13k 8.2k 3.7k 3.4k 4.4k 18k transportation_name 0 0 0 0 0 0 369 464 1.2k 1.8k 5.5k 4.7k 3.9k 3.4k 18k 18k landcover 0 0 0 0 0 0 0 9.6k 29k 85k 72k 81k 53k 30k 25k 85k mountain_peak 0 0 0 0 0 0 0 1.1k 1.8k 3.4k 4.3k 2.8k 1.4k 1.4k 869 4.3k water_name 0 0 0 0 0 0 0 0 0 486 461 433 452 1.2k 1.5k 1.5k aerodrome_label 0 0 0 0 0 0 0 0 0 0 666 328 273 221 221 666 aeroway 0 0 0 0 0 0 0 0 0 0 1.6k 2.1k 3k 3.4k 2.8k 3.4k poi 0 0 0 0 0 0 0 0 0 0 0 0 568 565 85k 85k building 0 0 0 0 0 0 0 0 0 0 0 0 0 59k 92k 92k housenumber 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35k 35k full tile 7.9k 4k 9.5k 6.5k 3.8k 6.2k 21k 42k 85k 203k 185k 135k 114k 129k 250k 250k gzipped 6.2k 3.6k 7.1k 5.2k 3.1k 5k 14k 30k 60k 149k 138k 99k 83k 92k 159k 159k 0:01:04 DEB [archive] - Max tile: 250k (gzipped: 159k) 0:01:04 DEB [archive] - Avg tile: 5.5k (gzipped: 4.1k) using weighted average based on OSM traffic 0:01:04 DEB [archive] - # tiles: 4,115,031 0:01:04 DEB [archive] - # features: 5,518,001 0:01:04 INF [archive] - Finished in 19s cpu:1m10s avg:3.7 0:01:04 INF [archive] - read 1x(3% 0.6s wait:17s done:1s) 0:01:04 INF [archive] - encode 4x(57% 11s wait:2s done:1s) 0:01:04 INF [archive] - write 1x(23% 4s wait:12s) 0:01:04 INF [archive] - Finished in 1m4s cpu:3m31s gc:1s avg:3.3 0:01:04 INF [archive] - FINISHED! 0:01:04 INF [archive] - 0:01:04 INF [archive] - ---------------------------------------- 0:01:04 INF [archive] - data errors: 0:01:04 INF [archive] - render_snap_fix_input 16,675 0:01:04 INF [archive] - osm_multipolygon_missing_way 360 0:01:04 INF [archive] - osm_boundary_missing_way 66 0:01:04 INF [archive] - merge_snap_fix_input 12 0:01:04 INF [archive] - feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix 2 0:01:04 INF [archive] - render_snap_fix_input2 1 0:01:04 INF [archive] - omt_fix_water_before_ne_intersect 1 0:01:04 INF [archive] - feature_polygon_osm_invalid_multipolygon_empty_after_fix 1 0:01:04 INF [archive] - feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix 1 0:01:04 INF [archive] - ---------------------------------------- 0:01:04 INF [archive] - overall 1m4s cpu:3m31s gc:1s avg:3.3 0:01:04 INF [archive] - lake_centerlines 2s cpu:5s avg:2.3 0:01:04 INF [archive] - read 1x(22% 0.5s done:2s) 0:01:04 INF [archive] - process 4x(0% 0s done:2s) 0:01:04 INF [archive] - write 1x(0% 0s done:1s) 0:01:04 INF [archive] - water_polygons 15s cpu:41s avg:2.8 0:01:04 INF [archive] - read 1x(40% 6s done:7s) 0:01:04 INF [archive] - process 4x(26% 4s wait:4s done:5s) 0:01:04 INF [archive] - write 1x(4% 0.5s wait:10s done:5s) 0:01:04 INF [archive] - natural_earth 6s cpu:13s avg:2 0:01:04 INF [archive] - read 1x(95% 6s) 0:01:04 INF [archive] - process 4x(13% 0.8s wait:6s) 0:01:04 INF [archive] - write 1x(0% 0s wait:6s) 0:01:04 INF [archive] - osm_pass1 2s cpu:6s avg:3.2 0:01:04 INF [archive] - read 1x(2% 0s wait:2s) 0:01:04 INF [archive] - parse 4x(35% 0.7s) 0:01:04 INF [archive] - process 1x(70% 1s) 0:01:04 INF [archive] - osm_pass2 18s cpu:1m11s avg:3.9 0:01:04 INF [archive] - read 1x(0% 0s wait:10s done:8s) 0:01:04 INF [archive] - process 4x(76% 14s) 0:01:04 INF [archive] - write 1x(2% 0.4s wait:18s) 0:01:04 INF [archive] - ne_lakes 0s cpu:0s avg:0 0:01:04 INF [archive] - boundaries 0s cpu:0s avg:2 0:01:04 INF [archive] - agg_stop 0s cpu:0s avg:18.3 0:01:04 INF [archive] - sort 1s cpu:3s avg:2.5 0:01:04 INF [archive] - worker 1x(52% 0.7s) 0:01:04 INF [archive] - archive 19s cpu:1m10s avg:3.7 0:01:04 INF [archive] - read 1x(3% 0.6s wait:17s done:1s) 0:01:04 INF [archive] - encode 4x(57% 11s wait:2s done:1s) 0:01:04 INF [archive] - write 1x(23% 4s wait:12s) 0:01:04 INF [archive] - ---------------------------------------- 0:01:04 INF [archive] - archive 108MB 0:01:04 INF [archive] - features 284MB -rw-r--r-- 1 runner docker 86M Nov 25 01:51 run.jar ```

Full logs: https://github.com/onthegomap/planetiler/actions/runs/12001709762

wipfli commented 3 weeks ago

I added some tests for the public functions. I don't really know how private functions are tested in Java. Should I for example write specific tests for the private function findAllPaths?

msbarry commented 3 weeks ago

It really only needs tests for the public api. If there's an internal method you wanted to make sure works the way you expected you can make it package protected (no modifier) so tests can use it since they're in the same package.

msbarry commented 3 weeks ago

How should we expose the new parameter minLoopLength? At the moment I just set it to lengthLimit from FeatureMerge.mergeLineStrings. Should we maybe add an optional loopLengthLimitCalculator?

I think something like this would make sense for an API to control those parameters:

var merger = new LoopLineMerger()
  .setMinLength(0.1)
  .setLoopMinLength(0.1)
  .setPrecision(new PrecisionModel(16)); // to control rounding
merger.add(lineStrings);
var merged = merger.getMergedLineStrings();
msbarry commented 3 weeks ago

LoopLineMerger is slower than JTS LineMerger. I guess it is because they are doing less (no loops taken into account) but also they are doing things more efficiently.

I can help look into the performance, I think there will be some low hanging fruit to make it fast. JTS line merge also only considers full linestrings, it doesn't break them up into segments so this one will always be a little slower. The first thing that jumps out at me is creating full LineStrings and reversing/comparing them while building up the lists. I think we might want to have the intermediate code work with lists of coordinates that can be concatenated efficiently, then construct the LineStrings at the very end.

wipfli commented 3 weeks ago

Great, thanks for the feedback! I added some more tests and marked the methods that I want to test separately as protected.

msbarry commented 3 weeks ago

I added a BenchmarkLineMerge class we can use to compare performance of JTS line merge vs. this new utility.

wipfli commented 3 weeks ago

Thanks for adding the benchmarks. I changed the api just now. Let me have a look at the benchmark...

msbarry commented 3 weeks ago

Pushed a few updates to the benchmark, basically it creates between 10 and 1000 line segments in the range from (0,0) to (100, 100) with between 10 and 1000 intermediate points then uses both line merge. I disabled the 1000 lines with 1000 points because it currently does not finish with loop line merger 😬

Let me know if you can think of more realistic scenarios?

wipfli commented 3 weeks ago

I had a look at what is given to line merging if one wants to render highway=primary at z6 for Switzerland. The result is that you get many short linestrings:

groupedFeatures.size  50043
0 <= points per linestring < 5 is 52.04124452970446 percent
5 <= points per linestring < 10 is 26.543172871330654 percent
10 <= points per linestring < 15 is 8.292868133405271 percent
15 <= points per linestring < 20 is 4.248346422077014 percent
20 <= points per linestring < 25 is 3.6428671342645327 percent
25 <= points per linestring < 30 is 1.6805547229382731 percent
30 <= points per linestring < 35 is 0.8912335391563256 percent
35 <= points per linestring < 40 is 0.6374517914593449 percent
40 <= points per linestring < 45 is 0.44961333253402075 percent
45 <= points per linestring < 50 is 0.30174050316727613 percent
50 <= points per linestring < 55 is 0.24978518474112263 percent
55 <= points per linestring < 60 is 0.14587454788881563 percent
60 <= points per linestring < 65 is 0.10590891833023598 percent
65 <= points per linestring < 70 is 0.08992266650680415 percent
70 <= points per linestring < 75 is 0.10191235537437802 percent
75 <= points per linestring < 80 is 0.07193813320544332 percent
80 <= points per linestring < 85 is 0.06394500729372739 percent
85 <= points per linestring < 90 is 0.055951881382011466 percent
90 <= points per linestring < 95 is 0.03397078512479268 percent
wipfli commented 3 weeks ago

I guess these statistics reflect that in OpenStreetMap a typical Way has only a handful of Nodes.

Regarding the length of the linestrings, actually almost half of them are too short to occupy distinct start and end points on the 1/16 grid:

0.0 <= linestring length < 0.03125 is 45.149171712327394 percent
0.03125 <= linestring length < 0.0625 is 21.459544791479328 percent
0.0625 <= linestring length < 0.09375 is 9.12015666526787 percent
0.09375 <= linestring length < 0.125 is 5.285454509122155 percent
0.125 <= linestring length < 0.15625 is 3.6388705713086744 percent
0.15625 <= linestring length < 0.1875 is 2.5937693583518175 percent
0.1875 <= linestring length < 0.21875 is 2.036248826009632 percent
0.21875 <= linestring length < 0.25 is 1.5606578342625343 percent
0.25 <= linestring length < 0.28125 is 1.3768159382930678 percent
0.28125 <= linestring length < 0.3125 is 1.0490977759127151 percent

The bins here are 0.5 * 1/16 wide. That means that the linestrings in the first bracket will snap to the same point on the grid for start and end node.

msbarry commented 3 weeks ago

Cool, I think we probably want to benchmark a few "average" cases (in terms of num lines/num points) and also a worst-case to make sure it doesn't hang. Feel free to update the benchmark test data generation to be more realistic!

wipfli commented 3 weeks ago

That is the Swiss highway=primary network. It has 50k+ input linestrings. image Should I just feed those to the benchmarks?

wipfli commented 3 weeks ago

Thanks for adding the benchmarks @msbarry! The numbers on my machine are currently: image

msbarry commented 3 weeks ago

Some stats when I run over the planet using openmaptiles profile:

so for worst case we might want a case that's on the order of 500k line segments each with 2 points, and a case that's 200 lines with 5000 points each. For average case, maybe 50x2 10x10 and 2x50?

wipfli commented 3 weeks ago

These number sound perfect. Thanks for investigating

msbarry commented 3 weeks ago

Oh actually I'll try to grab those actual largest input geometries so we can just test against the real thing.

msbarry commented 3 weeks ago

OK I captured the largest real inputs to the line merge routine and put them in planetiler-core/src/test/resources. I also added a unit test that runs them (although they take a while right now)

When I run the benchmarks on my machine now I see:

image

Another thing I remembered is that we need to make sure the output is deterministic for a given input to make sure planetiler produces byt-for-byte equivalent output archives when the input doesn't change.

wipfli commented 3 weeks ago

I am iterating over HashMap<Point, Node> .values() in some places. Is the order of this deterministic?

msbarry commented 3 weeks ago

I am iterating over HashMap<Point, Node> .values() in some places. Is the order of this deterministic?

It is not, that order will be different each time. One thing I was wondering is since the majority of lines probably don't intersect, the step of breaking lines up into segments and reassembling them doesn't change the result much. What if we did a first pass that just snap-rounded points and used a hashmap to split segments only at points shared with other segments? The result could go into a structure with pointers (Node{edgesLeaving: Edge[]} and Edge{from: Node, to: Node, points: Coordinate[]} that can be processed without needing to query a hashmap?

msbarry commented 3 weeks ago

Also can you post the results of the new benchmark on your machine so we have a baseline for comparison?

wipfli commented 3 weeks ago

image

wipfli commented 3 weeks ago

Like roughly half as fast as your computer.

wipfli commented 3 weeks ago

The optimization you propose sound good. Another though is that in the Swiss highway=primary data at z6, 52 percent of all ways were less than 5 nodes long and 66 percent were shorter than 0.0625 pixel which is the grid resolution. This probably means that we have a lot of nodes which snap to the same grid point on the same linestring. For example, the coordinates of a typical linestring could be:

[
  0 0, 
  0 0, 
  0 0, 
  0.0625 0.0625,
 ...
]

Maybe we could have a data structure for Edge that removes consecutive coordinates which are identical?

msbarry commented 3 weeks ago

I prototyped the "split lines at intersections then use a graph data structure" approach I outlined here: https://github.com/wipfli/planetiler/pull/17 - I think it's doing the same thing but let me know what you think! It appears to be faster than JTS now, except in the case where it's merging very few long lines since it needs to do the extra point-duplication check.

msbarry commented 3 weeks ago

The optimization you propose sound good. Another though is that in the Swiss highway=primary data at z6, 52 percent of all ways were less than 5 nodes long and 66 percent were shorter than 0.0625 pixel which is the grid resolution. This probably means that we have a lot of nodes which snap to the same grid point on the same linestring. For example, the coordinates of a typical linestring could be:

[
  0 0, 
  0 0, 
  0 0, 
  0.0625 0.0625,
 ...
]

Maybe we could have a data structure for Edge that removes consecutive coordinates which are identical?

For arbitrary input, that makes sense, but when this normally runs in planetiler post-processing all of the input linestrings have already been snap-rounded and encoded as vector tile geometries during initial linestring processing, so the snap-rounding will usually have no impact. There is an option to encode those intermediate geometries at higher resolution though, so it might be a good idea to keep this in anyway.

wipfli commented 3 weeks ago

The pointer version is so much faster, thanks a lot!

image

wipfli commented 3 weeks ago

The loop merging on the Monaco example works still the same: image

wipfli commented 3 weeks ago

Regarding the snap rounding: when I run this code

for (var item : items) {
    System.out.println(item.geometry().decode());
}

In postProcessLayerFeatures, i.e., the location where I would usually call FeatureMerge.mergeLineStrings, I get the following output:

LINESTRING (133.27390670776367 93.35183334350586, ...
LINESTRING (133.28918075561523 93.33192443847656, ...
LINESTRING (133.27710723876953 93.34591293334961, ...
LINESTRING (133.28014755249023 93.34562301635742, ...
LINESTRING (133.27346801757812 93.35217666625977, ...
LINESTRING (133.27404403686523 93.35100936889648, ...
LINESTRING (133.28104400634766 93.3455810546875, ...
LINESTRING (133.28335571289062 93.34036254882812, ...
LINESTRING (133.2816390991211 93.34541320800781, ...
LINESTRING (133.28211212158203 93.34516525268555, ...
LINESTRING (133.2835464477539 93.33916091918945, ...
LINESTRING (133.2709846496582 93.35429763793945, ...
LINESTRING (133.28495407104492 93.33670425415039, ...
LINESTRING (133.28466415405273 93.33694076538086, ...

so it looks at first glance like the precision is much more than scale=16. Am I missing something?

These numbers are from highway=primary in Monaco...

msbarry commented 2 weeks ago

so it looks at first glance like the precision is much more than scale=16. Am I missing something?

These numbers are from highway=primary in Monaco...

I think that's this logic here: https://github.com/onthegomap/planetiler/blob/dc17a1c6275eb0e7fb27e5f879f0ffd184f34b38/planetiler-core/src/main/java/com/onthegomap/planetiler/render/FeatureRenderer.java#L262-L270

It increases the precision for linestrings to help with the JTS line merging because it gets confused with any node that has more than 2 lines coming in and out.

msbarry commented 2 weeks ago

@wipfli I'm looking at the transportation_name layer in openmaptiles. It uses a large min length (20+) in order to avoid lines too short to put a label on, and with the JTS line merger the output is looking like this:

image

which appears to have longer segments/more opportunities for label placement compared to loop line merger output that looks like this:

image

Also findAllPaths hangs forever on very large road networks so I'm not sure that setting loop min length the same as min length makes sense in these cases? What we want is to optimize for the longest possible connected segments. What do you think here?

wipfli commented 2 weeks ago

I am keeping track of the recursion depth now and abort loop merging if inside findAllPaths the recursion gets too deep. That allows us to finish to 20 benchmark:

image

wipfli commented 2 weeks ago

The connectivity in my highway=primary and secondary example of Switzerland at zoom 6 looks now better. Can you check if it has also improved for transportation_name?

msbarry commented 2 weeks ago

Looks like this one is showing even fewer transportation_name edges compared to JTS:

image image

If I change it to use setPrecisionModel(new PrecisionModel(PrecisionModel.FLOATING)) then the output looks almost identical to JTS line merging, and the unit test number of merged linestrings in the output goes down from 20k+ to 2-3k 🤔

I uploaded a single long linestring for a road in Massachusetts I would expect to collapse down to 1 or 2 linestrings in the output (i90.wkb.gz) and added a test for it. Currently it's merging to 60-90+ segments since a lot of the nodes have degree 3 or 4 which prevents them from merging, and removing loops doesn't seem to help. Not sure exactly what's going on

wipfli commented 2 weeks ago

There must be still some differences in the old and the new implementation of removeLoops.

Here is an example of highway=motorway and motorway_link in one layer at z6 in Switzerland https://www.openstreetmap.org/#map=16/47.31099/7.80455

with minLength=0 and loopMinLength=0 it looks like this:

image

If we turn on loop merging with loopMinLength=8*0.0625 and use the old code from https://github.com/wipfli/linestring-merging-planetiler we get this which looks good:

image

But when we use the current version from this pull request also with loopMinLength=8*0.0625 we get this which looks like too many segments were removed:

image

msbarry commented 2 weeks ago

Got it, let's try to add a minimal test case to reproduce each of these issues so we don't regress after fixing something. I found/fixed one issue in merge method when the merged segment forms a loop - it was previously adding both the forward and reverse edge which I think was causing some issues.

I dug into my line segment issue a bit more, it looks like my i90.wkb.gz test file is a road with 2 lanes going in opposite directions. When we snap the points, it creates a lot of intersections with loops between them. When loop min length < min length those loops end up creating a lot of segments shorter than min length, so they get dropped in the output.

I could see 2 possible paths forward:

image

I added a test testMergeCarriagewaysWithOneSplitLongerThanLoopMinLength that shows how this would work but it's currently failing

What do you think?

msbarry commented 2 weeks ago

For the loop removal if the goal is just to remove the first edge of all paths that go from A to B except the shortest, I wonder if there's a simpler way to approach it that doesn't require keeping track of all the paths... something like:

it seems like "find shortest distance from a to b" is a simpler problem than finding all paths from a to b

wipfli commented 2 weeks ago

For the loop removal if the goal is just to remove the first edge of all paths that go from A to B except the shortest, I wonder if there's a simpler way to approach it that doesn't require keeping track of all the paths... something like:

  • for each edge leaving A, find the shortest distance from edge.to to B up to loopMinLength
  • if there are multiple edges that can reach B in < loopMinLength then remove all but the shortest

it seems like "find shortest distance from a to b" is a simpler problem than finding all paths from a to b

It is a subtle difference but you are right, with this we would not need to find all paths but only the shortest which is nice.

wipfli commented 2 weeks ago

I added a test for some data that I found in Harkingen where a motorway did not get merged correctly.

Before loop removal it looks like this:

Screenshot_20241108_224556

And after like this:

Screenshot_20241108_224630-1

It might be the same as one of your tests.

wipfli commented 2 weeks ago

It looks like the current way of cutting loops open is actually wrong and we have to do it differently, for example like you proposed @msbarry.

Here is an example which currently fails:

image

Say we want to go from 0 to 2. We find the following paths:

0 -> 3 -> 2 is the shortest path, for the other the we remove the first edge of the path, i.e., we remove 0 -> 2 and 0 -> 3. And now we have nicely disconnected 0 from 2...

msbarry commented 2 weeks ago

Yeah I was thinking this might be happening... I started on a shortest distance-based approach

msbarry commented 2 weeks ago

I took a stab at an A*-shortest-distance-based loop breaker - see breakLoops and shortestDistance - looks like your new test still passes - how does it look to you?

wipfli commented 2 weeks ago

Thanks for implementing A*. I looks like it works on my test road networks. Here are the benchmark results from my laptop. Nice that the loop(20) finishes now!

image

The code looks good to me overall. What I am not quite sure is the heuristics part. We are sure that this will always find the shortest path, right?

wipfli commented 2 weeks ago

How should we handle these sort of junctions:

Screenshot_20241109_233959

This is after loop removal but before min length line dropping. After min length the connecting segment is gone:

Screenshot_20241109_233951

When we do simplification, the separation between the two connected lines will increase further.

It feels like when we have situations like "short edge" connects to 2 "long edges" on both ends we might want to not drop the short edge. Or actually we can remove the short edge, but keep the connecting lines unmerged such that we have endpoints which will keep the lines there after douglas peuker. What do you think?

wipfli commented 2 weeks ago

By the way another nice test case is I95 south of Washington. It has 3 carriage ways that sometimes are quite far separated... https://www.openstreetmap.org/#map=15/38.43446/-77.40842

msbarry commented 2 weeks ago

The code looks good to me overall. What I am not quite sure is the heuristics part. We are sure that this will always find the shortest path, right?

A* uses the heuristic to do a "best-first" search from A to B instead of breadth-first. The "best" path is determined using the current path length + the heuristic (straight line distance from the end of the path to B). As long as the heuristic is the minimum possible cost from the end node to B, it's still guaranteed to produce the shortest path. There might be a better name to give it in the code though...

How should we handle these sort of junctions:

I think that's very similar to the same case that I'm concerned about trying to make transportation_name segments as long as possible (and the failing test I added testMergeCarriagewaysWithOneSplitLongerThanLoopMinLength) before min-length dropping.

Instead of splitting segments at 3+ way intersections, we need to pick 2 segments to connect to make a longer one. There might be some heuristic we could use when looking at each intersection, but another idea I was thinking about was to find the "longest shortest path between 2 points" through the graph (I think this is called the "diameter" of the graph), emit that as a linestring, removing it from each of the nodes, then repeat until the remaining lines are shorter than min length. I'm not sure if there's an efficient way to do that though...

wipfli commented 2 weeks ago

Currently we do this:

With this approach you are right that if minLength > loopMinLength, some loops will be dropped in the "remove all short edges" step. I see this as expected behavior and would recommend to set minLength < loopMinLength.

Regarding your proposal to decide at 3-way junctions which two edges to merge I think that is a great idea overall because it leads to longer connected linestrings. But what I can see as a problem is that line simplification will lead to a visual gap. Something like this:

image

Maybe a simple solution to this would be to flip the order, i.e. simply -> merge 3-way junctions for longest possible strings. What do you think?

wipfli commented 2 weeks ago

Here is a nice crazy 3-junction group of highway=primary roughly at https://www.openstreetmap.org/#map=12/47.5090/8.7469

image

Another approach could be to merge pairs of lines at 3-way+ junctions that have a mutual angle closest to 180 degrees. I.e., lines that might be the logical continuation of each other. This would help with label placement in MapLibre because there you want flat angles too... And it would be easy to implement since it happens all locally at the nodes.

wipfli commented 2 weeks ago

I made a separate pull request for angle merging and progressive stub removal in https://github.com/wipfli/planetiler/pull/18. Let me know if you think these are good directions...

msbarry commented 2 weeks ago

The angle heuristic sounds like a promising way to break up 3+ way intersections, will be curious to test it on transportation_name layer.

Re: simplification we could simplify individual edge geometries without touching the endpoints as a part of line merging process. That would avoid breaking up connections. The only problem is that because we group by attributes first then merge lines with the same attribute so that could break connections that should exist between lines with different tags...