missinglink / pbf

utilities for parsing OpenStreetMap PBF files and extracting geographic data
MIT License

xroads: improved deduplication #25

Closed missinglink closed 3 years ago

missinglink commented 3 years ago

since commit https://github.com/missinglink/pbf/pull/21/commits/c780d84d96b5f87041988758b1aadce219550ce8 (from PR https://github.com/missinglink/pbf/pull/21) we have been removing duplicate intersections of the same streets.

when running the command against a large geographic area this deduplication will likely exclude legitimate duplicate street name pairs in different cities/regions.

i.e. it's meant to prevent producing two rows when the same two streets intersect multiple times, but the bug is that it only ever produces one row for any given pair of street names, no matter where the intersection is.

the fix is fairly simple: include a spatial hash in the reference which gets stored in the seen map. The package github.com/mmcloughlin/geohash is already included; it can generate a hash for a location and also the hashes of its neighbours.

example: if there is a "Corner of Main St and Side Ave" in both New York and San Francisco then the code should produce two results, one per city. currently I believe it only outputs the first one it finds.
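
a rough sketch of the idea, not the actual xroads code: the seen key combines the (order-independent) name pair with a spatial cell, and the lookup also checks the neighbouring cells so that two hits on either side of a cell boundary still count as the same intersection. Everything here is hypothetical (`Deduper`, `Seen`, `cellOf`, `cellSize`), and to keep it stdlib-only it uses a plain lat/lon grid as a stand-in for a geohash; in the real fix the cell and its neighbours would presumably come from github.com/mmcloughlin/geohash instead.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// cellSize is the grid resolution in degrees. Hypothetical value
// (0.01 degrees of latitude is roughly 1.1km); a geohash precision
// would play this role in the real code.
const cellSize = 0.01

// cell is a grid square, used here as a stand-in for a geohash.
type cell struct{ x, y int }

func cellOf(lat, lon float64) cell {
	return cell{
		x: int(math.Floor(lon / cellSize)),
		y: int(math.Floor(lat / cellSize)),
	}
}

// pairKey sorts the two names so that "A & B" and "B & A" match.
func pairKey(a, b string) string {
	names := []string{a, b}
	sort.Strings(names)
	return strings.Join(names, "\x00")
}

// seenKey is the reference stored in the seen map: names plus location.
type seenKey struct {
	names string
	c     cell
}

// Deduper remembers which name pairs were seen in which cells.
type Deduper struct {
	seen map[seenKey]struct{}
}

func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[seenKey]struct{})}
}

// Seen reports whether the name pair was already recorded in the same
// cell or one of its 8 neighbours. If not, it records the pair.
func (d *Deduper) Seen(a, b string, lat, lon float64) bool {
	names := pairKey(a, b)
	c := cellOf(lat, lon)
	for dx := -1; dx <= 1; dx++ {
		for dy := -1; dy <= 1; dy++ {
			k := seenKey{names, cell{c.x + dx, c.y + dy}}
			if _, ok := d.seen[k]; ok {
				return true
			}
		}
	}
	d.seen[seenKey{names, c}] = struct{}{}
	return false
}

func main() {
	d := NewDeduper()
	fmt.Println(d.Seen("Main St", "Side Ave", 40.7128, -74.0060))  // New York, first hit: false
	fmt.Println(d.Seen("Side Ave", "Main St", 40.7130, -74.0062))  // same corner again: true
	fmt.Println(d.Seen("Main St", "Side Ave", 37.7749, -122.4194)) // San Francisco: false
}
```

the cell size is the main trade-off: too small and a long pair of streets that cross twice produces two rows again, too large and nearby towns with the same street names get merged.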