this PR implements the solution discussed in https://github.com/missinglink/pbf/issues/25

there are two major changes here:

- move the `seen` map to a global scope so we dedupe the whole extract. prior to this we were only deduping grouped on `node_id`, which actually had the effect of not triggering the issue mentioned in https://github.com/missinglink/pbf/issues/25 but was still producing many duplicates, because there can be multiple nodes at the same intersection with differing ids which share ways with the same name
- add a spatial hashing algorithm which ensures that disparate and distant intersections are not considered equal

closes https://github.com/missinglink/pbf/issues/25

as a result of the first change the NY extract went from 402374 lines to 362257 lines, then after the second change it increased to 372407 lines (which is in line with my expectations).
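
for readers who want a feel for how the two changes fit together, here is a minimal sketch. it is not the code from this PR: the names `Deduper`, `IsNew`, `intersectionKey`, `spatialHash` and the `cellSize` value are all hypothetical, and the grid-snapping hash is just one simple way to do spatial hashing. the idea is that a single `seen` map covers the whole extract, and each key combines the two street names with a coarse grid cell, so same-named intersections in distant places stay distinct.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cellSize is a hypothetical grid resolution in degrees
// (0.01 degrees of latitude is roughly 1.1km). Not taken from the PR.
const cellSize = 0.01

// spatialHash snaps a coordinate to a grid cell and returns the cell id.
// This is an assumed, simplistic hashing scheme used for illustration only.
func spatialHash(lat, lon float64) string {
	row := int64(math.Floor(lat / cellSize))
	col := int64(math.Floor(lon / cellSize))
	return fmt.Sprintf("%d:%d", row, col)
}

// intersectionKey builds a key that ignores the order of the two street
// names and includes the grid cell, so "A & B" and "B & A" in the same
// cell collapse to one key while distant intersections do not.
func intersectionKey(nameA, nameB string, lat, lon float64) string {
	names := []string{nameA, nameB}
	sort.Strings(names)
	return names[0] + "|" + names[1] + "|" + spatialHash(lat, lon)
}

// Deduper holds one seen map shared across the whole extract,
// rather than one map per node id.
type Deduper struct {
	seen map[string]struct{}
}

// NewDeduper returns a Deduper with an empty seen map.
func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]struct{})}
}

// IsNew reports whether this intersection has not been recorded yet,
// and records it as seen.
func (d *Deduper) IsNew(nameA, nameB string, lat, lon float64) bool {
	key := intersectionKey(nameA, nameB, lat, lon)
	if _, ok := d.seen[key]; ok {
		return false
	}
	d.seen[key] = struct{}{}
	return true
}
```

with this sketch, two nodes a few metres apart that both join "Main St" and "1st Ave" produce the same key, so only the first is kept, while an intersection with the same names in another city falls in a different cell and is kept too. one known weakness of plain grid snapping is that two nearby points on opposite sides of a cell boundary hash differently, which may leave a few duplicates behind.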