mattijn / topojson

Encode spatial data as topology in Python! 🌍 https://mattijn.github.io/topojson
BSD 3-Clause "New" or "Revised" License
178 stars 27 forks source link

Fix: duplicates with collinear points sometimes not found with shared_coords=False #211

Closed theroggy closed 11 months ago

theroggy commented 11 months ago

In the following case 2 linestrings were not correctly identified as duplicate with with shared_coords=False:

Practical case where this showed up. The combination of the above ingredients triggered the bug like this:

theroggy commented 11 months ago

Reference https://github.com/orthoseg/orthoseg/issues/134

mattijn commented 11 months ago

I stared at this for a while, I ran a debugger and I tried to understand why the fix in your PR solves the issue, you basically go from

lines_split.append(remove_collinear_points([np.array(linestring.coords)]))

to

lines_split.append([remove_collinear_points(np.array(linestring.coords))])

While I cannot really explain why this results in another linestring being detected as a duplicate; it resolves the issue that you have and the PR includes a nice test. So thank you!

theroggy commented 11 months ago

Yes, it has taken me quite some hours going from the real-world problem to that specific code change, and the only thing to show for it are 2 square brackets being moved a few characters :-).

The difference is in what remove_collinear_points receives as input:

One of the first lines of remove_collinear_points is len(line). For the first array will always be 1... so there are never collinear points to be removed :-(....

    # If only 2 points, no use checking
    if len(line) <= 2:
        return line
mattijn commented 11 months ago

Thanks for the explanation! I just released a new version of topojson, see https://pypi.org/project/topojson/1.7/

theroggy commented 11 months ago

By reading the change notes I noticed that the title of the PR was quite off... so I updated it.