mattijn / topojson

Encode spatial data as topology in Python! 🌍 https://mattijn.github.io/topojson
BSD 3-Clause "New" or "Revised" License

Wrong topologies/arcs being created? #178

Closed theroggy closed 2 years ago

theroggy commented 2 years ago

I'm not sure, I might be doing or expecting something wrong, but it seems to me that the topologies (or the common lines) are not created correctly for the data I'm using.

This is a screenshot of the result I get when I visualize the input polygons + the lines created by topojson (in Topology.output["arcs"]).

image

Here is the file with the polygons as shown in the screenshot above + one with the topojson lines: testcase_topojson.zip

This is the script I ran:

from pathlib import Path
import sys
import geopandas as gpd
from shapely import geometry as sh_geom

# Add parent.parent dir to find dir
sys.path.insert(1, str(Path(__file__).resolve().parent.parent))
import topojson

data_dir = Path("C:/Temp")  # renamed from `dir` to avoid shadowing the built-in
input_path = data_dir / "2022-08-28_testcase_topojson.gpkg"

gdf = gpd.read_file(input_path)
topo = topojson.Topology(gdf, prequantize=False, shared_coords=True)
topolines = [sh_geom.LineString(coords) for coords in topo.output["arcs"]]
topolines_gdf = gpd.GeoDataFrame(
    geometry=topolines, crs=gdf.crs  # type: ignore
)
output_lines_path = data_dir / f"{input_path.stem}_topolines_master.gpkg"
topolines_gdf.to_file(output_lines_path)

mattijn commented 2 years ago

Thanks for raising the issue. Quick look. I'm trying to break it down to isolate the issue:

import geopandas as gpd
from shapely import geometry
import topojson as tp


def visz_arcs(topo):
    gdf = gpd.GeoDataFrame(geometry=[geometry.LineString(arc) for arc in topo.to_dict()['arcs']])
    gdf.plot(cmap='Dark2')


pth = '/Users/mattijnvanhoek/Downloads/testcase_topojson/testcase_topojson.gpkg'
gdf = gpd.read_file(pth)

# when `shared_coords` set to `True`, a path is considered shared when
# all coordinates appear in both paths (`coords-connected`).
topo = tp.Topology(gdf, prequantize=False, shared_coords=True)
visz_arcs(topo)

output_3_0

# when `shared_coords` set to `False` a path is considered shared when
# coordinates are on the same path (`path-connected`). 
# the path-connected strategy is more 'correct', but slower. Default is `True`.
topo = tp.Topology(gdf, prequantize=False, shared_coords=False)
visz_arcs(topo)

output_4_0

With shared_coords=True it seems some linestrings overlap. These linestrings are neither split nor recognised as duplicate arcs. This is the behaviour on the master version; I'm not sure whether the same happens in the latest release.
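To make the difference between the two strategies concrete, here is a minimal pure-Python sketch. It is not topojson's implementation: the helper names `segments`, `shared_by_coords` and `lies_on_path` are hypothetical, and the two test lines are made up. It only illustrates one way overlap can go undetected when sharing is decided by matching coordinates: two lines can cover the same path while having no matching pair of consecutive vertices.

```python
def segments(line):
    """Consecutive vertex pairs of a linestring."""
    return [(line[i], line[i + 1]) for i in range(len(line) - 1)]


def shared_by_coords(a, b):
    """Segments whose two endpoints are consecutive vertices in both lines
    (an assumed, simplified reading of 'coords-connected')."""
    seg_a = {frozenset(s) for s in segments(a)}
    seg_b = {frozenset(s) for s in segments(b)}
    return seg_a & seg_b


def point_on_segment(p, seg, tol=1e-9):
    """True when point p lies on the segment (collinear and within bounds)."""
    (x1, y1), (x2, y2) = seg
    cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
    if abs(cross) > tol:
        return False
    return (min(x1, x2) - tol <= p[0] <= max(x1, x2) + tol
            and min(y1, y2) - tol <= p[1] <= max(y1, y2) + tol)


def lies_on_path(a, b):
    """True when every segment of `a` lies on some segment of `b`
    (an assumed, simplified reading of 'path-connected')."""
    return all(
        any(point_on_segment(p, s) and point_on_segment(q, s) for s in segments(b))
        for p, q in segments(a)
    )


# Same geometry, but line_a has an extra vertex halfway.
line_a = [(0, 0), (1, 0), (2, 0)]
line_b = [(0, 0), (2, 0)]

coords_shared = shared_by_coords(line_a, line_b)  # empty: no common vertex pair
path_shared = lies_on_path(line_a, line_b)        # True: line_a runs along line_b
print(coords_shared, path_shared)
```

In this toy case the coordinate-based check finds nothing to share, so both lines would be kept as separate, overlapping arcs, while the path-based check sees that they coincide.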

It would be great if we can isolate it even further. Maybe it is possible to bring it back to 2 or 3 linestrings?

Currently the following gives too much output:

from topojson.core.cut import Cut
cut = Cut(gdf, options={'prequantize': False, 'shared_coords': True})
cut.to_svg(separate=True, include_junctions=True)

image

from topojson.core.dedup import Dedup
dedup = Dedup(gdf, options={'prequantize': False, 'shared_coords': True})
dedup.to_svg(separate=True, include_junctions=True)

mattijn commented 2 years ago

With shared_coords=False it seems OK, but what about here? image

Hm, thinking about it, I think that is alright as well. A single point on a line is not a shared path and is therefore not seen as a junction. Ah, the green and pink arcs are probably separated because the start/end coordinate of the polygon lies there. With a perfect topology these two arcs would be combined again.
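The effect of a ring's start/end coordinate can be shown with a small sketch. This is an illustration under assumptions, not topojson's code: `cut_at` and `rotate_ring` are hypothetical helpers, and the ring and junctions are made up. A closed ring is stored as a linear sequence, so when it is cut at its junctions the piece that spans the start/end vertex comes out as two pieces unless the ring is first rotated to start at a junction.

```python
def cut_at(line, junctions):
    """Split a coordinate sequence at every interior vertex that is a junction."""
    pieces, current = [], [line[0]]
    last = len(line) - 1
    for i, pt in enumerate(line[1:], start=1):
        current.append(pt)
        if pt in junctions and i != last:
            pieces.append(current)
            current = [pt]
    pieces.append(current)
    return pieces


def rotate_ring(ring, start):
    """Return the same closed ring, rewritten to begin and end at `start`."""
    body = ring[:-1]  # drop the repeated closing vertex
    i = body.index(start)
    rotated = body[i:] + body[:i]
    return rotated + [rotated[0]]


# Closed ring whose start/end vertex (1, 0) sits between the two junctions.
ring = [(1, 0), (2, 0), (2, 1), (0, 1), (0, 0), (1, 0)]
junctions = {(2, 0), (0, 0)}

pieces_as_stored = cut_at(ring, junctions)
pieces_rotated = cut_at(rotate_ring(ring, (2, 0)), junctions)

print(len(pieces_as_stored))  # 3: the bottom edge is split at the ring start
print(len(pieces_rotated))    # 2: the bottom edge stays in one piece
```

The geometry is identical in both cases; only the choice of starting vertex decides whether the bottom edge ends up as one arc or two.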

theroggy commented 2 years ago

You probably know this resource already, but I just found this really interesting explanation, which clarified (for me at least) what to expect: https://bost.ocks.org/mike/topology/

mattijn commented 2 years ago

Yes. I use the same wording/phases to create a certain synergy, but the implementations are different.

mattijn commented 2 years ago

See also https://mattijn.github.io/topojson/how-it-works.html

theroggy commented 2 years ago

Some answers:

I think the problem at least starts in the 'join' phase: there is an issue in how the junctions are determined, as there are junctions in the middle of lines. Possibly there are also missing junctions which could explain the overlapping parts.

theroggy commented 2 years ago

I have been looking deeper into it based on your feedback and it seems I misunderstood the impact of shared_coords=True. Both the overlapping pieces and the odd way the "red" line is split can be explained by this. Not sure if this behaviour is really wanted and/or "by design", but my data definitely needs shared_coords=False to get decent results.

As I stated before, I also saw issues when I tried shared_coords=False, but they might indeed be explained by the problem you raise here:

Ah, the green and pink arcs are separated because probably there is the start/end coordinate of the polygon. With a perfect topology these two arcs are combined again.

I'll have a look if I can add support to combine those arcs again...
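One possible shape for such a merge step, as a rough sketch only: `merge_arcs` is a hypothetical helper, not an existing topojson function, and it assumes arcs are plain coordinate lists, that junctions are known up front, and that arcs never need to be reversed to fit together. It joins two arcs whenever the end of one equals the start of another at a point that is not a junction.

```python
def merge_arcs(arcs, junctions):
    """Join arcs that meet end-to-start at a vertex that is not a junction."""
    arcs = [list(a) for a in arcs]
    merged = True
    while merged:
        merged = False
        for i, a in enumerate(arcs):
            for j, b in enumerate(arcs):
                if i != j and a[-1] == b[0] and a[-1] not in junctions:
                    arcs[i] = a + b[1:]  # drop the duplicated meeting vertex
                    del arcs[j]
                    merged = True
                    break
            if merged:
                break
    return arcs


# Three arcs cut from one ring; (1, 0) is only the ring's start/end vertex.
arcs = [
    [(1, 0), (2, 0)],
    [(2, 0), (2, 1), (0, 1), (0, 0)],
    [(0, 0), (1, 0)],
]
junctions = {(2, 0), (0, 0)}

result = merge_arcs(arcs, junctions)
print(result)
```

A real implementation would also have to update the arc indices that the geometries refer to, and handle arcs that are stored in opposite directions; both are left out here.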