mattijn / topojson

Encode spatial data as topology in Python! 🌍 https://mattijn.github.io/topojson
BSD 3-Clause "New" or "Revised" License
183 stars 27 forks source link

WIP based on #179 #184

Closed mattijn closed 2 years ago

mattijn commented 2 years ago

Trying to emulate #179 using shapely and numpy. Could bring the timings to 33s, but did not yet got 18s as in #179. CC @theroggy

mattijn commented 2 years ago

Removed relate pattern, bring back timings to 12s. Progress 🥳!

Line by line profile results (time unit is in %).

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def juncs(data):
     2                                               # return tree as well from `select_unique_combs`
     3         1     213199.0 213199.0      9.1      idx_combs, tree = select_unique_combs(data["linestrings"])
     4                                               # collect geoms from tree (this are views or copies?)
     5         2     124958.0  62479.0      5.3      geom_combs = np.asarray([
     6                                                   [tree.geometries.take(idx[0]), tree.geometries.take(idx[1])] 
     7         1          1.0      1.0      0.0          for idx in idx_combs])
     8                                           
     9                                               # find intersection between linestrings 
    10         1    1267047.0 1267047.0    54.1      segments = intersection_all(geom_combs, axis=1)
    11         1     596985.0 596985.0     25.5      merged_segments = explode(shapely.line_merge(segments))
    12                                           
    13                                               # the start and end points of the merged_segments are the junctions
    14         1      11006.0  11006.0      0.5      coords, index_group_coords = shapely.get_coordinates(merged_segments, return_index=True)
    15         1       1088.0   1088.0      0.0      _, idx_start_segment = np.unique(index_group_coords, return_index=True)
    16         1         49.0     49.0      0.0      idx_start_end = np.append(idx_start_segment, idx_start_segment - 1)
    17                                           
    18         1        265.0    265.0      0.0      junctions = coords[idx_start_end]
    19                                               # junctions can appear multiple times in multiple segments, remove duplicates
    20         1       3587.0   3587.0      0.2      _, idx_uniq_junction = np.unique(asvoid(junctions), return_index=True)
    21         1     124403.0 124403.0      5.3      _junctions = list(map(geometry.Point, junctions[idx_uniq_junction]))
    22         1        139.0    139.0      0.0      print(len(_junctions))
    23         1          1.0      1.0      0.0      return _junctions
theroggy commented 2 years ago

I think this test is still missing, should help to get more tests passing:

            # if the original linestrings that intersect are equal, no junctions
            linestrings_inters_gdf = linestrings_inters_gdf[~pygeos.equals(
                linestrings_gdf.geometry.iloc[linestrings_inters_gdf.index_1].array.data,
                linestrings_gdf.geometry.iloc[linestrings_inters_gdf.index_2].array.data,
            )]
mattijn commented 2 years ago

You should deserve the merging credits. So I would categorize the work here more as prototyping and might serve as a reference.

theroggy commented 2 years ago

You should deserve the merging credits. So I would categorize the work here more as prototyping and might serve as a reference.

That's nice, appreciated!

I'll integrate this version in #179 .

mattijn commented 2 years ago

STRTree can .query() and .take() multiple simultaneous, game changer! Also a bit more shapely native functions. Now 88.5% of the time is spend on segments = intersection_all(combs_no_dups, axis=1). No real bottlenecks in the remainder of this part I think.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def juncs(data):
     2                                               # return tree as well from `select_unique_combs`
     3         1      32998.0  32998.0      2.4      idx_combs, tree = select_unique_combs(data["linestrings"])
     4                                               # collect geoms from tree
     5         1        510.0    510.0      0.0      geom_combs = tree.geometries.take(idx_combs)
     6         1      29366.0  29366.0      2.2      combs_no_dups = geom_combs[~shapely.equals(geom_combs[:,0], geom_combs[:,1])]
     7                                           
     8                                               # find intersection between linestrings 
     9         1    1194016.0 1194016.0     88.5      segments = intersection_all(combs_no_dups, axis=1)
    10         1      80758.0  80758.0      6.0      merged_segments = shapely.line_merge(segments)
    11         1       2900.0   2900.0      0.2      linestring_segments = shapely.get_parts(merged_segments)
    12                                           
    13                                               # the start and end points of the merged_segments are the junctions
    14         1       1704.0   1704.0      0.1      coords, index_group_coords = shapely.get_coordinates(linestring_segments, return_index=True)
    15         1        970.0    970.0      0.1      _, idx_start_segment = np.unique(index_group_coords, return_index=True)
    16         1         47.0     47.0      0.0      idx_start_end = np.append(idx_start_segment, idx_start_segment - 1)
    17                                           
    18         1        233.0    233.0      0.0      junctions = coords[idx_start_end]
    19                                               # junctions can appear multiple times in multiple segments, remove duplicates
    20         1       3279.0   3279.0      0.2      _, idx_uniq_junction = np.unique(asvoid(junctions), return_index=True)
    21         1       2036.0   2036.0      0.2      _junctions = shapely.points(junctions[idx_uniq_junction])
    22         1          2.0      2.0      0.0      if not len(_junctions):
    23                                                   _junctions = []
    24         1        122.0    122.0      0.0      print(len(_junctions))
    25         1          1.0      1.0      0.0      return _junctions