pysal / momepy

Urban Morphology Measuring Toolkit
https://docs.momepy.org
BSD 3-Clause "New" or "Revised" License

Linking objects under the new API #558

Closed by martinfleis 4 months ago

martinfleis commented 8 months ago

I felt a need to clarify the handling of indices and links between various objects before getting deeper into the refactoring. I wrote this primarily for myself, to get a clear idea of how it all should work, but I suppose that we will eventually turn it into a documentation page once we switch the docs to the functional API.

You may notice that it all heavily relies on the same logic we use in libpysal.graph.

Index is king

When we need to capture a relationship between two objects (e.g., a GeoDataFrame and its Graph), the primary method is to rely on the index. Unlike momepy 0.7, which heavily depends on columns with IDs mapping rows of one GeoDataFrame to the other, momepy 0.8 attempts to minimise the use of such columns. Below is an overview of the logic used in various situations.
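A toy illustration of why index alignment makes ID columns largely redundant, in plain pandas (no momepy involved; the data and the `bID` column name are made up for illustration):

```python
import pandas as pd

# Toy data: the same three buildings described in two tables stored in a
# different row order. Index labels ("a", "b", "c") identify each building.
area = pd.Series([100.0, 250.0, 80.0], index=["a", "b", "c"], name="area")
height = pd.Series([12.0, 6.0, 9.0], index=["c", "a", "b"], name="height")

# Index-based linking: arithmetic aligns on labels, not on row position,
# so each area is multiplied by the height of the same building.
volume = area * height

# ID-column linking (the 0.7-style pattern): an explicit key column, here the
# hypothetical "bID", has to be carried along and merged on.
left = pd.DataFrame({"bID": ["a", "b", "c"], "area": [100.0, 250.0, 80.0]})
right = pd.DataFrame({"bID": ["c", "a", "b"], "height": [12.0, 6.0, 9.0]})
merged = left.merge(right, on="bID")
merged["volume"] = merged["area"] * merged["height"]
```

Both routes give the same result, but the index-based one needs no extra column to keep in sync.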

Geometry and Graph

This case is easy. The Graph is mapped to the geometry (either a GeoSeries or a GeoDataFrame) via the index of the GeoPandas object.

import momepy
from libpysal import graph

geometry = gdf.set_index('my_preferred_index')
contiguity = graph.Graph.build_contiguity(geometry)
momepy.neighbor_distance(geometry, contiguity)

In this case, you should ensure that the index of the geometry does not change and, in some cases, that the order of rows is preserved as well, so that values are not mismatched when they are mapped to sparse arrays.
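The row-order caveat can be sketched with a plain-pandas stand-in for a Graph: an adjacency Series keyed by (focal, neighbor) labels, converted to a matrix whose rows follow the index order at build time. This is a simplified illustration under that assumption (a dense NumPy matrix instead of a sparse one, hypothetical helper `to_matrix`), not libpysal's actual implementation:

```python
import numpy as np
import pandas as pd

# Plain-pandas stand-in for a contiguity graph, keyed by index labels in a
# (focal, neighbor) -> weight layout. Not the libpysal object itself.
adjacency = pd.Series(
    1.0,
    index=pd.MultiIndex.from_tuples(
        [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")],
        names=["focal", "neighbor"],
    ),
    name="weight",
)


def to_matrix(adjacency, ids):
    """Dense weights matrix whose rows and columns follow the order of `ids`."""
    position = {label: i for i, label in enumerate(ids)}
    matrix = np.zeros((len(ids), len(ids)))
    for (focal, neighbor), weight in adjacency.items():
        matrix[position[focal], position[neighbor]] = weight
    return matrix


ids = ["a", "b", "c"]  # row order of the geometry when the graph was built
matrix = to_matrix(adjacency, ids)

values = pd.Series([1.0, 10.0, 100.0], index=ids)
# Sum of each row's neighbor values, computed positionally.
lag = pd.Series(matrix @ values.to_numpy(), index=ids)

# Same data, rows reordered to c, a, b: positions no longer match labels,
# so the positional product silently pairs values with the wrong rows.
shuffled = values.iloc[[2, 0, 1]]
mismatched = pd.Series(matrix @ shuffled.to_numpy(), index=ids)

# Reindexing to the graph's order restores the label-to-position mapping.
realigned = pd.Series(matrix @ shuffled.reindex(ids).to_numpy(), index=ids)
```

Once values leave the labelled world and enter an array, only their position is left, which is why the row order matters in addition to the index.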

Series and Graph

A special case of the above is linking a pandas.Series to the Graph. This assumes that the index of the Series is equal to the index of the original geometry from which the Graph was created.

# typically, the Series is taken directly from the DataFrame
geometry = gdf.set_index('my_preferred_index')
contiguity = graph.Graph.build_contiguity(geometry)
momepy.alignment(geometry["orientation"], contiguity)
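A minimal sketch of what "the Series index must match the Graph" means in practice, again using a plain-pandas (focal, neighbor) -> weight Series as a stand-in for a Graph. The helper `neighbor_mean` is hypothetical and only illustrates label-based lookup; it is not how `momepy.alignment` is implemented:

```python
import pandas as pd

# Plain-pandas stand-in for a Graph built on geometry indexed 10, 20, 30.
adjacency = pd.Series(
    1.0,
    index=pd.MultiIndex.from_tuples(
        [(10, 20), (20, 10), (20, 30), (30, 20)],
        names=["focal", "neighbor"],
    ),
    name="weight",
)


def neighbor_mean(values, adjacency):
    """Weighted mean of `values` over each focal's neighbors, matched by label."""
    labels = set(adjacency.index.get_level_values("focal")) | set(
        adjacency.index.get_level_values("neighbor")
    )
    missing = labels - set(values.index)
    if missing:
        raise ValueError(f"Series index does not match the Graph: {sorted(missing)}")
    # Look up each neighbor's value by its index label.
    neighbor_values = pd.Series(
        values.reindex(adjacency.index.get_level_values("neighbor")).to_numpy(),
        index=adjacency.index,
    )
    weighted_sum = (neighbor_values * adjacency).groupby(level="focal").sum()
    return weighted_sum / adjacency.groupby(level="focal").sum()


# The Series comes from the same DataFrame, so it carries the same index.
orientation = pd.Series([0.0, 30.0, 90.0], index=[10, 20, 30], name="orientation")
result = neighbor_mean(orientation, adjacency)
```

A Series that lost its index (e.g. after `reset_index(drop=True)`) carries labels 0, 1, 2 and is rejected instead of being silently misaligned.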

Geometry and two Graphs

Another special case is linking geometry to two Graphs. In that case, both Graphs need to be based on the same index.

geometry = gdf.set_index('my_preferred_index')
adjacency_graph = graph.Graph.build_contiguity(geometry)
neighborhood_graph = graph.Graph.build_distance_band(
    geometry.centroid,  # distance band expects points; centroids keep the index
    threshold=400,
)
momepy.mean_interbuilding_distance(
    geometry, 
    adjacency_graph, 
    neighborhood_graph,
)
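A cheap sanity check for the two-Graph case is to confirm that both Graphs are built on exactly the labels of the geometry index. The sketch below uses hypothetical helpers (`make_adjacency`, `check_shared_index`) and plain-pandas stand-ins for Graphs, and assumes every observation appears at least once as a focal:

```python
import pandas as pd


def make_adjacency(pairs):
    """Plain-pandas stand-in for a Graph: (focal, neighbor) -> weight."""
    return pd.Series(
        1.0,
        index=pd.MultiIndex.from_tuples(pairs, names=["focal", "neighbor"]),
        name="weight",
    )


def check_shared_index(index, *adjacencies):
    """Raise if any graph is not built on exactly the labels of `index`."""
    expected = set(index)
    for i, adjacency in enumerate(adjacencies):
        found = set(adjacency.index.get_level_values("focal"))
        if found != expected:
            raise ValueError(f"Graph {i} is not based on the geometry index")


index = pd.Index(["a", "b", "c"], name="my_preferred_index")
adjacency_graph = make_adjacency([("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")])
neighborhood_graph = make_adjacency(
    [("a", "b"), ("a", "c"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "b")]
)
# Built after an accidental reset_index(): positional labels 0, 1, 2.
stale_graph = make_adjacency([(0, 1), (1, 0), (1, 2), (2, 1)])

check_shared_index(index, adjacency_graph, neighborhood_graph)  # passes silently
```

Calling `check_shared_index(index, adjacency_graph, stale_graph)` raises, which is exactly the mistake that would otherwise produce quietly wrong results.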

Geometry and Geometry

When linking two geometry arrays together, for example to capture which building belongs to which street segment or to which block/enclosure, you cannot rely solely on indices, as the indices of the two objects do not match. In this situation, momepy will use the index of one GeoSeries, typically the shorter one, and a Series (i.e. a column) in the other.

street_edges = gdf.set_index('my_preferred_index')
buildings["street_edge_id"] = momepy.get_network_id(
    buildings, 
    street_edges,
)
momepy.street_alignment(
    buildings, 
    street_edges, 
    network_id="street_edge_id",
)
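The index-plus-column pattern can be sketched with plain DataFrames standing in for GeoDataFrames. This shows roughly the kind of lookup an alignment measure needs (absolute difference between a building's orientation and that of its street), assuming toy data; it is not momepy's implementation:

```python
import pandas as pd

# Stand-ins for GeoDataFrames: only the attributes matter for the linking logic.
street_edges = pd.DataFrame(
    {"orientation": [10.0, 80.0]},
    index=pd.Index(["s1", "s2"], name="my_preferred_index"),
)
# Each building keeps its own index; "street_edge_id" holds street index labels,
# playing the role of the column filled by get_network_id in the example above.
buildings = pd.DataFrame(
    {"orientation": [12.0, 75.0, 85.0], "street_edge_id": ["s1", "s2", "s2"]},
    index=[100, 101, 102],
)

# Street -> building: look up each building's street attribute by label.
street_orientation = buildings["street_edge_id"].map(street_edges["orientation"])
deviation = (buildings["orientation"] - street_orientation).abs()

# Building -> street: aggregate buildings by the street they belong to.
buildings_per_street = buildings.groupby("street_edge_id").size()
```

The link works in both directions: `map` pulls street values onto buildings, and `groupby` on the same column aggregates buildings per street, while both objects keep their own index.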