pysal / momepy

Urban Morphology Measuring Toolkit
https://docs.momepy.org
BSD 3-Clause "New" or "Revised" License
488 stars 59 forks source link

CellAlignment doesn't work when using enclosed tessellation #278

Open matthew-law opened 3 years ago

matthew-law commented 3 years ago

momepy.CellAlignment expects the left and right GeoDataFrames (containing buildings and tessellation cells) to have the same number of elements, ie each building corresponds to one cell (which is the case in morphological tessellation). This means that it sometimes doesn't work when using enclosed tessellation, when there are some tessellation cells with no corresponding buildings (ie areas enclosed by roads with no buildings on them).

Code example:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-78-f527b308bee5> in <module>
----> 1 blg['blg_CellAlignment'] = mm.CellAlignment(blg, tess, 'blg_Orientation', 'tess_Orientation', 'uID', 'uID').series

/usr/local/Cellar/jupyterlab/3.0.1/libexec/lib/python3.9/site-packages/momepy/distribution.py in __init__(self, left, right, left_orientations, right_orientations, left_unique_id, right_unique_id)
    419             right_orientations = right_orientations + "_y"
    420         self.series = np.absolute(comp[left_orientations] - comp[right_orientations])
--> 421         self.series.index = left.index
    422 
    423 

/usr/local/Cellar/jupyterlab/3.0.1/libexec/lib/python3.9/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   5476         try:
   5477             object.__getattribute__(self, name)
-> 5478             return object.__setattr__(self, name, value)
   5479         except AttributeError:
   5480             pass

pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

/usr/local/Cellar/jupyterlab/3.0.1/libexec/lib/python3.9/site-packages/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    468         if not fastpath:
    469             # The ensure_index call above ensures we have an Index object
--> 470             self._mgr.set_axis(axis, labels)
    471 
    472     # ndarray compatibility

/usr/local/Cellar/jupyterlab/3.0.1/libexec/lib/python3.9/site-packages/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    218 
    219         if new_len != old_len:
--> 220             raise ValueError(
    221                 f"Length mismatch: Expected axis has {old_len} elements, new "
    222                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 81708 elements, new values have 81479 elements

Where tess is an enclosed tessellation generated from the buildings blg.

martinfleis commented 3 years ago

Thanks for the report!

momepy.CellAlignment expects the left and right GeoDataFrames (containing buildings and tessellation cells) to have the same number of elements, ie each building corresponds to one cell (which is the case in morphological tessellation).

Actually, it doesn't. In the example below, there are more enclosed tessellation cells than buildings and it works fine.

import geopandas
import momepy

blg = geopandas.read_file(momepy.datasets.get_path('bubenec'), layer="buildings")
streets = geopandas.read_file(momepy.datasets.get_path('bubenec'), layer="streets")

enclosures = momepy.enclosures(streets, limit=geopandas.GeoSeries([streets.unary_union.convex_hull]))
tessellation = momepy.Tessellation(blg, "uID", enclosures=enclosures).tessellation

blg['orient_blg'] = momepy.Orientation(blg).series
tessellation['orient_tess'] = momepy.Orientation(tessellation).series

momepy.CellAlignment(blg, tessellation, 'orient_blg', 'orient_tess', 'uID', 'uID').series
0       0.290739
1       0.547136
2       5.486106
3       0.263380
4       0.165063
         ...    
139     0.185136
140    17.566047
141     0.386790
142     0.010713
143     0.438541
Length: 144, dtype: float64

print(blg.shape, tessellation.shape)
(144, 3) (153, 4)

So the issue is not in the non-matching number of elements.

I can reproduce the issue if I have duplicated unique IDs in the tessellation layer. That can happen when a building intersects two enclosures. It is generally not safe to use a building unique ID as a unique identifier of enclosures.

One way around this I was using is to combine both geometries (buildings and tessellation) into the same GeoDataFrame and then dynamically change the active geometry.

# renaming to keep track of which is which and merging together
combined = tessellation.rename_geometry("tessellation").merge(blg.rename_geometry("buildings"), on="uID", how="left")
combined["unique_id"] = range(len(combined))  # new truly unique ID

# switching active geometry from default tessellation
momepy.CellAlignment(combined.set_geometry("buildings"), combined, 'orient_blg', 'orient_tess', 'unique_id', 'unique_id').series

I am not sure if this is a bug or not though :D.

matthew-law commented 3 years ago

Thanks for having a look at this and hopefully getting to the bottom of the issue!

Would it be a bad idea to do a geometric overlay of streets (ie the edges of enclosures) and buildings as a preprocessing step? It should solve the issue of duplicate unique IDs, but it would mess up some of the metrics based on (eg) building area etc (by splitting buildings which overlap with streets). But I would assume that it is a relatively small number of buildings that cross a road (especially since my streets data only contains roads for cars, and not pedestrian-only streets), so hopefully this wouldn't have a significant effect on the overall results. I did try to find which buildings crossed which streets in my data, but crashed QGIS in the attempt so have yet to do so.

Alternatively, could there be some way to, for each building, get one of either a) the eID of the enclosure that the building is entirely within (this will be the case for most buildings), or b) the eID of the enclosure that the largest portion of the building is in, if it is split between two buildings? This could then be used in place of the uIDs that were previously used to match buildings and tessellation cells when running CellAlignment.

martinfleis commented 3 years ago

I don't think there's a universal solution for this situation. You can do overlay if you know that you're not going to use that building geometry for anything meaningful in the future. You can do what I've done above but that causes trouble in AreaRatio as you may have a larger building than its cell (in that case, an overlay is the best solution).

I'd say that you need to know your data and your use case and apply a solution that works the best for that.

a) the eID of the enclosure that the building is entirely within (this will be the case for most buildings)

that is geopandas.sjoin with op="within".

b) the eID of the enclosure that the largest portion of the building is in, if it is split between two buildings?

that is tobler's area_join

jGaboardi commented 1 year ago

Is this issue the same as #252?

martinfleis commented 1 year ago

@jGaboardi it seems so.