pysal / momepy

Urban Morphology Measuring Toolkit
https://docs.momepy.org
BSD 3-Clause "New" or "Revised" License
496 stars 59 forks source link

remove_false_nodes throws an unexpected error #657

Closed ale-v closed 1 month ago

ale-v commented 1 month ago

Describe the problem

When trying to remove false nodes from a street network obtained from Ordnance Survey I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[56], line 1
----> 1 streets_BR = mm.remove_false_nodes(streets_BR)
      2 len(streets_BR)

File [~\AppData\Local\miniconda3\envs\ox\Lib\site-packages\momepy\preprocessing.py:260](http://localhost:8889/lab/tree/OneDrive%20-%20University%20of%20Strathclyde/FB/Birmingham/_scripts/~/AppData/Local/miniconda3/envs/ox/Lib/site-packages/momepy/preprocessing.py#line=259), in remove_false_nodes(gdf)
    256 node_coords = shapely.get_coordinates(target_nodes)
    257 coords = np.array(loop_geom.coords)
    258 new_start = (
    259     node_coords[0]
--> 260     if (node_coords[0] != coords[0]).all()
    261     else node_coords[1]
    262 )
    263 new_start_idx = np.where(coords == new_start)[0][0]
    264 rolled_coords = np.roll(coords[:-1], -new_start_idx, axis=0)

ValueError: operands could not be broadcast together with shapes (2,) (3,)

Any solution?

Steps to reproduce

import geopandas as gpd
import momepy as mm
import pandas as pd

sk = gpd.read_file('data/OS Open Roads/SK_RoadLink.shp')
so = gpd.read_file('data/OS Open Roads/SO_RoadLink.shp')
sp = gpd.read_file('data/OS Open Roads/SP_RoadLink.shp')

streets = pd.concat([sk, so, sp], ignore_index=True)

streets = streets.reset_index(drop=True).drop_duplicates('identifier')

bound_BR = gpd.read_file('data/boundary.shp')

streets_BR = streets[streets.within(bound_BR)]

streets_BR = mm.remove_false_nodes(streets_BR)

Versions of your packages

momepy = 0.8.0 geopandas = 1.0.1 libpysal = 4.12.1 networkx = 3.3

Your operating system

Windows 10

Additional context

No response

jGaboardi commented 1 month ago

This seems to be happening in momepy actually so I'll transfer this issue over to there.

I have no idea why I thought this was opened in tobler... LOL

jGaboardi commented 1 month ago

Seems I don't have the proper permissions for the transfer.

Couldn't transfer because it's already here...

martinfleis commented 1 month ago

Can you try using force_2d prior using the function?

ale-v commented 1 month ago

Thanks. It solved the issue!

martinfleis commented 1 month ago

You probably had mixed LineString and LineString Z geometries after the concatenation.

I am curious if we should do some check internally here to avoid these cryptic errors. Just to provide an informative error rather than forcing 2 or 3D. @jGaboardi thoughts?

jGaboardi commented 1 month ago

Yes, more checks & informative messages are always great ideas.

@ale-v -- Any chance you can tease out the subset of offending data? We can use that in a test.

martinfleis commented 1 month ago

That would be a pain with licensing. We can just call force_3d on half of the data to reproduce this.

jGaboardi commented 1 month ago

Boiled down example this is not reproducible:

Python 3.12.7 | packaged by conda-forge | (main, Oct  4 2024, 15:57:01) [Clang 17.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.28.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import geopandas, momepy, numpy, shapely
   ...: 
   ...: line_1 = shapely.LineString((shapely.Point(1, 1), shapely.Point(2, 2)))
   ...: line_2 = shapely.LineString((shapely.Point(2, 2), shapely.Point(3, 3)))
   ...: 
   ...: line_2d_input = numpy.array([line_1, line_2])
   ...: line_3d_input = numpy.array([line_1, shapely.force_3d(line_2)])
   ...: 
   ...: gdf_2d_input = geopandas.GeoDataFrame(geometry=line_2d_input)
   ...: gdf_3d_input = geopandas.GeoDataFrame(geometry=line_3d_input)
   ...: 
   ...: 
   ...: (
   ...:     line_1,
   ...:     line_2,
   ...:     line_2d_input,
   ...:     line_3d_input,
   ...:     momepy.remove_false_nodes(line_2d_input),
   ...:     momepy.remove_false_nodes(line_3d_input),
   ...:     momepy.remove_false_nodes(gdf_2d_input),
   ...:     momepy.remove_false_nodes(gdf_3d_input),
   ...: )

Out[1]: 

(<LINESTRING (1 1, 2 2)>,
 <LINESTRING (2 2, 3 3)>,
 array([<LINESTRING (1 1, 2 2)>, <LINESTRING (2 2, 3 3)>], dtype=object),
 array([<LINESTRING (1 1, 2 2)>, <LINESTRING Z (2 2 0, 3 3 0)>],
       dtype=object),
 0    LINESTRING (1 1, 2 2, 3 3)
 dtype: geometry,
 0    LINESTRING (1 1, 2 2, 3 3)
 dtype: geometry,
                      geometry
 0  LINESTRING (1 1, 2 2, 3 3),
                      geometry
 0  LINESTRING (1 1, 2 2, 3 3))
jGaboardi commented 1 month ago

Ideas about what to do different here to reproduce?

martinfleis commented 1 month ago

It is in the code handling and rotating loops, so we need a loop that is formed by 3D geometry. Because shapely.get_coordinates ignores Z by default, while geom.coords does not. So it needs a loop of some sort.

jGaboardi commented 1 month ago

Good call. The following reproduces the error:

In [1]: import momepy, numpy, shapely
   ...: 
   ...: line_1 = shapely.LineString((shapely.Point(1, 1), shapely.Point(2, 2)))
   ...: line_2 = shapely.LineString((shapely.Point(2, 2), shapely.Point(3, 3)))
   ...: line_3 = shapely.LineString((shapely.Point(2, 2), shapely.Point(1, 2)))
   ...: line_4 = shapely.LineString((shapely.Point(1, 2), shapely.Point(1, 3)))
   ...: line_5 = shapely.LineString((shapely.Point(1, 3), shapely.Point(2, 2)))
   ...: 
   ...: line_2d_input = numpy.array([line_1, line_2, line_3, line_4, line_5])
   ...: momepy.remove_false_nodes(line_2d_input)
   ...: 
   ...: line_3d_input = numpy.array([line_1, line_2, shapely.force_3d(line_3), line_4, line_5])
   ...: momepy.remove_false_nodes(line_3d_input)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 14
     11 momepy.remove_false_nodes(line_2d_input)
     13 line_3d_input = numpy.array([line_1, line_2, shapely.force_3d(line_3), line_4, line_5])
---> 14 momepy.remove_false_nodes(line_3d_input)

File ~/github_repos/momepy/momepy/preprocessing.py:260, in remove_false_nodes(gdf)
    256 node_coords = shapely.get_coordinates(target_nodes)
    257 coords = np.array(loop_geom.coords)
    258 new_start = (
    259     node_coords[0]
--> 260     if (node_coords[0] != coords[0]).all()
    261     else node_coords[1]
    262 )
    263 new_start_idx = np.where(coords == new_start)[0][0]
    264 rolled_coords = np.roll(coords[:-1], -new_start_idx, axis=0)

ValueError: operands could not be broadcast together with shapes (2,) (3,) 
jGaboardi commented 1 month ago

You probably had mixed LineString and LineString Z geometries after the concatenation.

I am curious if we should do some check internally here to avoid these cryptic errors. Just to provide an informative error rather than forcing 2 or 3D. @jGaboardi thoughts?

Pushing up a proposed fix shortly.