Closed thomleysens closed 4 years ago
Hi @thomleysens,
UserWarning in your case during Tessellation says that 1 element(s) collapsed during generation
, which means that building with unique_id
57688 is so small, that during the inward offset it collapses into itself generating empty geometry.
The second UserWarning tells you, that tessellation contains MultiPolygons, not buildings. That is very often caused by overlapping building polygons or other kinds of imprecision. Which is quite common in OSM. The list of IDs matches building unique_id do you can check which buildings are causing it and edit them manually.
What are the initial objects ? The buildings within the input GeoDataFrame?
Yes, buildings. The most convenient way of checking it is saving gdf with unique_id and opening it in QGIS (or similar). You generally need to edit the geometry manually.
Can you tell me the error momepy.NeighborDistance
is raising?
Ideally if you can provide some reproducible example, so I can debug it, that would be great.
@thomleysens I think I know where the issue comes from in momepy.NeighborDistance
. Can you show how did you generate spatial_weights
?
Thanks for your explanations.
This is how I proceed to get buildings, clean them and write them Here is the input geojson file study_field.zip
.
import geopandas as gpd
import osmnx as ox
from shapely.geometry import Polygon, MultiPolygon
study_area = gpd.read_file("./data/study_field.geojson").to_crs(epsg=4326)
bldgs_osm = ox.footprints.footprints_from_polygon(
study_area["geometry"][0]
)
#Keep only a few columns
bldgs_osm = bldgs_osm[["name", "geometry"]]
#Delete non valid geometries
bldgs_osm["valid"] = bldgs_osm["geometry"].is_valid
bldgs_osm = bldgs_osm.loc[bldgs_osm["valid"] == True]
bldgs_osm["empty"] = bldgs_osm["geometry"].is_empty
bldgs_osm = bldgs_osm.loc[bldgs_osm["empty"] == False]
#Drop useless "valid" column
bldgs_osm.drop(["valid","empty"], inplace=True, axis=1)
bldgs_osm.to_file("./data/buildings_osm.gpkg", layer="buildings", driver="GPKG")
Then I run the preprocess
import momepy
bldgs_osm = gpd.read_file("./data/buildings_osm.gpkg", layer="buildings")
bldgs_osm_projected = ox.project_gdf(bldgs_osm) #seems to be needed to avoid errors in tessallation part (see doc)
bldgs = momepy.preprocess(
bldgs_osm_projected,
size=30,
compactness=True,
islands=True
)
Then the tessellation
bldgs['uID'] = momepy.unique_id(bldgs)
limit = momepy.buffered_limit(bldgs)
tessellation = momepy.Tessellation(
bldgs,
unique_id='uID',
limit=limit
).tessellation
And finally the spatial weights
sw1 = momepy.sw_high(k=1, gdf=tessellation, ids='uID')
bldgs['neighbour_dist'] = momepy.NeighborDistance(bldgs, sw1, 'uID').series
@martinfleis Feel free to ask for more explanations if this is not enough clear. Maybe I don't correctly understand a part of the process.
Thanks! It is all clear. Regarding tessellation, I was right. Warnings are caused by incorrect geometry (overlaps), see the image below.
But that should not affect momepy.NeighborDistance
. Your dataset is big, so it is not easy to quickly check the error. Can you paste the error thrown by NeighborDistance
? (btw, I just realised that in case of no neighbours, distance should be NaN rather than 0)
@thomleysens I think I figured it out. Because sw1 is generated from tessellation which does not match buildings (due to errors above), it happens that some buildings are not included in sw1 and that throws an error. Your try/except approach is probably good in this case and it is worth a fix. The same situation will happen for couple of other functions as well. But an actual error thrown would help anyway.
@thomleysens try this, I'll make a bug fix today or during the weekend if it solves the issue.
def __init__(self, gdf, spatial_weights, unique_id):
self.gdf = gdf
self.sw = spatial_weights
self.id = gdf[unique_id]
# define empty list for results
results_list = []
data = gdf.set_index(unique_id)
# iterating over rows one by one
for index, row in tqdm(data.iterrows(), total=data.shape[0]):
if index in spatial_weights.neighbors.keys():
neighbours = spatial_weights.neighbors[index]
building_neighbours = data.loc[neighbours]
if len(building_neighbours) > 0:
results_list.append(
np.mean(building_neighbours.geometry.distance(row["geometry"]))
)
else:
results_list.append(np.nan)
else:
results_list.append(np.nan)
self.series = pd.Series(results_list, index=gdf.index)
Sorry for the big dataset. I should have thought about checking the geometries with QGIS, sorry for that. Unfortunately I didn't keep the error message but it was a DataFrame key error and this error corresponds to your analysis "Because sw1 is generated from tessellation which does not match buildings (due to errors above), it happens that some buildings are not included in sw1 and that throws an error".
I'm gonna try your fix, thanks a lot. It could take about 20-30 minutes because my computer is pretty slow.
Hi,
I ran the NeighborDistance
with my fix then with your fix and I have compared the 2 buildings dataframes upated with ["neighbour_dist"].
They are identical.
But when I want to plot the buildings dataframe from your fix (with code below) ...
f, ax = plt.subplots(figsize=(15, 15))
bldgs.plot(
ax=ax,
column='neighbour_dist',
scheme='naturalbreaks',
k=5,
legend=True,
cmap='OrRd_r'
)
I had to drop the NaN values in bldgs["neighbour_dist"] to avoid the error below (don't bother me and it is a correct behavior but I think it's worth to be mentioned, especially for your great illustrated examples)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-3bcd232fce97> in <module>
10 k=5,
11 legend=True,
---> 12 cmap='OrRd_r'
13 )
14 ax.set_facecolor("black")
~/anaconda3/envs/space/lib/python3.7/site-packages/geopandas/geodataframe.py in plot(self, *args, **kwargs)
604 from there.
605 """
--> 606 return plot_dataframe(self, *args, **kwargs)
607
608 plot.__doc__ = plot_dataframe.__doc__
~/anaconda3/envs/space/lib/python3.7/site-packages/geopandas/plotting.py in plot_dataframe(df, column, cmap, color, ax, cax, categorical, legend, scheme, k, vmin, vmax, markersize, figsize, legend_kwds, classification_kwds, **style_kwds)
532 classification_kwds["k"] = k
533
--> 534 binning = _mapclassify_choro(values, scheme, **classification_kwds)
535 # set categorical to True for creating the legend
536 categorical = True
~/anaconda3/envs/space/lib/python3.7/site-packages/geopandas/plotting.py in _mapclassify_choro(values, scheme, **classification_kwds)
719 del classification_kwds["k"]
720 try:
--> 721 binning = scheme_class(values, **classification_kwds)
722 except TypeError:
723 raise TypeError("Invalid keyword argument for %r " % scheme)
~/anaconda3/envs/space/lib/python3.7/site-packages/mapclassify/classifiers.py in __init__(self, y, k, initial)
1652 self.k = k
1653 self.init = initial
-> 1654 MapClassifier.__init__(self, y)
1655 self.name = "NaturalBreaks"
1656
~/anaconda3/envs/space/lib/python3.7/site-packages/mapclassify/classifiers.py in __init__(self, y)
539 self.name = "Map Classifier"
540 self.y = y
--> 541 self._classify()
542 self._summary()
543
~/anaconda3/envs/space/lib/python3.7/site-packages/mapclassify/classifiers.py in _classify(self)
550
551 def _classify(self):
--> 552 self._set_bins()
553 self.yb, self.counts = bin1d(self.y, self.bins)
554
~/anaconda3/envs/space/lib/python3.7/site-packages/mapclassify/classifiers.py in _set_bins(self)
1673 self.k = k
1674 else:
-> 1675 res0 = natural_breaks(x, k, init=self.init)
1676 fit = res0[2]
1677 self.bins = np.array(res0[-1])
~/anaconda3/envs/space/lib/python3.7/site-packages/mapclassify/classifiers.py in natural_breaks(values, k, init)
399 Warn("Warning: setting k to %d" % uvk, UserWarning)
400 k = uvk
--> 401 kres = _kmeans(values, k, n_init=init)
402 sids = kres[-1] # centroids
403 fit = kres[-2]
~/anaconda3/envs/space/lib/python3.7/site-packages/mapclassify/classifiers.py in _kmeans(y, k, n_init)
350 y = y * 1.0 # KMEANS needs float or double dtype
351 y.shape = (-1, 1)
--> 352 result = KMEANS(n_clusters=k, init="k-means++", n_init=n_init).fit(y)
353 class_ids = result.labels_
354 centroids = result.cluster_centers_
~/anaconda3/envs/space/lib/python3.7/site-packages/sklearn/cluster/_k_means.py in fit(self, X, y, sample_weight)
856 order = "C" if self.copy_x else None
857 X = check_array(X, accept_sparse='csr', dtype=[np.float64, np.float32],
--> 858 order=order, copy=self.copy_x)
859 # verify that the number of samples given is larger than k
860 if _num_samples(X) < self.n_clusters:
~/anaconda3/envs/space/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
560 if force_all_finite:
561 _assert_all_finite(array,
--> 562 allow_nan=force_all_finite == 'allow-nan')
563
564 if ensure_min_samples > 0:
~/anaconda3/envs/space/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
58 msg_err.format
59 (type_err,
---> 60 msg_dtype if msg_dtype is not None else X.dtype)
61 )
62 # for object dtype data, we only check for NaNs (GH-13254)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Should I close this issue ?
Great! Thanks for the traceback. This is known issue to be resolved in geopandas/mapclassify, we'll take care of it.
This issue will be closed by #131 automatically.
Thanks again!
Ok, great !
Thanks a lot for your quick fix and your explanations. And again, thanks for this great library.
Hi,
Thanks for your great library and its illustrated documentation.
I have this UserWarnings when I run
momepy.Tessellation
on GeoDataFrame with buildings from OpenStreetMap:I understand what's wrong but do not understand what I must change on initial objects. What are the initial objects ? The buildings within the input GeoDataFrame ? Because it don't seem to contain MultiPolygons when I check the instance of the geometries.
For now, I found a workaround by adding a
try except
in the NeighbourDistance in momepy.distribution in order to getmomepy.NeighborDistance
working.Do you have any idea on what I could change/modify to get NeighbordDistance working without this workaround ?