Closed emuise closed 3 years ago
Hello @emuise, this is actually a great idea, thanks for bringing it up. It should actually be quite easty to implement with rasterio and geopandas, so I will try to do it over the weekend and release a new version including this. I will keep you posted.
Best, Martí
Hello @emuise,
I am very sorry for my delay, but these last PhD days are crazy and also I wanted to make some other changes before releasing the new minor. You can now (with PyLandStats 2.2.0) use a shapefile, geodataframe, geoseries or any file compatible with geopandas in your ZonalAnalysis
by means of the masks
argument (note that masks_arr
is now deprecated). The usage is quite straightforward, you can see an example in the 03-zonal-analysis.ipynb notebook, section "Using vector geometries to define zones" (cells 13 and 14).
I hope that this works for you. Feel free to reopen if it doesn't. Thanks again for your suggestion and for using PyLandStats. Best, Martí
Hi Martí, Thanks for doing this. I have ran it on some of my data and am encountering some issues.
I can't use FID for masks_index_col (this I can easily work around on my own, just want to let you know).
When I run .compute_class_metrics_df(metrics = ['proportion_of_landscape']), all of the values are 0 (see attached picture, this applies across all FID_Terr_B, and class_val).
Additionally, I receive this performance warning when running compute_class_metrics_df
Any ideas what could be causing any of this? Let me know if you need more info from me.
I have double checked that I'm not loading in blank rasters/shps.
Also, I just want to say I appreciate you adding this feature a lot. Best, Evan
Hello again @emuise,
the fact that the proportion of landscape is 0 is because there are no patches of that given class in any of your zones. You can pass fillna=False
to the compute_class_metrics_df
method, and see if the resulting cells are NaN
instead.
Your warning seems to be related to the index of the data frame (i.e., a multi index with the class_val
and FID_Terr_B
). Also, why can't you use 'FID' for masks_index_col
? what are the data types of each column of your data frame?
Would it be possible to send me your data so I can try to see what is going on? You can contact me by mail at "marti (dot) bosch (at) epfl (dot) ch" (no spaces, of course). There might be potential PyLandStats bugs so this could be a great opportunity to fix them. But I understand if you cannot share such data.
Best, Martí
Hi @martibosch, I sent you an email with a link to the data on google drive.
Using fillna = False leads to all values being NaN. Could this be something to do with the crs / them not overlapping somehow? The rasters are generated using the shapefiles, so this shouldn't be the case.
The data type for FID is Object ID, from Arc. I can just copy this into another column and use that, easy solve on my end.
Evan
Hello again @emuise,
This is going to be a long answer, which I will divide into two parts:
I wonder whether your NaN values are an issue of how the CRS is encoded in the raster and in the shapefile. Pyproj 2.6.1 reads the former as:
<Projected CRS: EPSG:3005>
Name: NAD_1983_BC_Environment_Albers
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: unnamed
- method: Albers Equal Area
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
and the latter as:
<Projected CRS: EPSG:3005>
Name: NAD83 / BC Albers
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Canada - British Columbia
- bounds: (-139.04, 48.25, -114.08, 60.01)
Coordinate Operation:
- name: British Columbia Albers
- method: Albers Equal Area
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
hence pyproj.crs.crs.CRS.is_exact_same
returns False
, although a simple equality comparison (i.e., __eq__
or ==
) returns True
. This may be the reason why pylandstats does not clip the raster to the shapefile zones properly. Does the following code (replace raster_filepath
and zones_filepath
with the path to your landscape raster and zone shapefile respectively) return a plot with a
pls.ZonalAnalysis(raster_filepath, masks=zones_filepath).landscapes[0].plot_landscape()
The following code works in my environment:
import pylandstats as pls
za = pls.ZonalAnalysis(raster_filepath, masks=zones_filepath)
pland_df = za.compute_class_metrics_df(['proportion_of_landscape'], [20])
and pland_df
is of the form:
metric proportion_of_landscape
class_val attribute_values
20 0 0.353737
1 0.000000
2 0.465496
3 0.538368
4 0.450421
... ...
299 0.000000
300 1.008403
301 14.307964
302 0.000000
303 2.040816
[304 rows x 1 columns]
Does this work for you?
Note however that ZonalAnalysis
ommits the zones that do not intersect with the raster (to avoid pointless computations), and since I have not provided a masks_index_col
, the above data frame cannot be directly mapped to your shapefile (note the 304 data frame rows vs. the 317 geometries in the shapefile) - this shouldn't be hard, but the whole point of masks_index_col
is to make it easier. Nonetheless, I did not understand which column of your shapefile should be used here. Do you have a column with a one-to-one mapping to the geometries? Otherwise, this might be problematic. Looking over your data, it seems that you might be interested in grouping the shapefile rows with the same FID and merging their polygon geometries as a single multipolygon geometry, so that at the end you have this one-to-one mapping between unique FID and unique geometry (even if it is a multipolygon geometry). If that is actually what you want, you can use geopandas as in:
import geopandas as gpd
import pylandstats as pls
gdf = gpd.read_file(zones_filepath)
gb_col = 'FID_Terr_B'
gb = gdf.groupby(gb_col)
group_gser = gpd.GeoSeries(gb.apply(lambda g: g['geometry'].unary_union),
index=gb.first().index,
crs=gdf.crs)
za = pls.ZonalAnalysis(raster_filepath, masks=group_gser)
pland_df = za.compute_class_metrics_df(['proportion_of_landscape'], [20])
and then you will have a pland_df
whose index attribute_values
level correspond to 'FID_Terr_B' (or whichever column you set to gb_col
in the snippet above.
Let me know how this goes.
Best, Martí
Hi @martibosch, I believe I've got everything working now. For the CRS issue, your code plots a single zonal landscape, which appears to have been the goal.
For the masks_col_index, I have made a new unique ID field (copied from the one ArcGIS gives), which gets it to behave properly. I think Arc's FID field (Data type is "Object ID" in Arc) isn't readable by other programs which is what was throwing pylandstats off when I tried to have mask_index_col = "FID"
The missing rows/geometries are due to sliver polygons, which did not include a full pixel. These are now (mostly) removed in my preprocessing steps.
Everything appears to be working properly. Thank you so much. Evan
Hello @emuise,
glad that everything seems to work out. Your data made me realize that the performance of pylandstats for a ZonalAnalysis
with large number of rows (e.g., more than 50) should be improved. I will work towards that in the next releases.
Thank you again for using pylandstats. Best, Martí
Description
Is it possible to implement using a shapefile for the mask_arr argument in pls.ZonalAnalysis? I don't need to buffer the zones, just generate the stats within each zone. From what I currently gather mask_arr takes rectangles in the form of numpy arrays.