region-id inconsistent between depressions.shp and regions.shp

tlohde commented 1 year ago

Firstly, thank you for providing an excellent package/resource.

lidar version: 0.7.1
Python version: 3.11.0
Operating System: Windows 10

Description

Describe what you were trying to get done.

short-term goal, and where the problem is happening: aligning depression regions, with individual nested depressions

Tell us what happened, what went wrong, and what you expected to happen.

the region-id values in regions.shp/regions_info.csv that is generated using ExtractSinks() differ from those in the depressions.shp/depressions_info.csv generated using DelineateDepressions().
I would expect the region-id to be consistent between these.

What I Did

outdir='tmp/'
min_size = 50           # minimum number of pixels as a depression
min_depth = 30           # minimum depth as a depression
interval = 10         # slicing interval for the level-set method
bool_shp = False        # output shapefiles for each individual level

sink_path = ExtractSinks('sample.tif', min_size, outdir)

dep_id_path, dep_level_path = DelineateDepressions(sink_path, min_size, min_depth, interval, outdir, bool_shp)

# read in output and combine info csv with geometry from shapefile
# depressions
depressions = gpd.read_file('tmp/depressions.shp')
depressions = depressions.dissolve('id').reset_index() # makes for a tidier merge with depressions_info.csv (creates multipolygons)
dep_info = pd.read_csv('tmp/depressions_info.csv')
gdf = depressions.merge(dep_info, on='id')

# regions
regions = gpd.read_file('tmp/regions.shp').sort_values(by='id')
reg_info = pd.read_csv('tmp/regions_info.csv')
regions = regions.merge(reg_info, left_on='id', right_on='region-id').sort_values(by='id').reset_index(drop=True)

at this stage I would expect the region-id in regions to match that in gdf...however, they do not. Below is an illustration, which I think also provides a way to handle the discrepency (spatial join)

# read in raster (for plotting)
region_raster = rio.open_rasterio('tmp/region.tif')

# to enable comparison of region-ids intersection of depressions and regions shapefiles
overlaid = (gdf
            .dissolve('region-id')   # aggregate by region-id
            .reset_index() # to ensure result gdf has both region-id_1 and region-id_2 (where _2 is the value that correctly corresponds with those in regions.shp)
            .overlay(regions, keep_geom_type=False)
)

###### plotting

# random region number
R = 310

fig, axs = plt.subplots(figsize=[15,6],ncols=5)

# plot raster 
(region_raster==R).plot(ax=axs[0], add_colorbar=False)

# plot from regions shapefile
regions.loc[regions['region-id']==R].plot(ax=axs[1])

# plot from depreesion shapefile
gdf.loc[gdf['region-id']==R].plot(column='level', ax=axs[2])

# plot from intersection of depressions and regions
overlaid.loc[overlaid['region-id_1']==R].plot(ax=axs[3])

# plot from intersection of depressions and regions
overlaid.loc[overlaid['region-id_2']==R].plot(ax=axs[4])

# tidy up axes limits, and labels etc...
axs[0].set_xlim(axs[1].get_xlim())
axs[0].set_ylim(axs[1].get_ylim())
axs[0].set_aspect('equal')

axs[0].set_title(f'region:{R}\nfrom region.tif')
axs[1].set_title(f'region:{R}\nfrom regions.shp')
axs[2].set_title(f'region:{R}\nfrom depressions.shp')
axs[3].set_title(f'region:{overlaid.loc[overlaid["region-id_1"]==R,"region-id_2"].values[0]}\nfrom regions.shp')
axs[4].set_title(f'region:{overlaid.loc[overlaid["region-id_2"]==R,"region-id_1"].values[0]}\nfrom depressions.shp')

plt.subplots_adjust(wspace=0.35)

print(f"region: {R} in the depressions.shp file corresponds to region: {overlaid.loc[overlaid['region-id_1']==R,'region-id_2'].values[0]} in regions.shp")
print(f"region: {R} in the regions.shp file corresponds to region: {overlaid.loc[overlaid['region-id_2']==R,'region-id_1'].values[0]} in depressions.shp")

region: 310 in the depressions.shp file corresponds to region: 394 in regions.shp region: 310 in the regions.shp file corresponds to region: 245 in depressions.shp

the question

So, i think the question is... is the discrepency between the two region_ids expected/normal? If yes, is the use of .overlay() the best way to handle it and reconcile regions.shp/regions_info.csv and depressions.shp/depressions_info.csv, or is there an even more straightforward way? If no, have I done something wrong?

thank you

giswqs commented 1 year ago

Thank you for reporting. Did you try out the example notebook to see if it has the same issue? I suspect that might just be a raster-vector conversion issue rather than the lidar package

https://colab.research.google.com/github/giswqs/lidar/blob/master/examples/lidar_colab.ipynb

tlohde commented 1 year ago

I can't seem to make the issue appearin the example notebook. I changed the sink parameters to:

min_size = 20 
min_depth = 0.5 
interval = 0.2

in order generate more regions, and the region-id appears to be consistent between the two.

I will have a closer look at my DEM and the raster-vector conversion. As it is, I think the spatial join is good enough for my needs.

tlohde commented 1 year ago

Probelm solved/identified (i think)

The issue arose because regions of my DEM has regions where elevations are < 0 (and they are supposed to be, as I am dealing with a DEM that includes some ocean bathymetry). These areas were identified by ExtractSinks(), but not DelineateDepressions(), i.e. the there were a different number of unique region-ids in regions.shp/regions.tif and in depressions.shp/depressions.tif

Workaround

create a duplicate DEM that has offset that brings all elevations above 0.
use this in ExtractSinks()

potential solution

I have tried to follow what DelineateDepressions() is doing, and i think, a solution might be to allow level_image to be < 0.

Thank you, tlohde

opengeos / lidar