mdbartos / pysheds

:earth_americas: Simple and fast watershed delineation in python.
GNU General Public License v3.0

Add support for xarray #179

Open mdbartos opened 2 years ago

cheginit commented 2 years ago

I am the developer of the Py3DEP package, which retrieves topography data from The National Map's 3DEP at a given resolution and returns it as an xarray.Dataset (or xarray.DataArray). It would be great if pysheds had a from_array function, so that the .data property of the DataArray object (a numpy.ndarray) could be passed along with the other attributes that py3dep provides, such as transform and crs. This approach would not require adding xarray as a dependency. However, if you don't mind adding xarray and rioxarray as new dependencies, the output of py3dep could be used directly.

mdbartos commented 2 years ago

Hi @cheginit,

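Thanks for the feedback. I would like to add automatic support for xarray. One workaround for now is to instantiate a Raster using the xarray object's spatial reference info. Let's say we have already extracted the following variables from it: data (the underlying array), affine (the affine transform), crs (the coordinate reference system), and nodata (the nodata value).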
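
A minimal sketch of where such variables might come from, using only NumPy. The helper name affine_coefficients and the toy coordinates are hypothetical, and the grid is assumed to be north-up with evenly spaced cell-center coordinates. With rioxarray installed, the transform, CRS, and nodata value can instead be read from the DataArray's rio accessor.

```python
import numpy as np

def affine_coefficients(x, y):
    """Return (a, b, c, d, e, f) for a north-up grid.

    x and y are 1D arrays of evenly spaced cell-center coordinates,
    as typically stored on an xarray.DataArray (hypothetical helper).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = (x[-1] - x[0]) / (x.size - 1)  # cell width
    dy = (y[-1] - y[0]) / (y.size - 1)  # negative when y decreases down the rows
    # Shift from the center of the first cell to its outer corner.
    c = x[0] - dx / 2
    f = y[0] - dy / 2
    return (dx, 0.0, c, 0.0, dy, f)

# Toy stand-ins for the coordinates and values of a DataArray.
x = np.array([0.5, 1.5, 2.5, 3.5])
y = np.array([2.5, 1.5, 0.5])
data = np.arange(12, dtype=float).reshape(y.size, x.size)
nodata = -9999.0  # assumed nodata value

coeffs = affine_coefficients(x, y)
print(coeffs)
```

The six coefficients are in the order expected by the Affine constructor of the affine package, so affine = Affine(*coeffs) would give the transform used in the ViewFinder call.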

Then we can instantiate a Raster with the spatial reference information from the xarray:

from pysheds.view import Raster, ViewFinder

viewfinder = ViewFinder(affine=affine, shape=data.shape, crs=crs, nodata=nodata)
raster = Raster(data, viewfinder=viewfinder)

This raster can now be fed into any of the hydrologic functions of a Grid instance (called grid here):

fdir = grid.flowdir(raster)

See the docs page on Rasters for more info: https://mattbartos.com/pysheds/raster.html

cheginit commented 2 years ago

The workaround is simple enough, thanks!

I agree that direct xarray support would be much better. Py3DEP returns its results as dask-backed (parallelized) datasets, which, combined with the numba versions of the pysheds functions, could make operations on large datasets even faster. Besides, since building a Raster from the underlying array loads all of the data into memory, you'd lose the dask advantage with the Raster approach.
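
To illustrate the last point without requiring dask, here is a toy sketch. LazyGrid is a hypothetical stand-in for a lazily evaluated array and is not part of dask, xarray, or pysheds. It shows that handing a lazy object to anything that converts its input to a NumPy array forces the whole array to be materialized.

```python
import numpy as np

class LazyGrid:
    """Toy stand-in for a lazy array: values are produced only on demand."""

    def __init__(self, shape):
        self.shape = shape
        self.loads = 0  # counts how often the full array was materialized

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook whenever it needs a concrete ndarray.
        self.loads += 1
        return np.zeros(self.shape, dtype=dtype or float)

lazy = LazyGrid((3, 4))
loads_before = lazy.loads   # nothing has been computed yet

eager = np.asarray(lazy)    # any ndarray-based container does the equivalent of this
loads_after = lazy.loads    # the full array now sits in memory

print(loads_before, loads_after, eager.shape)
```

A NumPy-backed container behaves like the np.asarray call here, which is why the laziness would not survive the conversion.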