Note that I had thought about using a vendored version of `rasterio.features.rasterize` from another library. These are some bullet points:
- geocube - would fit well since it integrates with rioxarray, but it has rather heavy dependencies on `scipy`, and also uses `odc-geo` (which is ok, but more dependencies to handle).
- regionmask - another promising one, with support for crossing the dateline, but the API seems a little less flexible for what needs to be done here.
That said, the turning point to switch to datashader-based rasterization might be when shapely 2.0 gets released and matures enough to the point that the geospatial vector Python ecosystem starts using it.
TODO:
- [x] Initial implementation of `RasterioRasterizerIterDataPipe`
- [x] Add unit test and enhancement to pass `xarray.DataArray` inputs to `out` parameter in `rasterio.features.rasterize`
- [ ] Think about refactoring PyogrioReader to not return tuples of (filename, dataobj) (maybe do in separate PR)
Closing as superseded by the datashader-based method in #34 and #35, which offers a faster and more customizable (albeit with more steps) way of performing rasterization.
An iterable-style DataPipe for turning vector geometries into raster images! Uses `rasterio` to do the rasterization.
Preview at https://zen3geo--32.org.readthedocs.build/en/32/api.html#zen3geo.datapipes.RasterioRasterizer
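To make the idea concrete, here is a minimal, standard-library-only sketch of what "iterable-style rasterization" means: each vector geometry coming in produces one raster going out. This is not the code in this Pull Request, which relies on `rasterio.features.rasterize`. The names `ToyRasterizer`, `rasterize` and `point_in_polygon` are hypothetical, and the rule of burning a pixel when its centre falls inside the polygon is an assumption chosen for simplicity.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]
Raster = List[List[int]]


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Even-odd ray casting test for a single point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon: Polygon, width: int, height: int) -> Raster:
    """Burn 1 into every unit-sized pixel whose centre lies inside the polygon."""
    grid: Raster = []
    for row in range(height):
        # Row 0 is the top of the image, so y decreases as the row index grows
        y = height - row - 0.5
        grid.append(
            [int(point_in_polygon(col + 0.5, y, polygon)) for col in range(width)]
        )
    return grid


class ToyRasterizer:
    """Hypothetical iterable-style pipe: yields one raster per input polygon."""

    def __init__(self, source: Iterable[Polygon], width: int, height: int):
        self.source = source
        self.width = width
        self.height = height

    def __iter__(self) -> Iterator[Raster]:
        for polygon in self.source:
            yield rasterize(polygon, self.width, self.height)


# A 2x2 square in the middle of a 4x4 grid
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
rasters = list(ToyRasterizer([square], width=4, height=4))
```

Iterating lazily like this means rasters are only computed when a downstream consumer asks for them, which is the main appeal of the iterable style.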
Another option looked at for vendoring is the `dea_tools.spatial.xr_rasterize` function, which is nearly perfect (see https://github.com/GeoscienceAustralia/dea-notebooks/blob/0.2.5/Frequently_used_code/Rasterize_vectorize.ipynb and https://github.com/GeoscienceAustralia/dea-notebooks/blob/0.2.5/Tools/dea_tools/spatial.py#L166-L320). But again, it has heavy dependencies like `scipy` and has several try-except/if-then clauses to handle a `.geobox` accessor. Still, it will be a useful reference for this Pull Request!
Alternatively, rasterization can also be done using `datashader`, which is super fast as it uses `numba` (see e.g. code snippet at https://github.com/weiji14/deepicedrain/blob/41917ea515edbe548975e2a25c25ff55c6eb4b1a/deepicedrain/spatiotemporal.py#L109-L133). However, on the dependencies front:
- `spatialpandas` instead of `geopandas` (though see https://github.com/holoviz/datashader/issues/1006#issuecomment-859928820)
- `spatialpandas`'s maintenance status: there might be some major refactoring (read: unstable) according to notes in https://docs.google.com/document/d/1BkL0arf1Lz6fHgVBEJNxKbFmVN8glNQXmDKdsuT0GcU/edit# and https://gist.github.com/jpivarski/30c2671c6860393974ff3db2891f20ed
- `scipy`