Open martinfleis opened 2 months ago
Further observations:
This massive difference happens only when we need to load practically whole raster into memory, as is the case in the notebook sampling points across the whole extent. If I sample points only from a small spatial subset (a 2x2km square in the corner), xvec is actually faster than rasterio, which does not seem to care about the spatial distribution of points.
Also, if the whole array is loaded in memory, we are faster.
It seems that no matter what, rasterio is able to load only the pixels needed.
We should at least document this behaviour and point to sample
if the use case requires heavy sampling over the whole area.
One of the questions in the recent Earthmover's webinar on xvec was about the performance of
extract_points
compared to rasterio'ssample
method. I have never tested this before so wanted to give it a go and for a large lazy-loaded raster (digital terrain model), ourextract_points
is waaaay slower. See https://notebooksharing.space/view/4459f651d27b2f214f8590c30aba0782f7515d4c16b00dcd3283492f19f8e694#displayOptions=The DTM is from https://geoportalpraha.cz/en/data-and-services/97d2c9c11aa9478cb21b469b8a4f820e in case you'd like to test the same but any raster should do the trick I assume.
Under the hood,
extract_points
is simply passing the coordinates to.sel
withmethod='nearest'
, which should be doing exactly the same as rasterio'ssample
. https://github.com/xarray-contrib/xvec/blob/66b541bd509b4bcaade0bbeed3dd90b852b602a3/xvec/accessor.py#L1261-L1263This is not optimal.
We could possibly use
sample
via rioxarray if the raster is loaded via xarray as it is available throughdtm_da.rio._manager.acquire().sample(list(zip(x, y)))
but that is relying on a private API of rasterio. I'll open an issue there if there's an appetite to expose sample on therio
accessor.Outside of relying on rasterio, is there a way of speeding it up using some xarray magic?