GrahamReveley opened this issue 1 year ago
Also ran into this 👍
@GrahamReveley @bstadlbauer thanks for contributing here. I believe this is a known issue to @thuydotm, but it hasn't been formally documented. Also, sorry for the delay in responding.
We are going to prioritize this proximity issue over the next few sprints, but we're happy to help review / test a PR if y'all want to contribute a fix.
@brendancol We do have a constrained use case and are working with a modified fork of the proximity code here, but IIRC it's nothing we can easily generalize to work for all use cases.
Xrspatial not working with large dask-backed arrays

I believe there's a bug in the `xrspatial.proximity` method whereby the coordinates for a large array (even if it's a dask array) are loaded into memory as numpy arrays rather than kept as dask arrays. For context, I'm attempting to compute a distance to coast for a 90m global dataset. I had a look into the source code and I believe the issue is here, in the `_process` function of `proximity`:
```python
raster_dims = raster.dims
if raster_dims != (y, x):
    raise ValueError(
        "raster.coords should be named as coordinates:"
        "({0}, {1})".format(y, x)
    )
```
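For concreteness, this is roughly the kind of call that hits the problem for me (the shape, chunking, and coordinate values below are illustrative placeholders, not my actual dataset):

```python
import dask.array as da
import numpy as np
import xarray as xr
from xrspatial import proximity

# A ~90m global grid: the raster data itself stays lazy/chunked with dask,
# but the per-pixel coordinate handling inside proximity does not.
ny, nx = 216_000, 432_000
data = da.zeros((ny, nx), chunks=(4096, 4096), dtype="float32")
raster = xr.DataArray(
    data,
    dims=("y", "x"),
    coords={"y": np.linspace(90, -90, ny), "x": np.linspace(-180, 180, nx)},
)

# This ends up materialising huge numpy coordinate arrays and exhausts memory.
dist = proximity(raster, x="x", y="y")
```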
In `_process`, `xs` and `ys` (built from those coordinates) therefore end up as huge numpy arrays that don't fit into memory; if the input data is a dask array, they should probably be dask arrays as well. Later in the processing sequence the proximity calculation itself is dispatched to either dask or numpy, and I think a similar dask/numpy split is needed for the coordinate arrays.
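To illustrate the kind of dask/numpy split I mean, here is a rough sketch, assuming the blow-up comes from expanding the 1-d coordinate vectors into full per-pixel grids. The `_pixel_coords` helper is hypothetical, not part of xrspatial, and I'm not claiming this matches how `_process` currently builds `xs`/`ys`:

```python
import numpy as np
import dask.array as da
import xarray as xr


def _pixel_coords(raster: xr.DataArray, x: str = "x", y: str = "y"):
    """Hypothetical helper: build per-pixel coordinate grids that match the
    backing of ``raster.data`` (lazy dask grids for dask input, numpy otherwise).
    """
    if isinstance(raster.data, da.Array):
        # The 1-d coordinate vectors are small; only the 2-d grids are huge,
        # so build those lazily and chunk them like the raster itself.
        chunks_y, chunks_x = raster.data.chunks
        x1d = da.from_array(raster[x].values, chunks=(chunks_x,))
        y1d = da.from_array(raster[y].values, chunks=(chunks_y,))
        xs, ys = da.meshgrid(x1d, y1d)  # lazy, chunked, shape == raster.shape
    else:
        xs, ys = np.meshgrid(raster[x].values, raster[y].values)
    return xs, ys
```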
If I've missed something please let me know; I'm more than happy to share some more code if need be. In the meantime I can use gdal_proximity.py directly, but I'd guess it will be slower than a dask-backed xrspatial.
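For reference, the interim workaround I have in mind is roughly the following, going through GDAL's Python bindings (which is what gdal_proximity.py wraps). The filenames and the target value of 1 are placeholders for my land/sea mask:

```python
from osgeo import gdal

src = gdal.Open("landmask.tif")  # placeholder input: 1 = land, 0 = sea
src_band = src.GetRasterBand(1)

# Create a Float32 output raster with the same size and georeferencing
drv = gdal.GetDriverByName("GTiff")
dst = drv.Create("distance_to_coast.tif", src.RasterXSize, src.RasterYSize,
                 1, gdal.GDT_Float32)
dst.SetGeoTransform(src.GetGeoTransform())
dst.SetProjection(src.GetProjection())
dst_band = dst.GetRasterBand(1)

# Distance (in georeferenced units) to the nearest pixel with value 1
gdal.ComputeProximity(src_band, dst_band, ["VALUES=1", "DISTUNITS=GEO"])
dst.FlushCache()
```

The command-line equivalent would be something like `gdal_proximity.py landmask.tif distance_to_coast.tif -values 1 -distunits GEO`.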
Thanks!