pysal / tobler

Spatial interpolation, Dasymetric Mapping, & Change of Support
https://pysal.org/tobler
BSD 3-Clause "New" or "Revised" License

Explore Dask as a scaler #181

Open darribas opened 11 months ago

darribas commented 11 months ago

This is a spin-off issue from the conversation in #180, so we don't lose track of it and also don't distract from the discussion in that PR.

Original suggestion from @knaaptime:

Categoricals are important, for example, to interpolate rasters (e.g., land use), and having the functionality out in the wild would help it get tested.

It would be useful to see whether this can provide a boost to the existing functionality we have for vectorizing rasters.

And response from @darribas:

It’s slightly different. We could think of a way of vectorizing pixels and doing a spatial dissolve with dask. I don’t know if that’d be faster (it'd be at least parallel/out-of-core), but it’s definitely different code (though similar philosophy), so I'd be tempted to leave that for a different PR, perhaps create an issue to remember this option in case we have bandwidth (or need) in the future to explore it.

In the case suggested above, a strategy to use Dask would be:

Once we enter a Dask data structure, all computations are lazy and only run, in parallel, when .compute() is called, which provides scalability. But I'm not sure that will make it faster than rasterio's vectorisation, which I imagine relies on GEOS? It might, because the dissolve should be a fast one: all the polygons to dissolve are four-point squares. Worth a shot for sure.
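
To make the "four-point squares" argument concrete, here is a minimal pure-Python sketch (no Dask, rasterio, or tobler calls; all function names are hypothetical and only for illustration). It assumes each pixel is a unit square and that a dissolve by category only needs the outline: an edge shared by two same-category pixels cancels out, so the outline is the set of edges seen exactly once. The per-strip counting is a stand-in for what a Dask partition could do, and the merge is a stand-in for the final reduction.

```python
from collections import Counter, defaultdict


def pixel_edges(row, col):
    """Return the four edges of the unit square covering pixel (row, col)."""
    corners = [(col, row), (col + 1, row), (col + 1, row + 1), (col, row + 1)]
    # Sort each edge's endpoints so an edge shared by two pixels compares equal.
    return [tuple(sorted((corners[i], corners[(i + 1) % 4]))) for i in range(4)]


def chunk_edge_counts(grid, row_start, row_stop):
    """Map step: count edges per category within one horizontal strip."""
    counts = defaultdict(Counter)
    for r in range(row_start, row_stop):
        for c, value in enumerate(grid[r]):
            counts[value].update(pixel_edges(r, c))
    return counts


def dissolve_boundaries(grid, chunk_rows=2):
    """Reduce step: merge strip counts; edges seen once form the outline."""
    total = defaultdict(Counter)
    # Each strip is independent, so in principle it could be one parallel task.
    for start in range(0, len(grid), chunk_rows):
        stop = min(start + chunk_rows, len(grid))
        for value, counter in chunk_edge_counts(grid, start, stop).items():
            total[value].update(counter)
    return {
        value: {edge for edge, n in counter.items() if n == 1}
        for value, counter in total.items()
    }


# Toy categorical raster (e.g. three land-use classes).
grid = [
    [1, 1, 2],
    [1, 1, 2],
    [3, 3, 2],
]
boundaries = dissolve_boundaries(grid)
```

Category 2 spans both strips here, so its shared edge is only cancelled once the per-strip counts are merged. Whether a real Dask version of this would beat rasterio's vectorisation is exactly the open question above; this sketch only shows why the dissolve itself can be cheap.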