google / xarray-beam

Distributed Xarray with Apache Beam
https://xarray-beam.readthedocs.io
Apache License 2.0
126 stars 7 forks source link

Support for striding / rolling windows #16

Open mjwillson opened 3 years ago

mjwillson commented 3 years ago

Nice library, thanks! Will it work to pass e.g. dataset.rolling(...) or dataset.rolling(...).construct(...) to xarray_beam.DatasetToChunks? (I'm assuming not, or that it would result in something hugely inefficient?) If not, would it be possible to support striding / sliding windows in DatasetToChunks?

shoyer commented 3 years ago

DatasetToChunks is a rather sharp tool at the moment: you can pass in Dask backed xarray.Dataset objects and it often works, but you have to be careful (1) to ensure that the dask graph is not too large to serialize and send to each worker, and (2) that it is safe to compute individual chunks from the dataset on separate workers (without running out of memory or doing too much duplicate work).

So yes, you probably could get away with passing the result of dataset.rolling(...).construct(...) into DatasetToChunks but you would need to take some care.

Ultimately I think it would be nice to have Beam PTransforms for rolling window computation, similar to dask.array.map_overlap: https://docs.dask.org/en/latest/array-overlap.html

I don't have any immediate plans to work on this but it would definitely be within scope for the project.