google / xarray-beam

Distributed Xarray with Apache Beam
https://xarray-beam.readthedocs.io
Apache License 2.0

High-level xarray_beam.Mean() transform #73

Closed: copybara-service[bot] closed this 1 year ago.

copybara-service[bot] commented 1 year ago


This makes it easy to write high-level aggregations of distributed Xarray-Beam datasets, e.g.,

xbeam.DatasetToChunks(...) | xbeam.Mean('time') | xbeam.ChunksToZarr(...)
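A common way to compute a mean over a dimension that is split across chunks is to reduce each chunk to a (sum, count) accumulator and then merge those accumulators, since sums and counts combine by plain addition in any order. The minimal sketch below shows that technique in plain Python, loosely shaped like a Beam `CombineFn` (create/add/merge/extract). It is an illustration of the general idea, not a description of how `xbeam.Mean` is actually implemented; `MeanAccumulatorFn` is a hypothetical name, and plain lists stand in for xarray chunks.

```python
from functools import reduce
from typing import Iterable, List, Tuple

Accumulator = Tuple[float, int]  # (running sum, element count)


class MeanAccumulatorFn:
    """Hypothetical mean combiner, loosely shaped like a Beam CombineFn."""

    def create_accumulator(self) -> Accumulator:
        return (0.0, 0)

    def add_input(self, acc: Accumulator, chunk: Iterable[float]) -> Accumulator:
        # Reduce one chunk locally: add its sum and its number of elements.
        values = list(chunk)
        return (acc[0] + sum(values), acc[1] + len(values))

    def merge_accumulators(self, accs: Iterable[Accumulator]) -> Accumulator:
        # Partial results from different workers combine by addition,
        # so the merge order does not matter.
        return reduce(
            lambda a, b: (a[0] + b[0], a[1] + b[1]),
            accs,
            self.create_accumulator(),
        )

    def extract_output(self, acc: Accumulator) -> float:
        total, count = acc
        return total / count


fn = MeanAccumulatorFn()
# Hypothetical chunks of unequal length along the reduced dimension.
chunks: List[List[float]] = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
# Pretend each chunk was reduced on a different worker.
partials = [fn.add_input(fn.create_accumulator(), c) for c in chunks]
result = fn.extract_output(fn.merge_accumulators(partials))
print(result)  # 3.5
```

Because each accumulator carries its own count, chunks of different lengths are weighted correctly without any extra bookkeeping.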

Originally, I didn't think Xarray-Beam needed this, because you could aggregate over dimensions within each chunk and then combine the results with xbeam.Mean.PerKey(). xbeam.Mean() is an improvement for two reasons:

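  1. It requires less user code: there is no need to take the mean twice and update the key.
  2. It's less error-prone: it computes the correct answer even if the dimension(s) being reduced over have different lengths on different chunks. Averaging per-chunk means, by contrast, gives every chunk equal weight regardless of how many elements it contains.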
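To make the second point concrete, here is a small numeric sketch in plain Python, using hypothetical values split into chunks of unequal length along the reduced dimension. It compares the two-step "mean of per-chunk means" approach with the true mean over all elements; it illustrates the arithmetic only and does not call Xarray-Beam.

```python
from statistics import mean

# Hypothetical values along the reduced dimension, split into unequal chunks.
chunks = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]

# Two-step approach: mean within each chunk, then an unweighted mean of those.
per_chunk_means = [mean(c) for c in chunks]  # [2.0, 4.0, 5.5]
mean_of_means = mean(per_chunk_means)        # 11.5 / 3, about 3.8333

# Correct answer: every element weighted equally (21 / 6).
n_total = sum(len(c) for c in chunks)
true_mean = sum(sum(c) for c in chunks) / n_total

# Weighting each chunk mean by its chunk length recovers the true mean.
weighted = sum(m * len(c) for m, c in zip(per_chunk_means, chunks)) / n_total

print(round(mean_of_means, 4), true_mean, weighted)  # 3.8333 3.5 3.5
```

The unweighted version overweights the single-element chunk, which is exactly the mistake that is easy to make when chunk lengths differ.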

I'll update the narrative docs in a follow-on PR.