google / xarray-beam

Distributed Xarray with Apache Beam
https://xarray-beam.readthedocs.io
Apache License 2.0

Introducing FilePatternToChunks: IO with Pangeo-Forge's FilePattern interface. #31

Closed alxmrs closed 2 years ago

alxmrs commented 2 years ago

This is the first of a few changes that will let users read in datasets using Pangeo-Forge's FilePattern interface [0]. With a FilePattern, users describe how data is stored across files along concat and merge dimensions. This transform reads those datasets in as chunks. The module can be used in pipelines to convert natively formatted datasets to Zarr.
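
To illustrate the idea, here is a minimal plain-Python sketch of how a file pattern with one concat dimension and one merge dimension could map onto chunk offsets. It is not the pangeo-forge-recipes or xarray-beam API: `iter_file_chunks`, `make_path`, the bucket path, and the assumption that every file holds the same number of items along the concat dimension are all hypothetical and only meant to show the bookkeeping.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Tuple


def iter_file_chunks(
    make_path: Callable[..., str],
    concat_dim: str,
    concat_keys: List[str],
    merge_dim: str,
    merge_keys: List[str],
    items_per_file: int,
) -> Iterator[Tuple[Dict[str, int], str, str]]:
    """Yield (offsets, merge key, path) for every file in the pattern.

    Hypothetical helper: files are laid end to end along the concat
    dimension, while files that differ only in the merge key share the
    same offset because they hold different variables.
    """
    indexed_concat = enumerate(concat_keys)
    for (i, ckey), mkey in itertools.product(indexed_concat, merge_keys):
        # Assumes every file holds exactly `items_per_file` items.
        offsets = {concat_dim: i * items_per_file}
        path = make_path(**{concat_dim: ckey, merge_dim: mkey})
        yield offsets, mkey, path


def make_path(time: str, variable: str) -> str:
    # Hypothetical storage layout, one file per day and variable.
    return f"gs://hypothetical-bucket/{variable}/{time}.nc"


chunks = list(
    iter_file_chunks(
        make_path,
        concat_dim="time",
        concat_keys=["2020-01-01", "2020-01-02"],
        merge_dim="variable",
        merge_keys=["temperature", "humidity"],
        items_per_file=24,  # e.g. hourly data in daily files
    )
)

for offsets, variable, path in chunks:
    print(offsets, variable, path)
```

In a real pipeline, each yielded path would be opened as a dataset and emitted as a chunk keyed by its offsets. The sketch covers only the mapping from pattern to offsets.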

To make use of this transform, the user will need to install pangeo-forge-recipes separately. This dependency is included in the test dependencies.

As of now, this transform is not exposed to the user (i.e., not included in the primary __init__.py). I plan to do this (and update the docs) once the module is tested and feature complete (#29).