pangeo-forge / pangeo-forge-recipes

Python library for building Pangeo Forge recipes.
https://pangeo-forge.readthedocs.io/
Apache License 2.0

Using `pangeo-forge-recipes` with two concat dims #348

Closed: leifdenby closed this issue 2 years ago.

leifdenby commented 2 years ago

I work with a large-eddy simulation (LES) model that decomposes the 3D simulation domain into a horizontal 2D grid of columns, one for each CPU to handle during the simulation. The output netCDF files are stored the same way, so I have one file per CPU used during execution. I was hoping to use `pangeo-forge-recipes` to produce a single Zarr datastore for my simulation output (instead of the individual netCDF files). Unfortunately, `XarrayZarrRecipe` raises an exception saying that it doesn't currently support multiple concat dims.
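To make the layout concrete: with equal-sized tiles, the file for CPU column (i, j) maps to a fixed rectangular region of the global horizontal grid, and combining along two concat dims amounts to placing every tile in its region. The plain-Python sketch below assumes that every tile has the same shape, that `i` counts along x and `j` along y, and that both start at 1 as in the filenames further down. `tile_region` and `stitch` are hypothetical helpers for illustration, not part of `pangeo-forge-recipes`.

```python
from itertools import product


def tile_region(i, j, nx_tile, ny_tile, first_index=1):
    """Return (x_slice, y_slice) occupied by tile (i, j) in the global grid."""
    x0 = (i - first_index) * nx_tile
    y0 = (j - first_index) * ny_tile
    return slice(x0, x0 + nx_tile), slice(y0, y0 + ny_tile)


def stitch(tiles, n_i, n_j, nx_tile, ny_tile, first_index=1):
    """Place each tile (a list of nx_tile rows, each ny_tile long) into one 2D grid."""
    grid = [[None] * (n_j * ny_tile) for _ in range(n_i * nx_tile)]
    for (i, j), tile in tiles.items():
        xs, ys = tile_region(i, j, nx_tile, ny_tile, first_index)
        for dx, x in enumerate(range(xs.start, xs.stop)):
            # Slice assignment copies this tile row into the global row x
            grid[x][ys] = tile[dx]
    return grid


# Illustrative 3 x 3 decomposition with 2 x 2 tiles; each cell stores its tile index
tiles = {
    (i, j): [[(i, j)] * 2 for _ in range(2)]
    for i, j in product(range(1, 4), range(1, 4))
}
grid = stitch(tiles, n_i=3, n_j=3, nx_tile=2, ny_tile=2)
```

A recipe with two concat dims would do the same thing with real arrays and coordinates, writing each file's data into its region of the target Zarr store.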

Is this the wrong kind of use case for this package?

Below is what I've done so far:

```python
from pathlib import Path

from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

# One netCDF file per CPU column, e.g. rico_gcss.00010002.nc for i=1, j=2
SOURCE_BLOCK_FILENAME_FORMAT_3D = "{file_prefix}.{i:04d}{j:04d}.nc"


def make_full_path(i, j):
    """Return the path of the file written by the CPU at grid position (i, j)."""
    data_root = Path(
        "/nfs/see-fs-02_users/earlcd/datastore/a289/LES_analysis_output/uclales/rico_gcss/raw_data"
    )
    return data_root / SOURCE_BLOCK_FILENAME_FORMAT_3D.format(
        i=i, j=j, file_prefix="rico_gcss"
    )


# Two concat dims: the CPU grid index along x (i) and along y (j)
col_dim_x = ConcatDim("i", list(range(1, 4)))
col_dim_y = ConcatDim("j", list(range(1, 4)))

pattern = FilePattern(make_full_path, col_dim_x, col_dim_y)
# Fails: XarrayZarrRecipe rejects patterns with more than one concat dim
recipe = XarrayZarrRecipe(pattern, inputs_per_chunk=10)
```
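A `FilePattern` with two concat dims describes the outer product of their keys, so the pattern above should enumerate 3 x 3 = 9 files. The plain-Python sketch below mimics that enumeration (it is not the library's implementation) and is a quick way to check that the filename template matches the files on disk before building a recipe. `expand_pattern` is a hypothetical helper, and `data_root` is a placeholder rather than the real path.

```python
from itertools import product
from pathlib import Path

SOURCE_BLOCK_FILENAME_FORMAT_3D = "{file_prefix}.{i:04d}{j:04d}.nc"


def make_full_path(i, j, data_root=Path("raw_data")):
    # data_root is a placeholder, not the directory used in the issue
    return data_root / SOURCE_BLOCK_FILENAME_FORMAT_3D.format(
        i=i, j=j, file_prefix="rico_gcss"
    )


def expand_pattern(format_function, **dims):
    """Yield (keys, path) for every combination of keys across the given dims."""
    names = list(dims)
    for keys in product(*(dims[name] for name in names)):
        yield keys, format_function(**dict(zip(names, keys)))


paths = dict(expand_pattern(make_full_path, i=range(1, 4), j=range(1, 4)))
for keys, path in paths.items():
    print(keys, path.name)
```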
rabernat commented 2 years ago

Thanks for reporting Leif!

This is a duplicate of #140. It is definitely high on our list of priorities for development! We hope to support it within the next few months.

@cisaacstern - would you mind making a user story for this?

martindurant commented 2 years ago

Note that kerchunk's MultiZarrToZarr does support multiple dimensions, so it may be possible to plumb it into pangeo-forge without too much of a rewrite. Of course, it's still on us to get that done :)
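Reference-based combining can handle several dimensions because it never copies data: it only rewrites which byte range backs each chunk key of the combined array. The sketch below illustrates that idea for the simplest case, assuming each tile file holds exactly one chunk per 2D variable, keyed Zarr-v2 style as `"var/0.0"`, and that tiles are indexed (i, j) from 1. It is not kerchunk's implementation or API: `combine_tile_refs` is hypothetical, the byte offsets in the example are made up, and a real combiner must also rewrite array metadata (shape, chunks) and coordinates.

```python
def combine_tile_refs(tile_refs, first_index=1):
    """Merge per-tile chunk references into one mapping for a 2D-tiled array.

    tile_refs maps (i, j) -> {"var/0.0": [url, offset, length], ...}. Only
    chunk keys are expected (no .zarray/.zattrs metadata). The tile position
    (i, j) becomes the chunk index of that tile in the combined array.
    """
    combined = {}
    for (i, j), refs in tile_refs.items():
        for key, ref in refs.items():
            var, chunk = key.rsplit("/", 1)
            if chunk != "0.0":
                raise ValueError(f"expected one chunk per tile, got {key!r}")
            combined[f"{var}/{i - first_index}.{j - first_index}"] = ref
    return combined


# Illustrative references: the offsets and lengths are invented
tile_refs = {
    (i, j): {"theta/0.0": [f"rico_gcss.{i:04d}{j:04d}.nc", 1024, 4096]}
    for i in range(1, 4)
    for j in range(1, 4)
}
combined = combine_tile_refs(tile_refs)
```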