pangeo-forge / staged-recipes

A place to submit pangeo-forge recipes before they become fully fledged pangeo-forge feedstocks
https://pangeo-forge.readthedocs.io/en/latest/
Apache License 2.0

Proposed Recipes for PyQG Subgrid Forcing (Ross et al. 2022) #231

Open cmdupuis3 opened 1 year ago

cmdupuis3 commented 1 year ago

Dataset Name

PyQG Subgrid Forcing

Dataset URL

The base URL is https://g-402b74.00888.8540.data.globus.org/, but see below for examples.

Description

For full details, see the official publication here: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2022MS003258

The research behind this dataset uses a quasi-geostrophic model (PyQG) to develop and test ML-based parameterizations. The goal was to filter a high-resolution (expensive) simulation to estimate subgrid-scale effects, which could then be incorporated into a low-resolution (cheap) simulation. Augmenting a low-resolution QG model with an accurate subgrid-scale parameterization matters because the resulting model runs much faster than a full high-resolution simulation, which is cost-prohibitive in many contexts.

Here, I am interested in creating a Pangeo Forge recipe for the PyQG data that was generated.

The scientific reasoning behind why these files differ is as follows:

License

Unknown

Data Format

Zarr

Data Format (other)

No response

Access protocol

Globus

Source File Organization

The files are arranged in the following hierarchical structure:

eddy/
    low_res.zarr
    high_res.zarr
    forcing1.zarr
    forcing2.zarr
    forcing3.zarr
jet/
    low_res.zarr
    high_res.zarr
    forcing1.zarr
    forcing2.zarr
    forcing3.zarr

Example URLs

Each of these files can be obtained by appending these paths to the Globus-based root path, which is https://g-402b74.00888.8540.data.globus.org/.

As an example:

https://g-402b74.00888.8540.data.globus.org/eddy/low_res.zarr
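For reference, the full set of URLs can be enumerated from the root path and the directory listing above. This is only a minimal sketch: the names `BASE_URL`, `CONFIGS`, `STORES`, and `store_url` are illustrative and not part of any existing recipe.

```python
# Root path and store names are taken from the listing in this issue.
BASE_URL = "https://g-402b74.00888.8540.data.globus.org/"
CONFIGS = ["eddy", "jet"]
STORES = ["low_res", "high_res", "forcing1", "forcing2", "forcing3"]


def store_url(config: str, store: str) -> str:
    """Append '<config>/<store>.zarr' to the Globus root path."""
    return f"{BASE_URL.rstrip('/')}/{config}/{store}.zarr"


# One URL per Zarr store, grouped by flow configuration.
urls = [store_url(config, store) for config in CONFIGS for store in STORES]

for url in urls:
    print(url)
```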

Authorization

None

Transformation / Processing

We may need a structure like DataTree to handle the hierarchical file structure if we want a single, unified dataset. Since the top level is relatively simple, though, it could also make sense to break it into two separate datasets with a simpler structure, obviating the need for a tree-like structure.
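To make the second option concrete, here is a rough sketch of splitting the stores by their top-level directory into two flat groups, with no tree structure involved. The helper names (`split_by_top_level`, `open_group`) are hypothetical. `open_group` assumes xarray, zarr, fsspec, and aiohttp are installed and that the Globus endpoint allows plain HTTPS reads of the Zarr metadata, which I have not verified.

```python
import posixpath
from collections import defaultdict

ROOT = "https://g-402b74.00888.8540.data.globus.org/"
NAMES = ("low_res", "high_res", "forcing1", "forcing2", "forcing3")
PATHS = [f"{config}/{name}.zarr" for config in ("eddy", "jet") for name in NAMES]


def split_by_top_level(paths):
    """Group store URLs by top-level directory, keyed by store name."""
    groups = defaultdict(dict)
    for path in paths:
        top, store = path.split("/", 1)
        name, _ext = posixpath.splitext(store)  # "low_res.zarr" -> "low_res"
        groups[top][name] = ROOT + path
    return dict(groups)


def open_group(stores):
    """Lazily open every store in one group (assumption: HTTPS access works)."""
    import xarray as xr  # imported here so the grouping logic has no dependencies

    return {name: xr.open_zarr(url) for name, url in stores.items()}


# Two flat groups, one per flow configuration.
groups = split_by_top_level(PATHS)
print(sorted(groups))
```

Calling `open_group(groups["eddy"])` would then return the five eddy datasets keyed by name. Whether those five can be merged into a single dataset depends on their coordinates, which this sketch does not assume.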

Target Format

Zarr

Comments

No response

cisaacstern commented 1 year ago

Thanks for opening this, @cmdupuis3!

it could also make sense to break it into two separate datasets with a simpler structure, obviating the need for a tree-like structure

This is the easiest way to pursue this using the latest pangeo-forge-recipes release (which does not yet support DataTree). Multiple recipes can be specified in a single recipe.py module. Please let me know how I can help!