Open cmdupuis3 opened 1 year ago
Thanks for opening this, @cmdupuis3!
it could also make sense to break it into two separate datasets with a simpler structure, obviating the need for a tree-like structure
This is the easiest way to pursue this using the latest pangeo-forge-recipes release (which does not yet support DataTree). Multiple recipes can be specified in a single recipe.py module. Please let me know how I can help!
Dataset Name
PyQG Subgrid Forcing
Dataset URL
The base URL is https://g-402b74.00888.8540.data.globus.org/, but see below for examples.
Description
For full details, see the official publication here: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2022MS003258
The basic premise of the research behind this dataset is to use a quasi-geostrophic model (PyQG) for developing and testing ML-based parameterizations. The goal was to filter a high-resolution (expensive) simulation to estimate subgrid-scale effects, which could then be incorporated into a low-resolution (cheap) simulation. Augmenting a low-resolution QG model with an accurate subgrid-scale parameterization matters because the resulting model runs much faster than full high-resolution simulations, which are cost-prohibitive in a number of contexts.
Here, I am interested in creating a Pangeo Forge recipe for the PyQG data that was generated.
The scientific reasoning behind why these files differ is as follows:
License
Unknown
Data Format
Zarr
Data Format (other)
No response
Access protocol
Globus
Source File Organization
The files are arranged in the following hierarchical structure:
Example URLs
Each of these files can be obtained by appending these paths to the Globus-based root path, which is https://g-402b74.00888.8540.data.globus.org/. As an example:
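A minimal Python sketch of the path-joining step described above may help anyone scripting access. Only the root URL comes from this issue; the relative path `path/to/store.zarr` and the helper name `full_url` are hypothetical placeholders, not actual paths or names from the dataset.

```python
from urllib.parse import urljoin

# Root URL as given in this issue.
ROOT_URL = "https://g-402b74.00888.8540.data.globus.org/"


def full_url(relative_path: str) -> str:
    """Append a relative path to the Globus root URL."""
    # Strip any leading slash so the path is always treated as
    # relative to ROOT_URL.
    return urljoin(ROOT_URL, relative_path.lstrip("/"))


# Hypothetical placeholder path, for illustration only.
example = full_url("path/to/store.zarr")
print(example)
```

Substituting a real relative path from the file listing for the placeholder should yield a URL that can be passed to whatever Zarr reader is in use.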
Authorization
None
Transformation / Processing
We may need a structure like that of DataTree to deal with the hierarchical file structure if we want a single, unified dataset. Since the top level is relatively simple, though, it could also make sense to break it into two separate datasets with a simpler structure, obviating the need for a tree-like structure.
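One way to picture the two-dataset option is to group the relative paths by their top-level directory, so that each group could feed its own recipe. The sketch below is only illustrative: the directory names `group_a` and `group_b`, the file names, and the helper `group_by_top_level` are all hypothetical and do not reflect the actual layout.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_top_level(relative_paths):
    """Group relative paths by their first path component."""
    groups = defaultdict(list)
    for path in relative_paths:
        # parts[0] is the top-level directory of a relative POSIX path.
        top = PurePosixPath(path).parts[0]
        groups[top].append(path)
    return dict(groups)


# Hypothetical paths, for illustration only.
paths = [
    "group_a/run_1.zarr",
    "group_a/run_2.zarr",
    "group_b/run_1.zarr",
]
datasets = group_by_top_level(paths)
for name, members in datasets.items():
    print(name, members)
```

Each resulting group is flat, so it would not need a tree-like container; whether that split matches the scientific distinction between the files is something the recipe author would have to confirm.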
Target Format
Zarr
Comments
No response