pangeo-forge / staged-recipes

A place to submit pangeo-forge recipes before they become fully fledged pangeo-forge feedstocks
Apache License 2.0

Amazon Sustainability Data Initiative ARCO Project #208

Open sharkinsspatial opened 1 year ago

sharkinsspatial commented 1 year ago

The Amazon Sustainability Data Initiative (ASDI) is funding work to expand the usability of datasets in the ASDI catalog. This work will involve several phases, one of which includes generating Analysis Ready Cloud Optimized (ARCO) formats of datasets currently available in archival formats.

To provide the best experience for end users of these ARCO formats, we hope to leverage the domain knowledge of researchers and engineers through open communication on staged-recipes around dataset-specific considerations and format structure. Many of the datasets available through the ASDI are regularly gridded and distributed in archival formats compatible with existing recipe classes or classes that are under development. For these datasets we plan to:

  1. At a minimum, generate kerchunk reference indices as the canonical entrypoint for dataset usage.
  2. Generate a new Zarr archive if sufficient community need exists for a different chunking strategy, optimized for specific analysis tasks.
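
For readers unfamiliar with the first option: a kerchunk reference index is, roughly, a JSON document that maps Zarr chunk keys to byte ranges inside the original archival files, so the data can be read as Zarr without being rewritten. The sketch below builds such a structure by hand using only the standard library. It is an illustration of the idea, not kerchunk's own API: the helper `build_reference_index`, the bucket URL, the variable name `sst`, and all offsets are hypothetical, and in practice kerchunk derives the byte ranges by scanning the source files.

```python
import json


def build_reference_index(source_url, chunk_ranges, zarray_meta, var_name="sst"):
    """Build a kerchunk-style (version 1) reference dict for one variable.

    chunk_ranges maps a Zarr chunk key such as "0.0" to the (offset, length)
    of that chunk's bytes inside the archival file at source_url.
    """
    refs = {
        # Zarr metadata is stored inline as JSON strings.
        ".zgroup": json.dumps({"zarr_format": 2}),
        f"{var_name}/.zarray": json.dumps(zarray_meta),
    }
    for chunk_key, (offset, length) in chunk_ranges.items():
        # Each chunk is a pointer into the original file: [url, offset, length].
        refs[f"{var_name}/{chunk_key}"] = [source_url, offset, length]
    return {"version": 1, "refs": refs}


# Hypothetical uncompressed 200x200 float32 array stored as four 100x100
# chunks (100 * 100 * 4 bytes = 40000 bytes each) in a single file.
zarray_meta = {
    "shape": [200, 200],
    "chunks": [100, 100],
    "dtype": "<f4",
    "compressor": None,
    "fill_value": None,
    "filters": None,
    "order": "C",
    "zarr_format": 2,
}
chunk_ranges = {
    "0.0": (4096, 40000),
    "0.1": (44096, 40000),
    "1.0": (84096, 40000),
    "1.1": (124096, 40000),
}

index = build_reference_index(
    "s3://example-bucket/example.nc", chunk_ranges, zarray_meta
)
print(json.dumps(index["refs"]["sst/0.1"]))
```

Because the index only points at existing bytes, it inherits the chunking of the archival files. That is the reason the second option, writing a new Zarr archive, comes into play when an analysis needs a different chunk layout.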

Below is an initial listing of datasets in the ASDI program that are under consideration for processing in pangeo-forge. We are soliciting community feedback on the prioritization of these datasets and recommendations on format structure. If an ARCO format for one of these datasets would be valuable in your work or you have previous experience with a dataset, please open a new proposed recipe issue referencing this issue in staged-recipes (if one does not already exist).

| Dataset | Manager | Issue/Feedstock |
| --- | --- | --- |
| CAFE60 reanalysis | CSIRO | |
| Coupled Model Intercomparison Project 6 | ESGF and Pangeo | feedstock |
| Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) | Farallon Institute | staged-recipes |
| HIRLAM Weather Model | Finnish Meteorological Institute | |
| SILAM Air Quality | Finnish Meteorological Institute | |
| ECMWF ERA5 Reanalysis | Intertrust | staged-recipes |
| CAM6 Data Assimilation Research Testbed (DART) Reanalysis: Cloud-Optimized Dataset | NCAR | |
| Community Earth System Model Large Ensemble (CESM LENS) | NCAR | |
| Community Earth System Model v2 Large Ensemble (CESM2 LENS) | NCAR | staged-recipes |
| NA-CORDEX - North American component of the Coordinated Regional Downscaling Experiment | NCAR | |
| Coupled Model Intercomparison Project Phase 5 (CMIP5) University of Wisconsin-Madison Probabilistic Downscaling Dataset | NOAA | |
| JMA Himawari-8 | NOAA | staged-recipes |
| NOAA Atmospheric Climate Data Records | NOAA | |
| NOAA Climate Forecast System (CFS) | NOAA | |
| NOAA Fundamental Climate Data Records (FCDR) | NOAA | |
| NOAA Geostationary Operational Environmental Satellites (GOES) 16 & 17 | NOAA | staged-recipes |
| NOAA Global Ensemble Forecast System (GEFS) | NOAA | |
| NOAA Global Ensemble Forecast System (GEFS) Re-forecast | NOAA | staged-recipes |
| NOAA Global Extratropical Surge and Tide Operational Forecast System (Global ESTOFS) | NOAA | |
| NOAA Global Forecast System (GFS) | NOAA | staged-recipes |
| NOAA Global Hydro Estimator (GHE) | NOAA | |
| NOAA Global Mosaic of Geostationary Satellite Imagery (GMGSI) | NOAA | |
| NOAA High-Resolution Rapid Refresh (HRRR) Model | NOAA | staged-recipes |
| NOAA National Digital Forecast Database (NDFD) | NOAA | |
| NOAA National Water Model Short-Range Forecast | NOAA | |
| NOAA North American Mesoscale Forecast System (NAM) | NOAA | |
| NOAA Oceanic Climate Data Records | NOAA | staged-recipes |
| NOAA Rapid Refresh (RAP) | NOAA | |
| NOAA Rapid Refresh Forecast System (RRFS) Ensemble [Prototype] | NOAA | |
| NOAA Terrestrial Climate Data Records | NOAA | |
| NOAA U.S. Climate Gridded Dataset (NClimGrid) | NOAA | |
| NOAA Unified Forecast System Subseasonal to Seasonal Prototypes | NOAA | |
| NREL National Solar Radiation Database | NREL | |
| NREL Wind Integration National Dataset | NREL | |
| Atmospheric Models from Météo-France | OpenMeteoData | |
| SILO climate data on AWS | Queensland Government | |
| CMIP6 GCMs downscaled using WRF | UCLA Center for Climate Science | |
| UK Met Office Atmospheric Deterministic and Probabilistic Forecasts | UK Met Office | |
| Downscaled Climate Data for Alaska | University of Alaska | |
| High Resolution Downscaled Climate Data for Southeast Alaska | University of Alaska | |
| Sea Surface Temperature Daily Analysis: European Space Agency Climate Change Initiative product version 2.1 | University of Reading | |

rsignell-usgs commented 1 year ago

@sharkinsspatial I do not see the National Water Model retrospective 1km gridded data in the list. I linked the notebooks I used to process this here:

glizee-tech commented 1 year ago

Hello, I am looking for ECMWF ERA5 Reanalysis on a cloud solution for academic work. I need some specific variables that are not available in the AWS S3 or GCP copies. I see that you are considering processing it fully in pangeo-forge. Are you able to give a deadline for when it will be available? Or should I create my own cloud solution, which will be redundant with your future solution and totally contrary to what this project is all about, but useful in the short term? I hope I'm writing in the right place.

rabernat commented 1 year ago

> Are you able to give a deadline for when it will be available?

No. ERA5 is extremely large and complex. Given the limited resources in this project, we cannot make any commitments to a timeline.

> Should I create my own cloud solution, which will be redundant with your future solution and totally contrary to what this project is all about, but useful in the short term?

Yes, this is what we would recommend.

glizee-tech commented 1 year ago

@rabernat Thank you for your quick reply. Ok so I will create my own cloud solution but I will follow the progress of this amazing project.

rsignell-usgs commented 1 year ago

@glizee-tech I also needed to access some ERA5 data that was not on cloud, and gave a Pangeo Showcase talk on my approach last fall. Just in case it's useful!

glizee-tech commented 1 year ago

@rsignell-usgs Yes indeed it's going to be very useful! Thank you very much