pangeo-forge / staged-recipes

A place to submit pangeo-forge recipes before they become fully fledged pangeo-forge feedstocks
https://pangeo-forge.readthedocs.io/en/latest/
Apache License 2.0

Proposed Recipes for NOAA Atmospheric Climate Data Records #223

Open rbavery opened 2 years ago

rbavery commented 2 years ago

Dataset Name

NOAA Atmospheric Climate Data Records

Dataset URL

https://registry.opendata.aws/noaa-cdr-atmospheric/

Description

"NOAA's Climate Data Records (CDRs) are robust, sustainable, and scientifically sound climate records that provide trustworthy information on how, where, and to what extent the land, oceans, atmosphere and ice sheets are changing. These datasets are thoroughly vetted time series measurements with the longevity, consistency, and continuity to assess and measure climate variability and change. NOAA CDRs are vetted using standards established by the National Research Council (NRC)."

License

"Open Data. There are no restrictions on the use of this data."

Data Format

NetCDF

Data Format (other)

No response

Access protocol

S3

Source File Organization

For each variable there is one file per day, and each file holds a single time step. The folder hierarchy is as follows:

    data/
        monthly/
        daily/
            1982/
            1983/
            ....
    documentation/

We propose handling only the daily products.

Example URLs

→ aws s3 ls --no-sign-request s3://noaa-cdr-aerosol-optical-thickness-pds/data/daily/1982/
2022-06-24 14:15:19   26013129 AOT_AVHRR_v04r00_daily-avg_19820101_c20220505.nc
...

Example file URLs:

s3://noaa-cdr-aerosol-optical-thickness-pds/data/daily/1982/AOT_AVHRR_v04r00_daily-avg_19820101_c20220505.nc

s3://noaa-cdr-aerosol-optical-thickness-pds/data/daily/1982/AOT_AVHRR_v04r00_daily-avg_19820102_c20220505.nc
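To connect the listing output to the full URLs above, here is a minimal sketch that parses one line of `aws s3 ls` output and rebuilds the object URL. The names `ListingEntry`, `parse_listing_line`, and `daily_url` are hypothetical helpers, not part of any library, and the `data/daily/<year>/` layout is assumed to hold as described in this issue.

```python
from datetime import datetime
from typing import NamedTuple

# Bucket name taken from the listing in this issue.
BUCKET = "noaa-cdr-aerosol-optical-thickness-pds"


class ListingEntry(NamedTuple):
    """One row of `aws s3 ls` output (hypothetical helper type)."""
    modified: datetime
    size_bytes: int
    name: str


def parse_listing_line(line: str) -> ListingEntry:
    """Split a listing row into last-modified time, size in bytes, and file name."""
    date_str, time_str, size_str, name = line.split(maxsplit=3)
    modified = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return ListingEntry(modified, int(size_str), name)


def daily_url(bucket: str, year: int, name: str) -> str:
    """Build the S3 URL of a daily file, assuming the data/daily/<year>/ layout."""
    return f"s3://{bucket}/data/daily/{year}/{name}"


entry = parse_listing_line(
    "2022-06-24 14:15:19   26013129 AOT_AVHRR_v04r00_daily-avg_19820101_c20220505.nc"
)
print(daily_url(BUCKET, 1982, entry.name))
```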

Authorization

No response

Transformation / Processing

No transformations are needed. A simple wildcard will be used to extract the datetime from each file name for the FilePattern.
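As a rough illustration of the wildcard idea, the sketch below pulls the observation date out of a file name and builds a glob that ignores the trailing creation stamp (`c20220505` in the examples). It uses only the standard library. `extract_date` and `day_glob` are hypothetical helpers, and the `_<YYYYMMDD>_c<YYYYMMDD>.nc` naming is assumed from the aerosol optical thickness examples; other buckets may name files differently.

```python
import re
from datetime import date, datetime
from fnmatch import fnmatch

# Assumed naming: ..._<observation date>_c<creation date>.nc
_DATE_RE = re.compile(r"_(\d{8})_c\d{8}\.nc$")


def extract_date(url: str) -> date:
    """Return the observation date encoded in a daily file name."""
    match = _DATE_RE.search(url)
    if match is None:
        raise ValueError(f"no date found in {url!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def day_glob(bucket: str, day: date) -> str:
    """Glob for one day's file; wildcards cover the product prefix and creation stamp."""
    return f"s3://{bucket}/data/daily/{day:%Y}/*_{day:%Y%m%d}_c*.nc"


url = (
    "s3://noaa-cdr-aerosol-optical-thickness-pds/data/daily/1982/"
    "AOT_AVHRR_v04r00_daily-avg_19820102_c20220505.nc"
)
day = extract_date(url)
pattern = day_glob("noaa-cdr-aerosol-optical-thickness-pds", day)
print(day, pattern, fnmatch(url, pattern))
```

The wildcard on the creation stamp matters because that part of the name presumably changes when files are reprocessed, so it cannot be hard-coded per date.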

Target Format

Reference Filesystem (Kerchunk)

Comments

This is a collection of datasets with many different variables. Variables are separated into their own directories. I'll check these off as I create recipes for them and submit a PR.

arn:aws:s3:::noaa-cdr-aerosol-optical-thickness-pds aws s3 ls --no-sign-request s3://noaa-cdr-aerosol-optical-thickness-pds/

arn:aws:s3:::noaa-cdr-cloud-properties-isccp-pds aws s3 ls --no-sign-request s3://noaa-cdr-cloud-properties-isccp-pds/

This dataset has 4 distinct products in "isccp" and 3 in "isccp-basic"; "HXG" is absent from the basic subdirectory.

From the algorithm document: "The HGH Product provides the monthly average of the HGG Product at each of eight times-of-day UTC. The HGM Product is the average of the eight HGH Products for each month." So HGM, HGH, and HGG seem important, and they are present in both isccp folders. The HXG product seems to be rawer data, so it might be a lower priority to include.

arn:aws:s3:::noaa-cdr-cloud-properties-polar-orbiter-nasa-pds aws s3 ls --no-sign-request s3://noaa-cdr-cloud-properties-polar-orbiter-nasa-pds/

arn:aws:s3:::noaa-cdr-hydrological-properties-pds aws s3 ls --no-sign-request s3://noaa-cdr-hydrological-properties-pds/

arn:aws:s3:::noaa-cdr-ocean-heat-content-pds aws s3 ls --no-sign-request s3://noaa-cdr-ocean-heat-content-pds/

arn:aws:s3:::noaa-cdr-ocean-heatflux-pds aws s3 ls --no-sign-request s3://noaa-cdr-ocean-heatflux-pds/

arn:aws:s3:::noaa-cdr-ocean-nearsurface-atmos-profiles-pds aws s3 ls --no-sign-request s3://noaa-cdr-ocean-nearsurface-atmos-profiles-pds/

arn:aws:s3:::noaa-cdr-outgoing-longwave-radiation-daily-pds aws s3 ls --no-sign-request s3://noaa-cdr-outgoing-longwave-radiation-daily-pds/

arn:aws:s3:::noaa-cdr-outgoing-longwave-radiation-monthly-pds aws s3 ls --no-sign-request s3://noaa-cdr-outgoing-longwave-radiation-monthly-pds/

arn:aws:s3:::noaa-cdr-ozone-esrl-pds aws s3 ls --no-sign-request s3://noaa-cdr-ozone-esrl-pds/

arn:aws:s3:::noaa-cdr-precip-cmorph-pds aws s3 ls --no-sign-request s3://noaa-cdr-precip-cmorph-pds/

arn:aws:s3:::noaa-cdr-precip-gpcp-daily-pds aws s3 ls --no-sign-request s3://noaa-cdr-precip-gpcp-daily-pds/

arn:aws:s3:::noaa-cdr-precip-gpcp-monthly-pds aws s3 ls --no-sign-request s3://noaa-cdr-precip-gpcp-monthly-pds/

arn:aws:s3:::noaa-cdr-precip-nexrad-qpe-pds aws s3 ls --no-sign-request s3://noaa-cdr-precip-nexrad-qpe-pds/

arn:aws:s3:::noaa-cdr-precip-persiann-pds aws s3 ls --no-sign-request s3://noaa-cdr-precip-persiann-pds/

arn:aws:s3:::noaa-cdr-solar-spectral-irradiance-pds aws s3 ls --no-sign-request s3://noaa-cdr-solar-spectral-irradiance-pds/

arn:aws:s3:::noaa-cdr-total-solar-irradiance-pds aws s3 ls --no-sign-request s3://noaa-cdr-total-solar-irradiance-pds/
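Each entry above pairs a bucket ARN with the matching anonymous listing command. A small sketch of that mapping follows, in case it helps when scripting over the list. `bucket_from_arn` and `listing_command` are hypothetical helper names, and the code only formats strings; it does not call AWS.

```python
S3_ARN_PREFIX = "arn:aws:s3:::"


def bucket_from_arn(arn: str) -> str:
    """Strip the S3 ARN prefix to get the bare bucket name."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 bucket ARN: {arn!r}")
    return arn[len(S3_ARN_PREFIX):]


def listing_command(arn: str) -> str:
    """Format the unsigned `aws s3 ls` command for a bucket ARN."""
    return f"aws s3 ls --no-sign-request s3://{bucket_from_arn(arn)}/"


# Two of the ARNs listed above.
arns = [
    "arn:aws:s3:::noaa-cdr-precip-gpcp-daily-pds",
    "arn:aws:s3:::noaa-cdr-total-solar-irradiance-pds",
]
for arn in arns:
    print(listing_command(arn))
```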