leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

New Dataset [MODIS cloud data for climate model proxies] #51

Status: Open. RobertPincus opened this issue 1 year ago.

RobertPincus commented 1 year ago

Dataset Name

MCD06COSP_M3_MODIS - MODIS (Aqua/Terra) Cloud Properties Level 3 monthly, 1x1 degree grid

Dataset URL

https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/MCD06COSP_M3_MODIS

Description

This dataset contains monthly-mean cloud properties as observed by the two MODIS instruments on the Terra (morning orbit) and Aqua (afternoon orbit) satellites.

Size

Roughly 23 years of monthly files (about 276 files) at ~65 MB each, or ~18 GB in total
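The size estimate is simple arithmetic. A minimal sketch, assuming exactly 12 files per year and decimal units (1 GB = 1000 MB), using the approximate figures quoted above:

```python
def estimate_total_size(years: float, mb_per_file: float, files_per_year: int = 12):
    """Return (number of files, total size in decimal GB) for a monthly record."""
    n_files = round(years * files_per_year)
    total_gb = n_files * mb_per_file / 1000  # 1 GB = 1000 MB (decimal units)
    return n_files, total_gb


n_files, total_gb = estimate_total_size(23, 65)
print(f"{n_files} files, ~{total_gb:.1f} GB")  # prints "276 files, ~17.9 GB"
```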

License

Unspecified, as far as I can tell; no restrictions

Data Format

NetCDF

Access protocol

HTTP(S)

Source File Organization

Each directory contains a single file, laid out as $ROOT/Month-start-julian-day/$filename; the filename itself is not predictable.
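Because the directories are keyed by the Julian day (day of year) on which each month starts, the directory names can be computed even though the filenames cannot. A minimal sketch, assuming `$ROOT` already points at a single year's directory and that the day numbers are zero-padded to three digits (both assumptions; `month_start_doys` and `month_dirs` are hypothetical helper names):

```python
import datetime


def month_start_doys(year: int) -> list[str]:
    """Zero-padded day of year (001-366) of the first day of each month."""
    return [
        f"{datetime.date(year, month, 1).timetuple().tm_yday:03d}"
        for month in range(1, 13)
    ]


def month_dirs(root: str, year: int) -> list[str]:
    """Hypothetical per-month directory URLs under $ROOT for one year."""
    return [f"{root.rstrip('/')}/{doy}/" for doy in month_start_doys(year)]
```

Leap years shift every month from March onward by one day: March starts on day 060 in 2003 but on day 061 in 2004.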

Example URLs

https://github.com/RobertPincus/MODIS-COSP-data

This repository contains a shell script to download all the files and a Python script to extract, rename, and reorganize the variables people want to use.
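Since the filename in each directory is not predictable, a downloader has to discover it. The following is not the repository's script; it is a minimal sketch that assumes each directory URL returns an HTML index page whose `<a href>` links include the NetCDF file, and that those files end in `.nc` (both assumptions). `netcdf_links` and `download` are hypothetical names.

```python
import shutil
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Collects every href attribute found on <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def netcdf_links(index_html: str, base_url: str, suffix: str = ".nc") -> list[str]:
    """Absolute URLs of links on a directory index page that end in `suffix`."""
    parser = _HrefCollector()
    parser.feed(index_html)
    return [urljoin(base_url, h) for h in parser.hrefs if h.endswith(suffix)]


def download(url: str, dest: str) -> None:
    """Plain HTTPS GET streamed to disk; the issue lists no authorization."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

In use, one would fetch a month's directory page with `urllib.request.urlopen(dir_url).read().decode()`, pass it to `netcdf_links` together with `dir_url`, and call `download` on each result.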

Authorization

None

Transformation / Processing

Each monthly file contains netCDF4 groups, one per output variable. For each variable we want to extract only the "Mean" value, rename it to the group name, and concatenate the data along the time dimension. A Zarr store can hold
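The per-file step can be sketched with xarray. This is not the repository's script; it assumes every group exposes its monthly mean as a variable literally named `Mean`, skips groups without one, and expects the caller to supply the month's timestamp (for example, parsed from the directory name). `extract_means` and `open_monthly_file` are hypothetical names; `netCDF4` is used only to list the group names.

```python
import numpy as np
import xarray as xr


def extract_means(groups: dict[str, xr.Dataset], time: np.datetime64) -> xr.Dataset:
    """One dataset whose variables are each group's "Mean", named after the group."""
    ds = xr.Dataset({name: g["Mean"] for name, g in groups.items() if "Mean" in g})
    # Add a length-1 time dimension so monthly files can later be concatenated.
    return ds.expand_dims(time=[time])


def open_monthly_file(path: str, time: np.datetime64) -> xr.Dataset:
    """Read every netCDF4 group in one monthly file and keep only the "Mean" fields."""
    import netCDF4  # only needed to discover the group names

    with netCDF4.Dataset(path) as nc:
        names = list(nc.groups)
    groups = {name: xr.open_dataset(path, group=name) for name in names}
    return extract_means(groups, time)
```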

Processing is described at the end of section 2.3 in doi:10.5194/essd-15-2483-2023. The Python script in the GitHub repository linked above implements this extraction.

Target Format

Zarr
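Once each monthly file has been reduced to a dataset with a length-1 `time` dimension, the full record can be concatenated and written as a single Zarr store. A minimal sketch, assuming the `zarr` package is installed for the write step and leaving chunking to xarray's defaults (`combine_months` and `write_zarr` are hypothetical names):

```python
import xarray as xr


def combine_months(monthly: list[xr.Dataset]) -> xr.Dataset:
    """Concatenate per-month datasets along time, in chronological order."""
    return xr.concat(monthly, dim="time").sortby("time")


def write_zarr(ds: xr.Dataset, store: str) -> None:
    """Write the combined record to a Zarr store, overwriting any existing one."""
    ds.to_zarr(store, mode="w")
```

Sorting after concatenation means the monthly files can be processed in any order, for example in parallel.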

Comments

No response