stuckyb / gcdl


catalog describing file structure/dimensions #32

Closed: HeatherSavoy-USDA closed this issue 2 years ago

HeatherSavoy-USDA commented 2 years ago

Can we have something in the catalog to describe how each dataset stores its variables and temporal increments? For example, we currently expect PRISM to have one file per variable per year or per month. DaymetV4 also splits variables and years into different files, but stores monthly data as a 12-band file. Other datasets will store multiple variables in one file. I think having some indication of this in the catalog would be better than determining it in the subsetting/extracting functions, since the method for determining the structure might differ among the datasets. Then we can have different subset/extract approaches per general structure type as needed.
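A rough sketch of what such a catalog entry might look like, assuming a simple dict-based catalog; the `file_structure` key and its values are hypothetical, not the existing gcdl catalog schema:

```python
CATALOG = {
    'PRISM': {
        # One file per variable per year (annual) or per month (monthly).
        'file_structure': 'single_var_per_time_increment',
    },
    'DaymetV4': {
        # One file per variable per year; monthly data stored as 12 bands.
        'file_structure': 'single_var_per_year_banded',
    },
}

def subset_approach(dataset_name):
    """Return which subsetting strategy to use, read from the
    catalog rather than inferred by probing the files themselves."""
    return CATALOG[dataset_name]['file_structure']
```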

HeatherSavoy-USDA commented 2 years ago

I'm not sure if a system like this already exists, but we could do something like:

Uppercase letters: one time increment per file. Lowercase letters: the increment is stored in bands.

- Y - single variable and year per file (annual PRISM, Daymet)
- YM - single variable, year, and month per file (monthly PRISM)
- Ym - single variable and year, but months as bands, per file (monthly Daymet)
- Ymd - single variable and year, but months and days as bands, per file (daily Daymet)
- YMd - single variable, year, and month, but days as bands, per file (hypothetical, not familiar with any dataset like this)
- ymd - single variable, all daily increments in the same file
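A minimal sketch of how a code like this could be interpreted; the function name and return convention are assumptions, not part of any existing gcdl API:

```python
def parse_structure_code(code):
    """Split a structure code into per-file and in-band time
    increments.  Uppercase letters mean one increment per file;
    lowercase letters mean the increment is stored as bands."""
    per_file = [c for c in code if c.isupper()]
    in_bands = [c.upper() for c in code if c.islower()]
    return per_file, in_bands

# Examples, following the scheme above:
#   parse_structure_code('YM')  -> (['Y', 'M'], [])      monthly PRISM
#   parse_structure_code('Ym')  -> (['Y'], ['M'])        monthly Daymet
#   parse_structure_code('Ymd') -> (['Y'], ['M', 'D'])   daily Daymet
```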

If variables are combined into a single file, include a 'v'? Though I can't think of a dataset that combines variables and time dimensions that is not NetCDF, a format that would help us differentiate variables from the time dimension.

For output (#33), the user could specify a code like this? And default to vymd for point data (every variable and time increment in a single CSV), or match the structure of the first dataset (like CRS and resolution).
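A minimal sketch of that defaulting rule; the function and parameter names are hypothetical:

```python
def default_output_structure(is_point_request, requested_structures):
    """Pick a default output structure code (hypothetical helper).

    Point data defaults to 'vymd': every variable and time
    increment in a single CSV.  Otherwise, mirror the structure
    of the first requested dataset, as is done for the CRS and
    resolution defaults."""
    if is_point_request:
        return 'vymd'
    return requested_structures[0]

# e.g. default_output_structure(False, ['Ym', 'YM']) -> 'Ym'
```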

HeatherSavoy-USDA commented 2 years ago

I think this concern is alleviated by b8427b939509e322c3884dd3052658b231b3cf0d and #50.