oceansites / dmt

Activities of the OceanSITES Data Management Team
http://www.oceansites.org/data

long time series files metadata and file names #48

Open ngalbraith opened 5 years ago

ngalbraith commented 5 years ago

Suggest a method for including metadata about individual deployments by providing a sample CDL file. Here's a strawman (but please see the links to updated versions below):

OS_Stratus_201104-201804_D_FLTS-1D_longcdl.txt

ngalbraith commented 5 years ago

Discussion of file names for LTS data files

Matthias - file naming and organization of DATA_GRIDDED (PPTX): https://docs.google.com/presentation/d/1iYvZ14D9ABXpy5bMUv9egfMJ630t8uAuhbz7DJotxuE/edit?usp=sharing

How to indicate long time-series (level 3) and derived (level 4) products in their file names? Cf. present file names for deployment-by-deployment (level 2) files:

OS_MOVE1_12_D_MICROCAT.nc
OSKEO[LTS?]200406_D_TVMBP_32N145E_10m.nc

How about OS_KEO_LTS-2004-2006_D_TVMBP_32N145E_10m.nc

Nathan: Add “LTS” after OS.
NGalbraith: this could break existing (external and GDAC) software, which already knows how to parse all the way through the last field. Either it should be at the end (in the user-supplied field) or part of the deployment code.

Matthias: Change deployment code (“12” in example above) to something like “LTS” for level 3 and “PROD” for level 4 products. (+1 vote B.Greenwood)
N.Galbraith: we are (in some cases) not publishing the complete time series in a single file, so we need a way to indicate something like deployment 1 through 10 or 11 through 17. ‘LTS-1-10’ would be fine, but not just LTS, please.

Jeff: Add 'L3' in place of 'deployment' for long time-series/gridded files. Add 'L4' in place of 'deployment' for derived files.
NGalbraith: we have multiple derived files, e.g. containing surface fluxes in single deployment files, so overloading the deployment field for derived files will not work very well for us.
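The parsing concern above can be illustrated with a minimal sketch. It assumes the field order seen in the existing file names (OS, platform code, deployment code, data mode, then a free user-supplied part). `parse_os_name` is a hypothetical helper, not code from any GDAC tool, and the third file name is made up to show what happens if "LTS" is inserted right after "OS".

```python
def parse_os_name(filename):
    """Split an OceanSITES-style file name into its leading fields.

    Assumed layout: OS_<platform>_<deployment>_<mode>_<user part>.nc
    Everything after the fourth underscore is treated as the user part.
    """
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_", 4)
    if len(parts) < 4 or parts[0] != "OS":
        raise ValueError(f"not an OceanSITES-style name: {filename}")
    return {
        "platform": parts[1],
        "deployment": parts[2],
        "mode": parts[3],
        "user_part": parts[4] if len(parts) > 4 else "",
    }


# Existing deployment-by-deployment name: parses as expected.
existing = parse_os_name("OS_MOVE1_12_D_MICROCAT.nc")

# "LTS" carried inside the deployment field: the other fields stay in place.
lts_in_deployment = parse_os_name("OS_KEO_LTS-2004-2006_D_TVMBP_32N145E_10m.nc")

# Hypothetical name with "LTS" right after "OS": every later field shifts,
# so the platform is misread as "LTS".
lts_after_os = parse_os_name("OS_LTS_KEO_2004-2006_D_TVMBP_32N145E_10m.nc")
```

With a positional parser like this, a code such as 'LTS-1-10' in the deployment field leaves the platform and data mode where existing software expects them.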

ngalbraith commented 4 years ago

In reply to Jeff: "Add 'L3' in place of 'deployment' for long time-series/gridded files. Add 'L4' in place of 'deployment' for derived files."

Different groups may have different numeric codes for processing level; we settled on letters that have some intrinsic meaning because of our experience with QC codes, where the codes could so easily be misunderstood. Also, a close look at the levels in Matthias' presentation tells me that levels 1-2 do not really apply to our data. Processing satellite data is really quite different from processing in situ data.

ngalbraith commented 4 years ago

New example CDL files: met, flux.

Please comment, especially if there is anything unclear about how these are structured or what the fields represent.

petejan commented 4 years ago

nice example files,

Should the Deployment variable have dimensions of TIME so that you can find out which deployment the TIME sample came from?

I add a \n at the end of each history global attribute entry so the text is easier to divide up.
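A minimal sketch of that habit, using plain strings rather than a netCDF library. `append_history` is a hypothetical helper, and the two entries are made-up examples.

```python
def append_history(history, entry):
    """Append one entry to a history string, terminated by a single newline."""
    return history + entry.rstrip("\n") + "\n"


history = ""
history = append_history(history, "2019-01-15T12:00:00Z created from deployment files")
history = append_history(history, "2019-03-02T09:30:00Z metadata updated")

# Because every entry ends in "\n", a reader can recover the entries directly.
entries = history.splitlines()
```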

Also, I think the FillValue attribute should be _FillValue (the leading underscore is needed). This may be why it's not being applied, as in your email question. See the CF conventions.
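A simplified illustration of why the underscore matters: software that follows the conventions looks the attribute up by its exact name, `_FillValue`. The `mask_fill` function below is a hypothetical stand-in for that lookup, not the behaviour of any particular netCDF library, and the data values are made up.

```python
def mask_fill(values, attributes):
    """Replace fill values with None, consulting only the `_FillValue` attribute."""
    fill = attributes.get("_FillValue")
    if fill is None:
        # No recognised fill attribute: the data are returned untouched.
        return list(values)
    return [None if v == fill else v for v in values]


data = [12.5, -999.0, 13.1]

# Misspelled attribute name: the fill value is silently left in the data.
misspelled = mask_fill(data, {"FillValue": -999.0})

# Correct attribute name: the fill value is masked.
correct = mask_fill(data, {"_FillValue": -999.0})
```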

ngalbraith commented 4 years ago

Thanks - good catch on FillValue - not sure how that happens, I should probably check all my NetCDF files.

The Deployment variable is definitely not well described (it doesn't even have a long name!), so thank you for helping me see that.

It's a 'container variable', meant to hold the details about the deployments that went into the LTS file. It's also a dimension, which allows me to provide the right number of deployment-varying fields, like the sensor heights, anchor positions, water depth. It's also meant to contain start and end dates, deployment and recovery ships/cruises; those seem to have dropped out, but I'll get them back in.

I think you're suggesting that there should be a way to easily see which deployment applies to a specific point in time. My original plan was for the start/end dates to provide that info, but that isn't really a good way to handle it.

Maybe another variable with a TIME dimension (i.e.: deployment_number(TIME)) would be more helpful? Or, is there a standard way to do this?
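One way such a variable could be derived from the deployment start dates is sketched below. This is only an illustration under stated assumptions: times are plain numbers (for example days since a reference date), deployments are sorted and do not overlap, gaps between recovery and the next deployment are ignored, and `deployment_numbers` and all values are hypothetical.

```python
from bisect import bisect_right


def deployment_numbers(times, deployment_starts):
    """For each time, return the index of the latest deployment that started
    at or before it. Times before the first start get -1."""
    return [bisect_right(deployment_starts, t) - 1 for t in times]


# Hypothetical start times of three deployments, in days since a reference date.
starts = [0.0, 365.0, 730.0]

# Hypothetical sample times along the TIME dimension.
times = [10.0, 364.9, 365.0, 800.0]

deployment_number = deployment_numbers(times, starts)
```

A reader of the file then needs no date comparisons at all; the deployment for any sample is stored alongside it.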

petejan commented 4 years ago

Is there some reason you need Deployment to be a netCDF coordinate variable (i.e. a variable with the same name as a dimension)? It's not a geographic coordinate; it's sort of a time coordinate, but you already have one of those.

Why not have a dimension of Deployment and a variable deployment_number(TIME) with the same attributes as your Deployment variable? This is like the Indexed ragged array representation. This variable only has to be a byte type so does not take much space, if that's an issue.
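The indexing idea can be sketched with plain Python lists standing in for netCDF variables. The variable names other than `deployment_number` and all values are hypothetical; the per-deployment lists play the role of variables on the Deployment dimension, and `deployment_number` has one entry per TIME sample.

```python
# Per-deployment metadata: one value per entry along the Deployment dimension.
water_depth = [4440.0, 4425.0, 4451.0]  # hypothetical values, metres
sensor_height = [3.0, 3.2, 3.1]         # hypothetical values, metres

# One small integer per TIME sample, pointing at the deployment it came from.
deployment_number = [0, 0, 1, 1, 1, 2]


def per_sample(field, index):
    """Expand a per-deployment field to one value per TIME sample."""
    return [field[i] for i in index]


depth_per_sample = per_sample(water_depth, deployment_number)

# TIME indices that belong to the second deployment (index 1).
samples_in_second = [t for t, d in enumerate(deployment_number) if d == 1]
```

Since the index values only need to cover the number of deployments, a byte-sized integer is enough for up to 127 of them.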