nanoos-pnw / site-timeseries

Notes and materials for deciding on data structures and access for long time series at fixed sites

Draft NetCDF CDL for target CMOP archiving files #2

Open emiliom opened 8 years ago

emiliom commented 8 years ago

@cseaton, I'm opening this issue to help us track and discuss this to-do item from our meeting. Let us know if you and Russell have made progress since then. BTW, please ping Russell to reply to this issue, so I have his GitHub profile and so he's automatically included in follow-ups.

What I had in mind by "drafting" the CDL is that it may be easiest to manually craft a bare-bones CDL for discussion, before doing any coding or actually writing NetCDF files.

cseaton commented 8 years ago

@RussellSenior has been working on this and has a preliminary draft of a CDL (at least in terms of fields).

RussellSenior commented 8 years ago

@emiliom my plan is to use the same structure as this: http://data.nodc.noaa.gov/testdata/netCDFTemplateExamples/timeSeries/BodegaMarineLabBuoy.cdl (this is the Orthogonal Timeseries example from https://www.nodc.noaa.gov/data/formats/netcdf/v1.1/). I think the station_name variable would be replaced by the deploymentid in our case, and we'd only have a single instrument (at least initially ... we might need more instruments if we pull in "external" data, for example for tide height in order to have a measure of depth when attached to a fixed structure). Does that sound reasonable?
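To make the proposed layout easier to discuss, here is a minimal sketch that prints a bare-bones CDL skeleton for a single deployment. It is not the CMOP draft: the helper name `draft_cdl`, the `id_strlen` dimension and its length of 64, the time units, and the example variables are all assumptions made for illustration. The only points taken from the discussion are that `deploymentid` stands in for `station_name` and that there is a single instrument.

```python
# Hypothetical sketch: emit a bare-bones CDL skeleton for one deployment.
# Names, dimension sizes, and attribute values are illustrative assumptions.

def draft_cdl(dataset_name, data_vars):
    """Return CDL text; data_vars maps variable name -> (standard_name, units)."""
    lines = [
        f"netcdf {dataset_name} {{",
        "dimensions:",
        "    time = UNLIMITED ;",
        "    id_strlen = 64 ;",
        "variables:",
        "    double time(time) ;",
        '        time:standard_name = "time" ;',
        '        time:units = "seconds since 1970-01-01T00:00:00Z" ;',
        # deploymentid plays the role station_name has in the NODC template.
        "    char deploymentid(id_strlen) ;",
        '        deploymentid:cf_role = "timeseries_id" ;',
    ]
    for name, (standard_name, units) in data_vars.items():
        lines.append(f"    float {name}(time) ;")
        lines.append(f'        {name}:standard_name = "{standard_name}" ;')
        lines.append(f'        {name}:units = "{units}" ;')
    lines += ["", "// global attributes:", '    :featureType = "timeSeries" ;', "}"]
    return "\n".join(lines)


cdl = draft_cdl(
    "deployment_44",
    {
        "temperature": ("sea_water_temperature", "degree_Celsius"),
        "salinity": ("sea_water_practical_salinity", "1"),
    },
)
print(cdl)
```

The printed text could be saved to a `.cdl` file and turned into an empty NetCDF file with `ncgen` for inspection, once the real attribute list is agreed on.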

emiliom commented 8 years ago

Jan. 14, From @RussellSenior:

Here's a link to my work-in-progress for a randomly chosen CT sensor deployment (deploymentid = 44):

http://www.stccmop.org/data/NCEI/wip/44.nc

I switched to using traditional coordinate variables, which made the compliance checker somewhat more happy. Again, many attribute values are still empty; this is definitely still a work-in-progress. When invoked as follows:

  python cchecker.py --test cf --criteria strict --verbose

the compliance checker has the following complaints summarized at the bottom:

--------------------------------------------------------------------------------
                  Reasoning for the failed tests given below:

Name                             Priority:     Score:Reasoning
--------------------------------------------------------------------------------
3.1 Variables contain valid CF Units   :3:     0/ 1 : unknown units type (psu)
                                                      for salinity
3.1 Variables contain valid units for t:3:     5/ 7 : units are C, standard_name
                                                      units should be K, units
                                                      are psu, standard_name
                                                      units should be 1
5.2 Latitude and longitude coordinates :3:     0/ 3 :
    conductivity                       :3:     0/ 1 :
        coordinates_reference_itself   :3:     0/ 1 : Variable conductivity's
                                                      coordinate references
                                                      itself
    salinity                           :3:     0/ 1 :
        coordinates_reference_itself   :3:     0/ 1 : Variable salinity's
                                                      coordinate references
                                                      itself
    temperature                        :3:     0/ 1 :
        coordinates_reference_itself   :3:     0/ 1 : Variable temperature's
                                                      coordinate references
                                                      itself

The units complaints I understand; I'm not so sure about the coordinates_reference_itself ones.
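As a rough illustration of the two kinds of complaint, the sketch below works on plain dictionaries rather than a real NetCDF file. The unit replacements (`degree_Celsius` for `C`, `1` for `psu`) are assumptions about what would satisfy the checker, and the reading of coordinates_reference_itself (a variable whose `coordinates` attribute lists its own name) is only one plausible interpretation of the message. The example attribute values are hypothetical; the thread does not show what the file actually contains.

```python
# Hypothetical sketch: variables are modelled as {name: {attribute: value}}.
# Assumed replacements for the unit strings the checker rejected.
UNIT_FIXES = {"C": "degree_Celsius", "psu": "1"}


def fix_units(attrs):
    """Return a copy of attrs with a rejected units string swapped out."""
    fixed = dict(attrs)
    if "units" in fixed:
        fixed["units"] = UNIT_FIXES.get(fixed["units"], fixed["units"])
    return fixed


def self_referencing(variables):
    """Names of variables whose `coordinates` attribute lists the variable itself."""
    return [
        name
        for name, attrs in variables.items()
        if name in attrs.get("coordinates", "").split()
    ]


# Illustrative attribute values only, not taken from 44.nc.
example = {
    "temperature": {"units": "C", "coordinates": "time temperature"},
    "salinity": {"units": "psu", "coordinates": "time lat lon"},
}

print({name: fix_units(attrs)["units"] for name, attrs in example.items()})
print(self_referencing(example))
```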

RussellSenior commented 8 years ago

I managed to figure out why my nc files were so huge, it was a combination of netcdf4 and an unlimited dimension and a degenerate default chunksize. Fixed now, file much smaller (~6.5MB instead of 340MB):
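For anyone hitting the same problem, here is a small sketch of choosing an explicit chunk length for a variable on an unlimited dimension instead of relying on the default. It assumes the degenerate default meant a very small chunk along the unlimited dimension, which this thread does not spell out. The helper `chunk_length` and the 1 MiB target are hypothetical choices, not what was used for the files above.

```python
# Hypothetical helper: pick how many records go in each chunk.
# The 1 MiB target is an arbitrary assumption, not a recommendation from the thread.

def chunk_length(itemsize_bytes, target_chunk_bytes=1 << 20, n_records=None):
    """Records per chunk so that one chunk is roughly target_chunk_bytes."""
    length = max(1, target_chunk_bytes // itemsize_bytes)
    if n_records is not None:
        # No point in a chunk longer than the series itself.
        length = min(length, max(1, n_records))
    return length


n = chunk_length(4)  # 4-byte floats
print(n)

# With netCDF4-python the result would be passed through the `chunksizes`
# argument of createVariable, for example:
#   ds.createVariable("temperature", "f4", ("time",), chunksizes=(n,))
```

The chunking a file actually ended up with can be checked with `ncdump -hs`, which prints the `_ChunkSizes` special attribute for each variable.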

http://www.stccmop.org/data/NCEI/wip/44-20160120-1203.nc

RussellSenior commented 8 years ago

Today's version is still missing important attributes, but much more of the low-hanging fruit has been plucked. It currently passes a strict CF compliance check (invoked as above) with only the units complaints remaining.

http://www.stccmop.org/data/NCEI/wip/44-20160121-1632.nc

RussellSenior commented 8 years ago

This version fixes the units for at least the limited test case, and currently gets a perfect 244/244 score on the strict cf compliance check (that doesn't mean it's actually perfect!):

http://www.stccmop.org/data/NCEI/wip/44-20160121-1703.nc

emiliom commented 8 years ago

Thanks for these updates, @RussellSenior, and great to see the progress! Apologies for not being on top of it myself last week. I'll try to dedicate time to focus on this tomorrow (Monday).

I still need to look more closely at this: "I switched to using traditional coordinate variables, which made the compliance checker somewhat more happy."

And regarding this, well, WOW:

"I managed to figure out why my nc files were so huge, it was a combination of netcdf4 and an unlimited dimension and a degenerate default chunksize. Fixed now, file much smaller (~6.5MB instead of 340MB)"