Closed BeckyBaltes closed 9 years ago
@BeckyBaltes @dpsnowden These files contain significantly more information/data than the current DAC standard requires. Here are some first thoughts:
There are likely other differences; these are just the big ones. These differences alone would require significant rewriting of existing code, or a decent time investment to write a converter.
@kerfoot are you referring to a converter from the sample netcdf format to the expected DAC2.0 format?
What about having the AOML team follow the submission process documentation from the beginning and creating a netCDF file in the format we expect? Might be a good test of the system.
@kknee : I'd definitely prefer them follow the process, but I'm guessing they have already spent considerable time and resources to get it to the NODC format they are currently using. Not sure how excited they'll be to start over.
@dpsnowden : any feeling for this? The current DAC workplan and SOW don't have anything in them about writing converters for various groups.
Ideally, this becomes a repeatable process and it probably doesn't make sense to start writing converters for everyone, so I think it's fine to start them with the process and see what they can do. @dpsnowden pointed them to the wiki again this morning and to this thread so if they are on github they can weigh in and track progress.
Agreed, we started this whole process with the assumption that the formatting would be left to the data providers to the extent possible. Let's see how far we can get with this. But, if the process proves impossible for various reasons then we will need to revisit our assumption and budget for it. If we can't get this data integrated inside of a finite window (1 month?) then we need to think about converters or other technical assistance. The pretty maps and tools in the DAC aren't useful if it isn't full of data.
The more help we can provide in terms of "change x to y in your netcdf file" the better.
Finally, @kerfoot mentioned that they have more metadata in their files than we currently require. I think we should think about adopting the policy that this situation is ok. If we all agreed that more metadata from the provider is better, then we don't want to discourage them from writing it. How would we address this? Can we have rigid standardization of some things and flexibility elsewhere?
@dpsnowden I think that, as long as they have the variables and attributes that we require, additional data would not be prohibitive and the DAC would be able to serve it. The trick would be setting up erddap datasets in which the underlying .nc file contents are different, depending upon who submitted the data, assuming they wanted all of their data to be accessible.
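A rough sketch of the "required plus optional extras" idea discussed above, in Python. The `REQUIRED_VARIABLES` set is a hypothetical subset chosen for illustration (the authoritative list lives in the IOOS_Glider_NetCDF_v2.0 template), and `check_variables` is an illustrative helper, not part of any DAC tooling. It treats missing required names as errors and simply reports anything extra.

```python
# Hypothetical subset of required variable names, for illustration only.
# The authoritative list is the IOOS_Glider_NetCDF_v2.0 template.
REQUIRED_VARIABLES = {
    "time",
    "temperature",
    "temperature_qc",
    "conductivity",
    "conductivity_qc",
}


def check_variables(provided):
    """Compare a file's variable names against the required set.

    Missing required names make the file fail; extra names are
    reported but do not cause a failure.
    """
    provided = set(provided)
    missing = sorted(REQUIRED_VARIABLES - provided)
    extra = sorted(provided - REQUIRED_VARIABLES)
    return {"ok": not missing, "missing": missing, "extra": extra}


if __name__ == "__main__":
    names = ["time", "temperature", "temperature_qc",
             "conductivity", "conductivity_qc", "oxygen"]
    print(check_variables(names))
```

With a check like this, a provider's additional variables would pass through untouched, and only the rigid part of the standard would be enforced.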
So as long as the DAC 2.0 documentation is ready (@kerfoot please confirm), the ball is in AOML's court on this issue?
@kknee The doco on the file format is ready. It's been reviewed by myself and Bob Simons. Since DAC 2.0 is not officially up, the doco on the file submission process is not completely up to date. But they'll need some time to get the files written before they need to worry about submission.
Good news. I agree that AOML has a role to play here. But, I still would like to identify a technical POC from our team that will interact with them. This interaction would hopefully generate answers to a few questions.
I'm probably the one to handle this. @dpsnowden: can you make the appropriate introductions?
Hello @dpsnowden, @kerfoot
I'm writing code to convert our files into the IOOS_Glider_NetCDF_v2.0 format. The documentation provided is very good, and at this point I would like to run some tests to verify that my conversion is accurate and working as expected. Could you send me an example of a real glider nc file in the IOOS format? The example on this site is very useful, but the variables are empty; a real file with data would be good for tests.
@fbringas : There are a couple of examples here:
https://github.com/ioos/ioosngdac/tree/master/nc/examples/profile
Would you like me to provide more?
@kerfoot : Thank you for the examples. The issue I'm trying to test is related to the "_qc" variables (i.e. temperature_qc, conductivity_qc, ...). While in my original nc format these variables are declared as char, in the IOOS 2.0 format they are declared as byte. Is it acceptable to declare these variables as char instead of byte? If not, would you have one more example where these variables contain actual values? In the 2 examples above they were all empty. By the way, it was my understanding that instead of leaving these "_qc" variables empty they should be set to '0'.
@fbringas according to cf-convention/CF-2#3, char shouldn't be used. Most QC fields I've seen are done as flags, which I'm pretty sure is best represented via the byte type, but I'm no expert here.
@daf: I agree. Char data types are used for strings and bytes are used for numbers. We're using numbers, so we're using bytes.
As for the contents of the _qc variables, they are empty as I haven't yet implemented the flagging system in the files I'm creating.
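For a converter that starts from char-typed QC variables, the char-to-byte step could look roughly like the sketch below. It uses only the Python standard library; `char_flags_to_bytes` is a hypothetical helper, and the fill value of `-127` is an assumption made for illustration (the real value should come from the `_FillValue` declared in the template).

```python
from array import array


def char_flags_to_bytes(flags, fill_value=-127):
    """Convert single-character QC flags ('0'..'9') to signed 8-bit ints.

    Any character that is not an ASCII digit (blank, padding, etc.) is
    replaced by fill_value. The default fill_value is an assumption.
    """
    out = array("b")  # 'b' is a signed 8-bit integer, like netCDF byte
    for ch in flags:
        if len(ch) == 1 and ch in "0123456789":
            out.append(int(ch))
        else:
            out.append(fill_value)
    return out


if __name__ == "__main__":
    # Char flags as they might be read from the original file.
    print(char_flags_to_bytes("1 49"))
```

The resulting values can then be written to a variable declared as byte rather than char.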
It needs to be a signed integer (QARTOD). Most published manuals on marine QA/QC have a very small set of flags, and an 8-bit signed integer (byte in netCDF) is sufficient. Whenever a QC flag is used, there needs to be a metadata field that describes the flag values. Example:
byte temperature_qc(time) ;
    temperature_qc:qc_flags = "0=fail, 1=good, 2=suspect, 3=fill_value" ;
The DAC 2.0 spec provides a set of flags for these, as an attribute. For example, line 291 here:
https://github.com/ioos/ioosngdac/blob/master/nc/template/IOOS_Glider_NetCDF_v2.0.ncml
I believe we took these from the IMOS specification, though I'm not particularly happy with them as they're very ambiguous and don't relate specifically to the QC check performed. If QARTOD has defined a set of standard QC flags, I'm all for using those.
Flag | Description |
---|---|
Pass=1 | Data have passed critical real-time quality control tests and are deemed adequate for use as preliminary data. |
Not evaluated=2 | Data have not been QC-tested, or the information on quality is not available. |
Suspect or Of High Interest=3 | Data are considered to be either suspect or of high interest to data providers and users. They are flagged suspect to draw further attention to them by operators. |
Fail=4 | Data are considered to have failed one or more critical real-time QC checks. If they are disseminated at all, it should be readily apparent that they are not of acceptable quality. |
Missing=9 | Data are missing; used as a placeholder. |
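
The flag table above could be captured in code roughly as follows. This is only a sketch: `QartodFlag`, `flag_attributes`, and `is_valid_flag` are hypothetical names, and the attribute names `flag_values` and `flag_meanings` follow the CF conventions rather than anything confirmed in this thread for the DAC template.

```python
from enum import IntEnum


class QartodFlag(IntEnum):
    """Flag values as listed in the table above."""
    PASS = 1
    NOT_EVALUATED = 2
    SUSPECT = 3  # "Suspect or Of High Interest"
    FAIL = 4
    MISSING = 9


def flag_attributes():
    """Build CF-style attributes describing the flags.

    CF expects flag_meanings as a single space-separated string whose
    entries line up with flag_values.
    """
    return {
        "flag_values": [int(f) for f in QartodFlag],
        "flag_meanings": " ".join(f.name.lower() for f in QartodFlag),
    }


def is_valid_flag(value):
    """Return True if value is one of the defined flag values."""
    return value in {int(f) for f in QartodFlag}


if __name__ == "__main__":
    print(flag_attributes())
    print(is_valid_flag(5))
```

All five values fit comfortably in a signed 8-bit integer, which is consistent with declaring the `_qc` variables as byte.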
UPDATE: On our call this morning, we thought AOML data link was complete, but @robragsdale is still not able to register it without a link to the data. @lukecampbell please provide the access point/link for the data. @kknee, For awareness.
@robragsdale the link (http://50.17.63.70/erddap/tabledap/SG61020140715T1400.html) was passed around on the IOOS Glider email list, but wanted to document it here too.
Does it make sense to register with this temporarily until we have either (1) a domain for the IP or (2) have completed the WAF?
@kknee EMMA cannot harvest from a .html URL. I got a 500 error back when I tried to change the extension to xml. Could I use this URL from the ERDDAP Catalog instead: http://50.17.63.70/erddap/metadata/iso19115/xml/SG61020140715T1400_iso19115.xml ? Thoughts?
@lukecampbell is it better to use the URL that @robragsdale suggests, or http://50.17.63.70/erddap/tabledap/unit_23620121005T2349.iso19115?
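The URL shapes mentioned in the last few comments can be summarized with a small helper. This is a sketch only: `erddap_urls` is a hypothetical function, and the patterns are simply copied from the URLs quoted in this thread, not taken from ERDDAP documentation. It builds strings and makes no network requests.

```python
def erddap_urls(base_url, dataset_id):
    """Build the three URL forms quoted in this thread for one dataset.

    base_url is the ERDDAP root, e.g. "http://50.17.63.70/erddap".
    """
    base = base_url.rstrip("/")
    return {
        # Human-facing data access form (not harvestable as metadata).
        "html": f"{base}/tabledap/{dataset_id}.html",
        # ISO 19115 metadata via the tabledap file-type suffix.
        "iso19115_tabledap": f"{base}/tabledap/{dataset_id}.iso19115",
        # ISO 19115 metadata via the catalog path.
        "iso19115_catalog": (
            f"{base}/metadata/iso19115/xml/{dataset_id}_iso19115.xml"
        ),
    }


if __name__ == "__main__":
    urls = erddap_urls("http://50.17.63.70/erddap", "SG61020140715T1400")
    for name, url in urls.items():
        print(name, url)
```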
What is keeping us from deciding on a domain name?
@dpsnowden I don't think anything is. On yesterday's call we discussed using the following URLs - @BeckyBaltes was going to confirm with you that these were okay and next steps for getting Luke access for assigning those domains to the DAC IP.
data.ioos.us/thredds/gliders
data.ioos.us/erddap/gliders
@dpsnowden Just need you to provide Luke whatever logins/accesses he needs to build the two domains.
Sure. Let's talk Thursday or Friday.
AOML Glider files submitted for registration (SG61020140715T1400 and SG60920140719T1700) are in the IOOS Catalog and Glider DAC v2.0 ERDDAP Service
Derrick sent out an email to the data group to figure out what needs to happen to get the data from the AOML gliders operating in the Caribbean into the DAC. My understanding was that John needed to review the file format. @kerfoot, can you report on the status? If done, what are the next steps?