ioos / ioosngdac

IOOS National Glider Data Assembly Center (V2)
https://ioos.github.io/ioosngdac/

AOML Glider Data delivery to DAC #6

Closed: BeckyBaltes closed this issue 9 years ago.

BeckyBaltes commented 10 years ago

Derrick sent out an email to the data group to figure out what needs to happen to get the data from the AOML gliders operating in the Caribbean into the DAC. My understanding was that John needed to review the file format. @kerfoot, can you report on the status? If it's done, what are the next steps?

kerfoot commented 10 years ago

@BeckyBaltes @dpsnowden These files contain significantly more information/data than the current DAC standard requires. Here are some first thoughts:

  1. The file uses the trajectory data type. DAC 2.0 uses a single profile data type with a time dimension.
  2. The file contains a single dive, which is 2 profiles (the descent and the ascent), so 2 files would need to be written for each AOML file.
  3. QC flag values are not those required by the DAC.
  4. AOML files have 30 dimensions. DAC requires only 2: time and trajectory string length.
  5. Variable names are different.
  6. AOML files are missing the instrument_ctd container variable.
  7. AOML files have neither a global:wmo_id attribute nor a platform:wmo_id attribute.
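Point 2 implies splitting each AOML dive into a descent profile and an ascent profile. Below is a minimal Python sketch, assuming each file holds exactly one down-up cycle so that the deepest sample marks the turnaround. `split_dive` is a hypothetical helper, not part of any DAC or AOML tooling; a real converter would likely also need to cope with noisy depth records and multiple inflections.

```python
import math
from typing import List, Sequence, Tuple


def split_dive(depths: Sequence[float]) -> List[Tuple[int, int]]:
    """Split one down-up dive into half-open (start, stop) index ranges.

    Assumes a single down-up cycle, so the deepest valid sample marks the
    turnaround. That sample is assigned to the descent profile.
    """
    if len(depths) < 2:
        raise ValueError("need at least two samples to split a dive")
    # Treat missing (NaN) depths as infinitely shallow so they never win max().
    turn = max(range(len(depths)),
               key=lambda i: -math.inf if math.isnan(depths[i]) else depths[i])
    return [(0, turn + 1), (turn + 1, len(depths))]


depths = [0.5, 10.0, 25.0, 40.0, 30.0, 12.0, 1.0]
descent, ascent = split_dive(depths)
print(depths[slice(*descent)])  # [0.5, 10.0, 25.0, 40.0]
print(depths[slice(*ascent)])   # [30.0, 12.0, 1.0]
```

Each index range would then be written to its own single-profile file.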

There are likely other differences; these are just the big ones. These differences alone would require either significant rewriting of existing code or a substantial time investment to write a converter.

kknee commented 10 years ago

@kerfoot, are you referring to a converter from the sample NetCDF format to the expected DAC 2.0 format?
What about having the AOML team follow the submission process documentation from the beginning and create a netCDF file in the format we expect? It might be a good test of the system.

kerfoot commented 10 years ago

@kknee: I'd definitely prefer that they follow the process, but I'm guessing they have already spent considerable time and resources to get the files into the NODC format they are currently using. Not sure how excited they'll be to start over.

@dpsnowden: any feeling for this? The current DAC workplan and SOW don't include anything about writing converters for various groups.

BeckyBaltes commented 10 years ago

Ideally, this becomes a repeatable process, and it probably doesn't make sense to start writing converters for everyone, so I think it's fine to start them with the process and see what they can do. @dpsnowden pointed them to the wiki again this morning and to this thread, so if they are on GitHub they can weigh in and track progress.

dpsnowden commented 10 years ago

Agreed, we started this whole process with the assumption that the formatting would be left to the data providers to the extent possible. Let's see how far we can get with this. But if the process proves impossible for various reasons, we will need to revisit our assumption and budget for it. If we can't get this data integrated within a finite window (1 month?), then we need to think about converters or other technical assistance. The pretty maps and tools in the DAC aren't useful if it isn't full of data.

The more help we can provide in terms of "change x to y in your netcdf file" the better.

Finally, @kerfoot mentioned that they have more metadata in their files than we currently require. I think we should consider adopting a policy that this situation is OK. If we all agree that more metadata from the provider is better, then we don't want to discourage them from writing it. How would we address this? Can we have rigid standardization of some things and flexibility elsewhere?

kerfoot commented 10 years ago

@dpsnowden I think that, as long as they have the variables and attributes that we require, additional data would not be a problem and the DAC would be able to serve it. The trick would be setting up ERDDAP datasets in which the underlying .nc file contents differ depending on who submitted the data, assuming the providers wanted all of their data to be accessible.
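The "rigid core, flexible extras" policy discussed here can be expressed as a simple set comparison: a file is rejected only when a required variable is missing, and anything extra is merely reported. A minimal Python sketch; `compare_to_required` is hypothetical, and `REQUIRED_VARIABLES` is an illustrative subset assumed for this example, not the authoritative list from the DAC 2.0 template.

```python
from typing import Dict, Iterable, Set

# Illustrative subset only (an assumption); the DAC 2.0 template is authoritative.
REQUIRED_VARIABLES: Set[str] = {
    "time", "lat", "lon", "pressure", "depth",
    "temperature", "conductivity", "salinity", "density",
    "profile_id", "profile_time", "profile_lat", "profile_lon",
    "platform", "instrument_ctd", "trajectory",
}


def compare_to_required(file_variables: Iterable[str],
                        required: Set[str] = REQUIRED_VARIABLES) -> Dict[str, Set[str]]:
    """Split a file's variables into missing required ones and allowed extras."""
    present = set(file_variables)
    return {
        "missing": required - present,  # non-empty means the file is not compliant
        "extra": present - required,    # provider-specific additions, allowed
    }


report = compare_to_required(sorted(REQUIRED_VARIABLES) + ["oxygen"])
print(report["missing"])  # set()
print(report["extra"])    # {'oxygen'}
```

The variable names would come from the submitted file; ERDDAP would still need a per-provider dataset definition to expose the extras.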

kknee commented 10 years ago

So, as long as the DAC 2.0 documentation is ready (@kerfoot, please confirm), the ball is in AOML's court on this issue?

kerfoot commented 10 years ago

@kknee The doco on the file format is ready. It's been reviewed by Bob Simons and me. Since DAC 2.0 is not officially up, the doco on the file submission process is not completely up to date. But they'll need some time to get the files written before they need to worry about submission.

dpsnowden commented 10 years ago

Good news. I agree that AOML has a role to play here. But, I still would like to identify a technical POC from our team that will interact with them. This interaction would hopefully generate answers to a few questions.

  1. Are they willing/able to create a second version of their data files to comply with the DAC needs?
  2. Is our documentation clear enough that they can do that easily and without much hand holding?
  3. Is it possible to do the "trick" that @kerfoot mentioned above? I see that it might be theoretically possible, but who is going to test whether it actually is?
  4. How much metadata is lost in migrating from the AOML format into the DAC format, and do we care? Is there a way to recover it?

kerfoot commented 10 years ago

I'm probably the one to handle this. @dpsnowden: can you make the appropriate introductions?

fbringas commented 10 years ago

Hello @dpsnowden, @kerfoot

I'm writing code to convert our files into the IOOS_Glider_NetCDF_v2.0 format. The documentation provided is very good, and at this point I would like to run some tests to verify that my conversion is accurate and working as expected. Could you send me an example of a real glider nc file in the IOOS format? The example on this site is very useful, but its variables are empty; a real file with data would be better for testing.

kerfoot commented 10 years ago

@fbringas : There are a couple of examples here:

https://github.com/ioos/ioosngdac/tree/master/nc/examples/profile

Would you like me to provide more?

fbringas commented 10 years ago

@kerfoot: Thank you for the examples. The issue I'm trying to test is related to the "_qc" variables (e.g. temperature_qc, conductivity_qc, ...). In my original nc format these variables are declared as char, while in the IOOS 2.0 format they are declared as byte. Is it acceptable to declare these variables as char instead of byte? If not, would you have one more example where these variables contain actual values? In the 2 examples above they were all empty. By the way, it was my understanding that instead of being left empty, these "_qc" variables should be set to '0'.

daf commented 10 years ago

@fbringas according to cf-convention/CF-2#3, char shouldn't be used. Most QC fields I've seen are implemented as flags, which I'm pretty sure are best represented with the byte type, but I'm no expert here.

kerfoot commented 10 years ago

@daf: I agree. Char data types are used for strings and bytes are used for numbers. We're using numbers, so we're using bytes.

As for the contents of the _qc variables, they are empty as I haven't yet implemented the flagging system in the files I'm creating.
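If the AOML char flags are single ASCII digits (an assumption; their actual encoding isn't shown in this thread), converting them to numeric byte values is a small step. A minimal Python sketch using the standard-library `array` module with typecode `'b'` (signed char, the same range as a netCDF byte); `char_flags_to_bytes` is a hypothetical helper.

```python
from array import array
from typing import Iterable


def char_flags_to_bytes(flags: Iterable[str]) -> array:
    """Convert single-character digit QC flags ('0'..'9') to signed 8-bit integers."""
    values = array("b")  # typecode 'b': signed char, same range as a netCDF byte
    for flag in flags:
        # Reject blanks or other characters rather than guessing what they mean.
        if len(flag) != 1 or flag not in "0123456789":
            raise ValueError(f"unexpected QC flag {flag!r}")
        values.append(int(flag))
    return values


print(char_flags_to_bytes("1149").tolist())  # [1, 1, 4, 9]
```

Raising on unexpected characters forces a deliberate decision about how blank or fill characters in the source files map to numeric flags.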

lukecampbell commented 10 years ago

It needs to be a signed integer (QARTOD). Most published manuals on marine QA/QC have a very small set of flags, and an 8-bit signed integer (byte in netCDF) is sufficient. Whenever a QC flag is used, there needs to be a metadata field that describes the flag values. For example:

byte temperature_qc(time) ;
    string temperature_qc:qc_flags = "0=fail, 1=good, 2=suspect, 3=fill_value" ;

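A free-form attribute like qc_flags above has to be parsed before software can use it; CF conventions define the flag_values and flag_meanings attributes for this purpose. A minimal Python sketch that parses the "value=meaning, value=meaning" layout used in the example; `parse_qc_flags` is hypothetical and assumes that exact layout.

```python
from typing import Dict


def parse_qc_flags(attribute: str) -> Dict[int, str]:
    """Parse a "value=meaning, value=meaning" attribute string into {value: meaning}."""
    table: Dict[int, str] = {}
    for item in attribute.split(","):
        value, sep, meaning = item.strip().partition("=")
        if not sep:
            raise ValueError(f"malformed flag entry {item!r}")
        table[int(value)] = meaning.strip()
    return table


print(parse_qc_flags("0=fail, 1=good, 2=suspect, 3=fill_value"))
# {0: 'fail', 1: 'good', 2: 'suspect', 3: 'fill_value'}
```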
kerfoot commented 10 years ago

The DAC 2.0 spec provides a set of flags for these, as an attribute. For example, line 291 here:

https://github.com/ioos/ioosngdac/blob/master/nc/template/IOOS_Glider_NetCDF_v2.0.ncml

I believe we took these from the IMOS specification, though I'm not particularly happy with them as they're very ambiguous and don't relate specifically to the QC check performed. If QARTOD has defined a set of standard QC flags, I'm all for using those.

lukecampbell commented

Flag value = name: description
1 = Pass: Data have passed critical real-time quality control tests and are deemed adequate for use as preliminary data.
2 = Not evaluated: Data have not been QC-tested, or the information on quality is not available.
3 = Suspect or Of High Interest: Data are considered to be either suspect or of high interest to data providers and users. They are flagged suspect to draw further attention to them by operators.
4 = Fail: Data are considered to have failed one or more critical real-time QC checks. If they are disseminated at all, it should be readily apparent that they are not of acceptable quality.
9 = Missing: Data are missing; used as a placeholder.

From QARTOD Temperature Salinity Manual
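The QARTOD flags in the table map naturally onto the CF flag_values and flag_meanings attributes. A minimal Python sketch; `QartodFlag` and `cf_flag_attributes` are hypothetical names, and the underscore-joined meanings are one possible spelling, since CF requires flag_meanings to be a space-separated list of words. When written to a file, CF expects flag_values to have the same type as the QC variable (byte here).

```python
from enum import IntEnum
from typing import Dict, List


class QartodFlag(IntEnum):
    """QARTOD flag values from the table above."""
    PASS = 1
    NOT_EVALUATED = 2
    SUSPECT_OR_OF_HIGH_INTEREST = 3
    FAIL = 4
    MISSING = 9


def cf_flag_attributes() -> Dict[str, object]:
    """Build CF-style flag_values / flag_meanings attributes for a QC variable."""
    flags: List[QartodFlag] = list(QartodFlag)  # definition order
    return {
        "flag_values": [int(f) for f in flags],
        # flag_meanings is a single space-separated string, one word per value.
        "flag_meanings": " ".join(f.name.lower() for f in flags),
    }


print(cf_flag_attributes())
# {'flag_values': [1, 2, 3, 4, 9], 'flag_meanings': 'pass not_evaluated suspect_or_of_high_interest fail missing'}
```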

BeckyBaltes commented 9 years ago

UPDATE: On our call this morning, we thought the AOML data link was complete, but @robragsdale is still not able to register it without a link to the data. @lukecampbell, please provide the access point/link for the data. @kknee, for awareness.

kknee commented 9 years ago

@robragsdale the link (http://50.17.63.70/erddap/tabledap/SG61020140715T1400.html) was passed around on the IOOS Glider email list, but I wanted to document it here too.

Does it make sense to register with this temporarily until we have either (1) a domain for the IP or (2) a completed WAF?

robragsdale commented 9 years ago

@kknee EMMA cannot harvest from a .html URL. I got a 500 error back when I tried changing the extension to .xml. Could I use this URL from the ERDDAP catalog instead: http://50.17.63.70/erddap/metadata/iso19115/xml/SG61020140715T1400_iso19115.xml? Thoughts?

kknee commented 9 years ago

@lukecampbell, is it better to use the URL that @robragsdale suggests, or http://50.17.63.70/erddap/tabledap/unit_23620121005T2349.iso19115?
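Both candidate URLs follow patterns visible in this thread: ERDDAP's metadata/iso19115/xml path and the tabledap .iso19115 file type, each keyed by the dataset ID. A minimal Python sketch that derives both from an ERDDAP base URL, for example when registering further datasets; `iso19115_urls` is hypothetical, and generalizing from the two example URLs is an assumption.

```python
from typing import Dict


def iso19115_urls(erddap_base: str, dataset_id: str) -> Dict[str, str]:
    """Build the two ISO 19115 metadata URL forms for one ERDDAP dataset."""
    base = erddap_base.rstrip("/")  # tolerate a trailing slash on the base URL
    return {
        "metadata_xml": f"{base}/metadata/iso19115/xml/{dataset_id}_iso19115.xml",
        "tabledap": f"{base}/tabledap/{dataset_id}.iso19115",
    }


urls = iso19115_urls("http://50.17.63.70/erddap/", "SG61020140715T1400")
print(urls["metadata_xml"])
# http://50.17.63.70/erddap/metadata/iso19115/xml/SG61020140715T1400_iso19115.xml
```

Swapping in the final domain name later would then only require changing the base URL.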

dpsnowden commented 9 years ago

What is keeping us from deciding on a domain name?

kknee commented 9 years ago

@dpsnowden I don't think anything is. On yesterday's call we discussed using the following URLs - @BeckyBaltes was going to confirm with you that these were okay and next steps for getting Luke access for assigning those domains to the DAC IP.

data.ioos.us/thredds/gliders
data.ioos.us/erddap/gliders

BeckyBaltes commented 9 years ago

@dpsnowden Just need you to provide Luke whatever logins/accesses he needs to build the two domains.

dpsnowden commented 9 years ago

Sure. Let's talk Thursday or Friday.

On Wednesday, November 12, 2014, BeckyBaltes <notifications@github.com> wrote:

> @dpsnowden Just need you to provide Luke whatever logins/accesses he needs to build the two domains.


robragsdale commented 9 years ago

The AOML glider files submitted for registration (SG61020140715T1400 and SG60920140719T1700) are in the IOOS Catalog and the Glider DAC v2.0 ERDDAP service.