NANOOS / CMOP ATRAC refinement

emiliom commented 7 years ago

For @emiliom, mostly: review and edit as needed the ATRAC project entry Matt created for CMOP submission:

Project Title: Physical and biological data collected from buoys and moorings in the Columbia River Estuary and nearby coastal ocean from OHSU and CMOP, compiled by NANOOS.
Data Theme: Oceans and Coasts
Submitted On: 2016-12-01

emiliom commented 7 years ago

From @mbiddle-nodc, Dec 6, regarding possibilities for RA-level collection metadata and mechanisms for discovering or pointing to all metadata records tagged to or under an RA:

Right now there will be a metadata record for each Archival Information Package (AIP), which will be established by station. For the regions that we are currently archiving (SCCOOS http://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.nodc:0157016 for example) we are not providing collection level records, only AIP level records. But we do plan on generating collection level records; it just hasn't been addressed yet. It would be something like a region collection level record that references all the associated archived data for that region.

FYI, there are piecemeal approaches we can use for the time being. For example, in the metadata record http://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.nodc:0157016, under the 'Keywords' tab you will see Southern California Coastal Ocean Observing System, which takes you to the institution record for SCCOOS. Clicking on "Find NODC accessions submitted by SCCOOS" takes you to the Ocean Archive System response for packages submitted by SCCOOS. I know it's a little wonky, but it works for now.

[Also] We created this portal to help with discovery of archived data https://www.nodc.noaa.gov/ioos/

Thanks, Matt! Basically, the user need I was getting at was having a single link where we could point users to all NANOOS-associated metadata records. This information definitely addresses my question and provides some good options.

emiliom commented 7 years ago

I have a question about this: "there will be a metadata record for each Archival Information Package (AIP), which will be established by station". I was assuming the ATRAC Project metadata record at https://www.ncdc.noaa.gov/atrac/projectdetails.html?id=6901 was going to remain as a parent collection metadata for all the station records. Is that not the case? If so, what happens to the ISO Metadata record associated with it?? Does it get spliced into station-level metadata records?

MathewBiddle commented 7 years ago

Good question. We haven't figured out where the ISO record from the ATRAC system would fit into our discovery paradigm. We can use it to develop a collection level record for the data archived through the project. But if we create a collection level record and have all the AIP records under it, then discovery would be slightly different than not having the collection level record. We should discuss over the phone to see what you want to do.

MathewBiddle commented 7 years ago

Another thing to mention about access. You can use the geoportal to establish a query, then use the REST API as a static url that would point to the geoportal response for all the datasets matching the query.

For example, say I want to find SCCOOS fixed-station data. I go to the geoportal at http://data.nodc.noaa.gov/geoportal/catalog/search/search.page and type

"SCCOOS" AND "Fixed"

into the search box. I get the expected responses, and I can then use any of the REST API links at the bottom of the page as the static url for my query (depending on which service you want to use).

As a note: if you go to the html response, for example

http://data.nodc.noaa.gov/geoportal/rest/find/document?searchText=%22SCCOOS%22%20AND%20%22Fixed%22&start=1&max=25&contentOption=intersecting&f=html&dojo.preventCache=1481119667366

you can edit the url to have the link go to the search page with the matching query:

http://data.nodc.noaa.gov/geoportal/rest/find/document?searchText=%22SCCOOS%22%20AND%20%22Fixed%22&start=1&max=250&contentOption=intersecting&f=searchPage

I changed f=html to f=searchPage, removed &dojo.preventCache=1481119667366, and increased the number of records to display from max=25 to max=250.

That is just another option for the single link...
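The URL edits described above could also be scripted. The following is a minimal sketch in Python using only the standard library. The function name `to_search_page_url` and its `max_records` parameter are made up for illustration, and the handling of `f`, `max`, and `dojo.preventCache` is taken from the example URLs in this thread, not from any geoportal documentation.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def to_search_page_url(html_url, max_records=250):
    """Turn a geoportal html-response URL into a static search-page URL.

    Hypothetical helper: applies the three edits described in the thread.
    """
    parts = urlsplit(html_url)
    updated = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "dojo.preventCache":
            continue  # drop the cache-busting parameter
        if key == "f":
            value = "searchPage"  # was f=html
        elif key == "max":
            value = str(max_records)  # was max=25
        updated.append((key, value))
    # quote_via=quote keeps spaces as %20 (not "+"), as in the original URL
    query = urlencode(updated, quote_via=quote)
    return urlunsplit(parts._replace(query=query))


html_url = (
    "http://data.nodc.noaa.gov/geoportal/rest/find/document"
    "?searchText=%22SCCOOS%22%20AND%20%22Fixed%22&start=1&max=25"
    "&contentOption=intersecting&f=html&dojo.preventCache=1481119667366"
)
print(to_search_page_url(html_url))
```

The same function should work for a NANOOS query once records exist, by swapping the searchText value.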

I think these questions will be answered once we are archiving the data and you have physical packages and metadata records to look at.

emiliom commented 7 years ago

Linking to a relevant and very helpful comment from @mbiddle-nodc, about metadata in ATRAC Project vs ACDD in nc files vs bag-info.txt

emiliom commented 7 years ago

Good question. We haven't figured out where the ISO record from the ATRAC system would fit into our discovery paradigm. We can use it to develop a collection level record for the data archived through the project. But if we create a collection level record and have all the AIP records under it, then discovery would be slightly different than not having the collection level record. We should discuss over the phone to see what you want to do.

I'm still figuring out what makes sense, as I get a better sense of all the moving parts! I didn't realize that the ISO record being created through ATRAC was more for internal tracking of progress on the project and not necessarily for publishing; or, as you said elsewhere, "we simply use it as a way to collect more information about the archival process we are planning on establishing". I think that's fine.

MathewBiddle commented 7 years ago

Just in case someone stumbles on this thread. The recommendations I'm providing here are primarily for the NANOOS archival process at NCEI. While some of the information might be useful and applicable to other data sets, these are not blanket statements for all of NCEI's archival procedures.

emiliom commented 7 years ago

@mbiddle-nodc, I think this is an ATRAC issue, but maybe not. Here's another conclusion from my call with @cseaton:

This week over email you and I discussed a citation template and minting of DOI's.

The citation attribute was created from a template I use. It follows this template: [DMAC lead], [Regional Association] [YYYY]

Feel free to adjust that citation to be more appropriate for this data set.

This week you also clarified that the ATRAC project record is not intended to be a public metadata record per se, but instead the public metadata records will be the AIP's (stations and possibly finer granules). Given this, we'd like to use a citation template that is station-specific and should be automatically constructed from the ACDD contributor_name entries (in the order included), plus my name as the last one. The total number of co-authors will then be 5-6 in these CMOP-NANOOS AIP's.
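A minimal sketch of that author-list construction, in Python. It assumes the contributor_name global attribute is a comma-separated string of names, which is an assumption about how the files are written and not something confirmed in this thread. The function name and the placeholder names are hypothetical.

```python
def build_author_list(contributor_name, submitter):
    """Return citation authors: contributors in file order, submitter last.

    Assumes contributor_name is a comma-separated string (an assumption).
    """
    names = [name.strip() for name in contributor_name.split(",")]
    # Drop empty entries, and the submitter if already listed, so the
    # submitter appears exactly once, at the end.
    names = [name for name in names if name and name != submitter]
    names.append(submitter)
    return names


# Placeholder values, not taken from any real file.
authors = build_author_list(
    "Contributor A, Contributor B, Contributor C", "Submitter Name"
)
print("; ".join(authors))
```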

I assume that there will be other elements of the citation in addition to [AUTHORS][Regional Association] [YYYY], such as AIP title?

Anyway, I'm also assuming this doesn't impact the assessment of the final test data file submissions, and we can discuss in January?

Another thing I should mention, if you would like DOIs minted for the data, we would potentially use this citation as the author list (which is a requirement for DOIs).

Yes, like I said, we'll want DOI's. But we can discuss this in January, assuming there's nothing critical to decide next week -- or this month! -- that involves this issue.

MathewBiddle commented 7 years ago

Have a look at https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.nodc:0157016. On the right-hand side there is a data set citation dropdown (this is what gets automatically generated for an archival information package). Here's what it contains:

Cite as: Wright, D. and Southern California Coastal Ocean Observing System (2016). Oceanographic data collected from station Santa Monica Pier in the Coastal Waters of California by Institute of the Environment at University of California, Los Angeles, and assembled by Southern California Coastal Ocean Observing System (SCCOOS) Regional Association from 2005-06-16 to 2015-07-13 (NCEI Accession 0157016). Version 1.1. NOAA National Centers for Environmental Information. Dataset. [access date]

I think the structure is: [Submitter], and [Submitting Institution] ([Published Date]). [AIP Title]. Version N.N. NOAA National Centers for Environmental Information. Dataset. [access date]
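To make that structure concrete, here is a minimal Python sketch that fills in the template. It is only an illustration of the guessed structure above, not NCEI's actual citation generator, and the function name `format_citation` is made up. Note that the SCCOOS example has no comma before "and", so the sketch follows the example.

```python
def format_citation(submitter, institution, year, title, version,
                    access_date="[access date]"):
    """Fill in the citation structure guessed at in the thread (hypothetical)."""
    return (
        f"{submitter} and {institution} ({year}). {title}. "
        f"Version {version}. "
        "NOAA National Centers for Environmental Information. Dataset. "
        f"{access_date}"
    )


# Values copied from the SCCOOS example citation quoted above.
print(format_citation(
    submitter="Wright, D.",
    institution="Southern California Coastal Ocean Observing System",
    year=2016,
    title=(
        "Oceanographic data collected from station Santa Monica Pier in the "
        "Coastal Waters of California by Institute of the Environment at "
        "University of California, Los Angeles, and assembled by Southern "
        "California Coastal Ocean Observing System (SCCOOS) Regional "
        "Association from 2005-06-16 to 2015-07-13 (NCEI Accession 0157016)"
    ),
    version="1.1",
))
```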

I'll add the request to include the contributors in the citation. I don't think that will be a problem.

As for DOI minting, this will be a longer discussion and will most likely involve NCEI management. We can move forward and come back to that once the process is in place.

emiliom commented 7 years ago

Perfect on all counts, thanks. We'll reconnect on DOI minting when the timing is appropriate.

emiliom commented 7 years ago

For future reference, I'm adding here a note about the remaining task of adding a documentation-only archive object/project. I assume it's ok to wait till January to get to this, unless you two want to make progress this month in my absence. I suspect Charles will need time off to get back to work he's had to put on the back burner ...

MathewBiddle commented 7 years ago

Another thing for the back burner: when we get to the kiviuq kayak dataset, it would be nice to include more instrument-level metadata in the instrument variable. What you have now is more platform information. E.g., what is in the multi-instrument package?

Remember, you can include multiple instrument variables to reference throughout the dataset.

    int instrument ;
        instrument:calibration_date = "1970-1-1 0:00" ;
        instrument:calibration_report = "calibration date is a dummy value for testing purposes" ;
        instrument:comment = "" ;
        instrument:factory_calibrated = "" ;
        instrument:long_name = "Kayak mounted multi-instrument package" ;
        instrument:make_model = "None/None" ;
        instrument:platform = "platform" ;
        instrument:serial_number = "kiviuq_1" ;

MathewBiddle commented 7 years ago

In saturn10/data/saturn10.-300.F.MET_raw/201510-2134.nc, check out airtemp_qc:notes =, that is quite the comment. :/ I'm a little concerned about how effective that comment might be to someone who wants to use the data.

Another point of interest, I see a lot of these notes attributes. Why didn't that information get put into the comment attribute? It would be nice not to invent attributes when applicable attributes already exist in the current conventions.

emiliom commented 7 years ago

@mbiddle-nodc, just wanted to clarify that we haven't yet updated the files for NCEI. I'm starting the process to grab them from the cmop server just now, and it should take an hour or less. I'll let you know when it's done.

Thanks for the additional comments.

MathewBiddle commented 7 years ago

Correct, the comments I've provided are solely based on the packages bagged on 2016-11-30.

cseaton commented 7 years ago

RE: 'notes' attribute on _qc variables

On the second point, you are right that these should be 'comment' rather than 'notes'. Easy to change going forward. The _qc variables containing flagging information are the only place the 'notes' attribute appears.

In terms of the content of the attribute, the intent is to provide a description of the reason that data was flagged in qc, for each period that data was flagged. I agree that in the case of automated flags repeatedly applied, this generates unreasonably large and repetitive text.

To bring this in line with standards for flagging data (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch03s05.html), I think a better approach would be for me to generate a boolean flag variable (using CF 'flag_masks'), with each bit representing a reason that data was flagged. The largest number of reasons in a single month is <20 (usually < 5). This represents an increase in file size, but I haven't tested how much of one yet.
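A minimal sketch of the bit-mask idea in plain Python, assuming the CF attribute names flag_masks and flag_meanings from the section linked above. The reason names and function names are hypothetical placeholders, not the actual CMOP flag reasons.

```python
# Hypothetical reasons; the real list would come from the CMOP qc process.
REASONS = ["sensor_fouling", "out_of_range", "maintenance_visit"]


def build_flag_attributes(reasons):
    """Build CF-style attributes: one bit per reason, meanings space-separated."""
    if len(reasons) > 32:
        raise ValueError("more reasons than bits in a 32-bit integer")
    return {
        "flag_masks": [1 << bit for bit in range(len(reasons))],
        "flag_meanings": " ".join(reasons),
    }


def encode_flags(active_reasons, reasons):
    """Combine the bits of every reason that applies to one data point."""
    value = 0
    for reason in active_reasons:
        value |= 1 << reasons.index(reason)
    return value


def decode_flags(value, reasons):
    """Recover the list of reasons from a stored flag value."""
    return [r for bit, r in enumerate(reasons) if value & (1 << bit)]


attrs = build_flag_attributes(REASONS)
stored = encode_flags(["sensor_fouling", "maintenance_visit"], REASONS)
print(attrs["flag_masks"], stored, decode_flags(stored, REASONS))
```

With fewer than 20 reasons per month, every combination fits in a single 32-bit integer per data point, which replaces the long repeated text with one extra variable.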

Charles

----- Original Message ----- | From: "mbiddle-nodc" notifications@github.com | To: "nanoos-pnw/NCEI-archiving" NCEI-archiving@noreply.github.com | Cc: "cseaton" cseaton@stccmop.org, "Mention" mention@noreply.github.com | Sent: Wednesday, December 14, 2016 7:44:11 AM | Subject: Re: [nanoos-pnw/NCEI-archiving] NANOOS / CMOP ATRAC refinement (#2)

