nanoos-pnw / NCEI-archiving

Code, documentation and issue tracking for NANOOS NCEI archiving
Apache License 2.0

Operational archiving errors #7

Open MathewBiddle opened 7 years ago

MathewBiddle commented 7 years ago

Hi all, I figured I would start a new thread for errors encountered in the operational archive processing. Currently, the processing has run successfully and a few mappings have been added. There are no errors for NANOOS to address at this time.

MathewBiddle commented 7 years ago

Just to let you know, I had a mistake in my metadata mapping. Instead of including the project ID 723 (Science and Technology University Research Network (SATURN) Collaboratory) I accidentally used project ID 23 (COORDINATED EASTERN ARCTIC EXPERIMENT (CEAREX)).

I have since gone into the package metadata and updated all of the associated references. I have also updated the mapping in our ingest system so this error won't occur again. I apologize for the inconvenience.

emiliom commented 7 years ago

A follow-up / reminder from our call last Thursday, for Matt:

MathewBiddle commented 7 years ago

I'm working on those updates now. Should be done by the end of the day.

I've added the citation composition to the ingest procedures. It should be implemented soon.

MathewBiddle commented 7 years ago

Okay, here is the list of differences. The left-hand column is from ftp://ftp.nodc.noaa.gov/pub/data.nodc/ioos/nanoos/ and the right-hand column is from http://data.nanoos.org/ncei/ohsucmop/. The > symbol indicates that NANOOS has that folder but NCEI does not. The < symbol indicates that NCEI has that folder but NANOOS does not. I will be updating the packages marked with < (saturn01, saturn10, seahs, and sveni) to indicate that the data is preliminary. This will be accomplished through a minor revision and noted in about/journal.txt.

NCEI           NANOOS
abpoa           abpoa
am169           am169
cbnc3           cbnc3
chnke           chnke
coaof           coaof
            >   coaww
dsdma           dsdma
eliot           eliot
grays           grays
hmndb           hmndb
jetta           jetta
            >   lght2
            >   lght6
            >   lwsck
marsh           marsh
ncbn1           ncbn1
ogi01           ogi01
red26           red26
riverrad        riverrad
sandi           sandi
saturn01    <
saturn02        saturn02
            >   saturn05
saturn07        saturn07
            >   saturn08
saturn09        saturn09
saturn10    <
seahs       <
sveni       <
tansy           tansy
tnslh           tnslh
woody           woody
            >   yacht
            >   yb101
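
For reference, a comparison like the one above can be reproduced with a few lines of Python. This is only a minimal sketch, not the procedure actually used here: the function name `compare_folders` is hypothetical, and it assumes the two folder listings have already been retrieved as plain lists of names. The sample data is a small subset of the table.

```python
def compare_folders(ncei, nanoos):
    """Return (only_at_ncei, only_at_nanoos) as sorted lists of folder names."""
    ncei_set, nanoos_set = set(ncei), set(nanoos)
    only_at_ncei = sorted(ncei_set - nanoos_set)      # the "<" rows
    only_at_nanoos = sorted(nanoos_set - ncei_set)    # the ">" rows
    return only_at_ncei, only_at_nanoos


# Illustrative subset of the listing above (not the full set of folders).
ncei_folders = ["abpoa", "saturn01", "saturn02"]
nanoos_folders = ["abpoa", "coaww", "saturn02"]

only_at_ncei, only_at_nanoos = compare_folders(ncei_folders, nanoos_folders)
print("<", only_at_ncei)     # < ['saturn01']
print(">", only_at_nanoos)   # > ['coaww']
```
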
emiliom commented 7 years ago

Thanks.

MathewBiddle commented 7 years ago

Regarding the citation: we will be concatenating any newly found names to the list of authors we've compiled for the AIP. How will we know what order to organize the citation in? Right now it uses the order you provide in the contributor_name attribute, then adds any new names to the end of the list of authors.

MathewBiddle commented 7 years ago

For now, we will append the new names to the citation list. If you would like something different I can investigate further.
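
To make the append behavior concrete, here is a rough sketch in Python. It is an illustration under assumptions, not the actual ingest code: the function name `merge_authors` is hypothetical, the names are placeholders, and matching names case-insensitively is my assumption rather than something we have agreed on.

```python
def merge_authors(existing, new_names):
    """Keep the existing author order and append any names not already listed."""
    merged = list(existing)
    seen = {name.casefold() for name in existing}  # assumption: case-insensitive match
    for name in new_names:
        key = name.casefold()
        if key not in seen:
            merged.append(name)
            seen.add(key)
    return merged


# Placeholder names; the first list stands in for the order given in contributor_name.
current = ["Author A", "Author B"]
incoming = ["Author B", "Author C"]
print(merge_authors(current, incoming))  # ['Author A', 'Author B', 'Author C']
```
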

emiliom commented 7 years ago

> For now, we will append the new names to the citation list.

I don't think we came up with anything more sophisticated than this at our call last week. @cseaton, do you remember something different?

MathewBiddle commented 7 years ago

The four packages have gone through minor revisions to update the journal.txt; please see the links below.

saturn01: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0106/0162182/1.2/about/journal.txt
saturn10: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0106/0162186/1.2/about/journal.txt
seahs: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0106/0162187/1.2/about/journal.txt
sveni: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0106/0162188/1.2/about/journal.txt

We will be processing the six additional packages soon. We have to do a little cleanup of the ingest on our end with regard to the citation and some folder management. Expect to see some notifications soon.

cseaton commented 7 years ago

We didn't come up with anything different. I think it is difficult without a real world example to decide what the correct handling method is.

Although it is stable now, thinking through how the changes in PI leadership on NH-10 would properly have been handled might be a useful real world example.

emiliom commented 7 years ago

Thanks for the update, @mbiddle-nodc.

> We didn't come up with anything different. I think it is difficult without a real world example to decide what the correct handling method is.

Agreed. The simple approach Matt described seems ok until we face a real-world situation that suggests otherwise.

> Although it is stable now, thinking through how the changes in PI leadership on NH-10 would properly have been handled might be a useful real world example.

Agreed! More reason to pursue NH-10 next (Matt: that's a NANOOS mooring managed by Oregon State Univ. that has been around for 10+ years and has regular biannual changes in deployment configurations; Craig Risien would take the role Charles has played). The original PI passed away a couple of years ago.

MathewBiddle commented 7 years ago

As for the remaining packages that we thought should have been archived (coaww, lght2, lght6, lwsck, yacht, and yb101): they are not valid bags. They do not have the required bagit.txt, tagmanifest-sha256.txt, etc., which is why we never archived them.
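
If it helps to check folders before they are posted, something along these lines would flag the problem. This is a minimal sketch and not the NCEI validation itself: the function name `missing_bag_files` is hypothetical, and the list of required files contains only the two named above, since the full set of requirements is not spelled out in this thread.

```python
from pathlib import Path

# Only the two files named in this thread; the complete requirement list is longer.
REQUIRED_TAG_FILES = ("bagit.txt", "tagmanifest-sha256.txt")


def missing_bag_files(bag_dir, required=REQUIRED_TAG_FILES):
    """Return the required file names that are absent from the top of bag_dir."""
    bag_dir = Path(bag_dir)
    return [name for name in required if not (bag_dir / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_bag.py path/to/station_folder [more folders ...]
    for folder in sys.argv[1:]:
        missing = missing_bag_files(folder)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{folder}: {status}")
```
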

MathewBiddle commented 7 years ago

Now that I look at it, none of those folders have data files either.

emiliom commented 7 years ago

Yikes! Sorry about that, Matt.

Charles, that problem (missing files) is on your end, at least for the two stations (yacht & yb101) I checked on your web site.

cseaton commented 7 years ago

I'll clean up the transfer process to ensure that empty directories don't show up in the future.
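
One possible form for that check, as a rough sketch only: the function name `empty_station_dirs` and the idea of running it over the transfer root are assumptions for illustration, not a description of the existing transfer scripts.

```python
from pathlib import Path


def empty_station_dirs(root):
    """Return names of the immediate subdirectories of root that contain no files at any depth."""
    root = Path(root)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and not any(p.is_file() for p in d.rglob("*"))
    )
```

Any station folder it reports could then be skipped or removed before the listing is published.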

emiliom commented 7 years ago

Thanks.

I assume these directories / stations were left empty by mistake, and you will be generating the missing files soon?

MathewBiddle commented 7 years ago

It's fine if they are there and empty; just remember that we won't archive them until they meet the requirements.

emiliom commented 7 years ago

@cseaton, any update on when you expect to populate the missing stations (empty folders) from the original batch?

cseaton commented 7 years ago

The data for the missing stations is not stored in the same manner as all the other data, so it will take a while to make that data accessible. The missing stations are older historical stations, so they were not considered a top priority for this project (from either a CMOP or a NANOOS point of view).

emiliom commented 7 years ago

Ah, ok. I've been assuming that it was always your intent to create archives for those missing stations in this initial batch, but a simple glitch had prevented the files from being generated. Fine with me to set these aside for now, then. Which brings us to the other remaining stations, all of which are saturn stations if I remember correctly. Let's discuss those initially via email.

emiliom commented 7 years ago

Charles, for the next batch of CMOP stations to process and archive, I suggest we focus on the ones that NCEI "archived" by mistake and are currently publicly available from NCEI. It's these 4: saturn01, saturn10, seahs, sveni. Focusing on these would minimize the time period when there's potentially bad or misleading data available on the NCEI archive.

emiliom commented 7 years ago

https://data.nodc.noaa.gov/ioos/nanoos

emiliom commented 7 years ago

Here's the plan:

Charles, please add comments if I missed or mischaracterized something.

@mbiddle-nodc, I'll ping you when the new submissions are ready, just as an FYI. This should be by Friday noon (Pacific) for sure, barring any surprises.

emiliom commented 7 years ago

@mbiddle-nodc, the next batch of cmop/nanoos station files are ready for archiving. I know I don't actually need to tell you, since the automated NCEI system will pick them up on the 15th. But it doesn't hurt to be explicit at this early stage.

FYI, it's just 7 stations (saturn03, saturn04, saturn05, saturn08, saturn10, seahs, sveni), but in terms of total file size they're much larger than the first batch archived in January.

MathewBiddle commented 7 years ago

Okay, we took a look. I do have one concern: what's up with ohsucmop/sveni/sveni? There's an extra directory in the hierarchy. The extra directory will be preserved; I hope that's alright.
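
A quick way to catch this kind of duplication before submission, sketched in Python. The function name `has_repeated_folder` is hypothetical, and it only looks for a folder name immediately repeated in a path, which is the pattern seen here.

```python
from pathlib import PurePosixPath


def has_repeated_folder(path):
    """Return True if any path component is immediately followed by an identical one."""
    parts = PurePosixPath(path).parts
    return any(a == b for a, b in zip(parts, parts[1:]))


print(has_repeated_folder("ohsucmop/sveni/sveni"))  # True
print(has_repeated_folder("ohsucmop/seahs"))        # False
```
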

emiliom commented 7 years ago

> Okay, we took a look. I do have one concern: what's up with ohsucmop/sveni/sveni? There's an extra directory in the hierarchy. The extra directory will be preserved; I hope that's alright.

Darn, that was my mistake! It was not intentional. sveni had been overlooked on Charles' end; he then made it available manually, and I brought it in manually without checking it carefully.

I assume it's too late to change it to remove the duplicated folder hierarchy? If it is, oh well.

MathewBiddle commented 7 years ago

I'll check.

MathewBiddle commented 7 years ago

Right now, since the package doesn't meet the requirements, it will not be archived. If you make the appropriate changes, we will pick it up the next time the automated process runs.

emiliom commented 7 years ago

Thanks, @mbiddle-nodc. I guess that was a blessing in disguise. I've fixed the sveni folder hierarchy just now. No problem if it gets picked up next time instead.