Rethinking COMT THREDDS catalog generation via Google Doc

rsignell-usgs commented 8 years ago

The current strategy for creating a COMT Summary THREDDS catalog like this one: http://comt.sura.org/thredds/comt_2_current.html is described here: https://github.com/ioos/comt_catalog/blob/master/README.md

Basically, folks upload a pile of files, create an NcML file that aggregates them and provides extra metadata, and then add a new record (a new line) pointing to the NcML in this Google Spreadsheet: https://docs.google.com/spreadsheets/d/14OnF_K3TyhtdgmjxbP69IdKsxbUbw2zBOwSwhcP_SEU/edit#gid=7. A cron job then runs automatically each hour, reading the Google spreadsheet and writing new THREDDS catalogs.
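To make the spreadsheet-to-catalog step concrete, here is a minimal sketch of turning spreadsheet rows into a THREDDS catalog document. It is not the actual comt_catalog script: the column names (`name`, `id`, `ncml_path`) and the `build_catalog` function are hypothetical, and a real catalog would also need service definitions and the NcML wiring described in the README.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def build_catalog(rows, catalog_name="COMT catalog (sketch)"):
    """Return catalog XML text with one <dataset> per usable row.

    `rows` is a list of dicts, one per spreadsheet line. The keys
    "name", "id" and "ncml_path" are assumed column names, not the
    real spreadsheet headers.
    """
    ET.register_namespace("", THREDDS_NS)
    catalog = ET.Element(f"{{{THREDDS_NS}}}catalog", {"name": catalog_name})
    for row in rows:
        ncml_path = str(row.get("ncml_path", "")).strip()
        if not ncml_path:
            # Skip lines that do not point to an NcML file yet.
            continue
        ET.SubElement(
            catalog,
            f"{{{THREDDS_NS}}}dataset",
            {"name": str(row["name"]), "ID": str(row["id"]), "urlPath": ncml_path},
        )
    return ET.tostring(catalog, encoding="unicode")
```

Because the whole document is rebuilt from the rows on every run, anything that is not in the spreadsheet disappears from the output the next time the job fires.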

There are a couple of problems with this approach:

1. Reading the Google doc. I found out this morning that the catalog has not updated since April 20, 2015, when Google started requiring OAuth2 authentication instead of email/password authentication: https://developers.google.com/identity/protocols/AuthForInstalledApps?csw=1. Luckily, thanks to the instructions here: https://github.com/burnash/gspread/issues/224#issuecomment-95626930, I was able to modify the COMT1 catalog generation script and it works again.
2. Actually having people follow this approach! Before modifying the script that generates the comt_2 catalog, I noticed that on June 25 the catalog /home/testbed/comt_catalog/catalogs/comt_2_current.xml was manually edited and a catalog ref inserted:
```
<catalogRef xlink:href="http://oceanmodeling.pmc.ucsc.edu:8080/thredds/catalog/ccsnrt_physbio/fmrc/catalog.xml" xlink:title="CCSNRT Phys Bio Aggregation/Best Time Series" name=""/>
```
YIPES! If I update the COMT2 catalog generation script with OAuth2 authentication, it will again run hourly and wipe out this catalog ref. Was this just an experiment?
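For the first problem, a rough sketch of reading the spreadsheet rows with gspread under OAuth2 might look like the following. This is not the fix from the linked gspread issue: it assumes a recent gspread release that provides `gspread.service_account`, a service-account JSON key file that the spreadsheet has been shared with, and that the records live on the worksheet at index 0. The function name and the `client` parameter are hypothetical.

```python
def read_catalog_rows(spreadsheet_key, credentials_path=None, client=None,
                      worksheet_index=0):
    """Return the spreadsheet records as a list of dicts.

    `client` is any object with gspread's `open_by_key` interface; when it
    is omitted, a gspread client is built from a service-account key file
    (an assumption, the original scripts may authenticate differently).
    """
    if client is None:
        import gspread  # imported lazily so the function can be tested offline

        client = gspread.service_account(filename=credentials_path)
    spreadsheet = client.open_by_key(spreadsheet_key)
    # Assumption: the catalog records are on the worksheet at this index.
    worksheet = spreadsheet.get_worksheet(worksheet_index)
    # One dict per row, keyed by the header line of the sheet.
    return worksheet.get_all_records()
```

Passing the client in makes it possible to exercise the hourly job without touching Google at all, which would have made the silent failure since April easier to notice.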

Perhaps we should think about taking a different approach in light of these issues.

We could instead take the approach we are using for our model datasets here at USGS Coastal and Marine Geology (CMG), which is to have users control what appears in our USGS CMG portal by just having them add "CMG_Portal" to the "project" attribute.

Then we crawl THREDDS catalogs for NcML files, creating ISO metadata that we harvest into pycsw (but for the testbed we are still using the NGDC geoportal, right?)

Then when we do our CSW query, we just look for data with "project=CMG_Portal" and add those to the model viewer portal.

We could do the same for COMT, just using "COMT_Portal" or similar.
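As a rough illustration of the tagging idea, the check below decides whether an NcML document carries a given portal tag in its global "project" attribute. It is only a sketch: in the workflow described above the selection happens in the CSW query against the harvested ISO metadata, not by reading NcML directly. The function name is hypothetical, and treating the attribute as a comma-separated list is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace of NcML 2.2 documents.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def has_project_tag(ncml_text, tag="COMT_Portal"):
    """Return True if a global "project" attribute lists `tag`.

    Assumption: several projects may be listed in one attribute,
    separated by commas.
    """
    root = ET.fromstring(ncml_text)
    # Only direct children of <netcdf> are global attributes.
    for attr in root.findall(f"{{{NCML_NS}}}attribute"):
        if attr.get("name") != "project":
            continue
        raw = attr.get("value") or attr.text or ""
        if tag in [part.strip() for part in raw.split(",")]:
            return True
    return False
```

With this approach a dataset owner opts in or out by editing one attribute in their own NcML, so there is no shared spreadsheet or generated catalog for anyone to edit by hand.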

@brianmckenna, @kknee, what do you think?

Perhaps we should have a quick telecon about this...

rsignell-usgs commented 8 years ago

I wanted to bring Eoin in here, but I couldn't remember (or find) his GitHub handle. @kknee, can you bring him in?

BeckyBaltes commented 8 years ago

I don't have a strong opinion on this, but it does seem like a revision to the process is due. I'm interested to hear what you come up with.

kknee commented 8 years ago

@rsignell-usgs I think Eoin is @ehowlett, though I would be shocked if he chimes in.

Can I suggest that you folks discuss this in person next week? Assuming you'll be at the annual meeting too?

rsignell-usgs commented 8 years ago

C'mon @ehowlett, shock us! ;-)
