ioos / ckanext-ioos-theme

IOOS Catalog as a CKAN extension
GNU Affero General Public License v3.0

Why is G1_SST_GLOBAL so stale? #51

Closed rsignell-usgs closed 8 years ago

rsignell-usgs commented 8 years ago

We were wondering why we were not finding G1_SST_GLOBAL in our catalog-driven workflow, using https://data.ioos.gov/csw as an endpoint.

Looks like the metadata in the Axiom/cencoos WAF: http://thredds.axiomdatascience.com/iso/cencoos/G1_SST_GLOBAL.iso.xml has an up-to-date endPosition for time: <gml:endPosition>2016-08-22T00:00:00Z</gml:endPosition>

But the metadata in the IOOS WAF: https://data.ioos.us/waf/CeNCOOS/G1_SST_GLOBAL.iso.xml has an ancient endPosition for time: <gml:endPosition>2015-08-16T00:00:00Z</gml:endPosition>

Shouldn't this be updating daily?
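
For anyone who wants to repeat this check, here is a minimal sketch of the comparison, assuming the ISO record has already been fetched as a string. The function names `end_position` and `lag_days` are hypothetical, and the GML namespace URI in the sample snippets is an assumption; the lookup matches on the local element name, so it should not depend on which GML version a given record declares.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET


def end_position(iso_xml: str) -> datetime:
    """Return the first gml:endPosition found in an ISO metadata record."""
    root = ET.fromstring(iso_xml)
    for elem in root.iter():
        # Compare the local name only, so any GML namespace version matches.
        if elem.tag.rsplit("}", 1)[-1] == "endPosition" and elem.text:
            stamp = datetime.strptime(elem.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
            return stamp.replace(tzinfo=timezone.utc)
    raise ValueError("no endPosition element found")


def lag_days(source_xml: str, harvested_xml: str) -> int:
    """Whole days by which the harvested copy trails the source record."""
    return (end_position(source_xml) - end_position(harvested_xml)).days


# Trimmed-down stand-ins for the two records quoted above.
# The namespace URI is an assumption, not copied from the real documents.
GML = 'xmlns:gml="http://www.opengis.net/gml/3.2"'
axiom_waf = (
    f"<gml:TimePeriod {GML}>"
    "<gml:endPosition>2016-08-22T00:00:00Z</gml:endPosition>"
    "</gml:TimePeriod>"
)
ioos_waf = (
    f"<gml:TimePeriod {GML}>"
    "<gml:endPosition>2015-08-16T00:00:00Z</gml:endPosition>"
    "</gml:TimePeriod>"
)

print(lag_days(axiom_waf, ioos_waf))  # 372
```

With the two values quoted above, the IOOS WAF copy trails the Axiom/cencoos copy by 372 days.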

rsignell-usgs commented 8 years ago

Okay, I guess there is a dev catalog, and it's not using data.ioos.us/waf.

I found that the IOOS dev catalog metadata record here: https://dev-catalog.ioos.us/dataset/g1sst-1km-blended-sst2/resource/0e9bd67e-597e-4672-ba4c-7631f30c83b2 has temporal-extent-end = 2016-08-20T00:00:00Z, but if I follow the THREDDS ISO link for this dataset, http://thredds.cencoos.org/thredds/iso/G1_SST_GLOBAL.nc, I get: <gml:endPosition>2016-08-23T00:00:00Z</gml:endPosition>

Ideally they would be in sync most of the day, and worst case 1 day off, no?

Or could the worst-case schedule be something like:

Monday, 0400: Daily update to G1SST dataset at CENCOOS
Tuesday, 0300: Daily harvesting of metadata at CENCOOS
Wednesday, 0200: Harvesting of metadata from CENCOOS to IOOS.US
Thursday, 0100: Refresh of CKAN/pyCSW database
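
To put a number on that hypothetical schedule, here is a small sketch of the arithmetic. The stage times are the guesses from the schedule above, not confirmed harvest times, and `STAGES` and `worst_case_lag` are names made up for this example.

```python
from datetime import timedelta

# Offsets from Monday 00:00, taken from the guessed schedule above.
STAGES = [
    ("G1SST dataset updated at CENCOOS", timedelta(days=0, hours=4)),
    ("Metadata harvested at CENCOOS", timedelta(days=1, hours=3)),
    ("Metadata harvested from CENCOOS to IOOS.US", timedelta(days=2, hours=2)),
    ("CKAN/pyCSW database refreshed", timedelta(days=3, hours=1)),
]


def worst_case_lag(stages):
    """Time between the first stage and the last stage in the chain."""
    return stages[-1][1] - stages[0][1]


lag = worst_case_lag(STAGES)
print(lag.total_seconds() / 3600)  # 69.0 hours
```

Under those assumed times, each daily stage runs an hour before the one upstream of it, so every hop costs 23 hours and a Monday update would not show up in the catalog until 69 hours later.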

lukecampbell commented 8 years ago

The production catalog https://data.ioos.us/ is a snapshot in time from the NGDC Geoserver around February 2016. So nothing should be terribly new on that machine.

The development instance https://dev-catalog.ioos.us/ uses the registry (a product you'll soon be introduced to) and synchronizes with the registry daily at this point, but we're working on increasing the frequency.

The problem with us increasing the frequency of harvests is that we receive complaints from data providers about the requests. To me, this is strange, since we're only downloading XML documents from a web-hosted index. It's the most basic form of HTTP that I'm aware of. The XML documents aren't huge, though they're not incredibly small either, and synchronizing a WAF should not cause any burden for a server, in my opinion.

But still, to reduce the load, we only download the XML documents from data providers daily. Using the registry, however, you can synchronize any time.
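
As a rough illustration of what "downloading XML documents from a web-hosted index" involves, here is a minimal sketch that pulls the XML links out of a WAF index page. It assumes the WAF is served as a plain HTML directory listing; the function name `list_waf_documents`, the sample index, and the `example.org` base URL are all hypothetical, and this is not the registry's actual harvester.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_waf_documents(index_html: str, base_url: str) -> list:
    """Return absolute URLs of the .xml documents linked from a WAF index."""
    collector = _LinkCollector()
    collector.feed(index_html)
    return [
        urljoin(base_url, href)
        for href in collector.hrefs
        if href.lower().endswith(".xml")
    ]


# Hypothetical directory listing, standing in for a real WAF index page.
sample_index = """
<html><body>
<a href="../">Parent Directory</a>
<a href="G1_SST_GLOBAL.iso.xml">G1_SST_GLOBAL.iso.xml</a>
<a href="README.txt">README.txt</a>
</body></html>
"""

docs = list_waf_documents(sample_index, "http://example.org/waf/")
print(docs)
```

A harvest pass is then one GET for the index plus one GET per listed document, which is why the request count scales with the number of records in the WAF.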

lukecampbell commented 8 years ago

I just looked into it, and it looks like the registry stopped on the 22nd for some reason. I'll take a look.

rsignell-usgs commented 8 years ago

Ah, and the ISO metadata for this are here: https://registry.ioos.us/waf/