ioos / catalog

IOOS Catalog general repo for documentation and issues
https://ioos.github.io/catalog/
MIT License

Lag in IOOS Catalog OGC CSW endpoint temporal extent (stale CSW records?) #66

Closed emiliom closed 5 years ago

emiliom commented 6 years ago

I think the CSW records have been getting stale over the last two weeks or so. A CSW query with a temporal filter (say, the last 5 days) that used to work no longer returns records (or barely any) unless the time window in the filter is expanded by a few days, and the expansion needed has been growing as time goes by. I've illustrated this in this notebook: https://gist.github.com/emiliom/ea4ae7e6fe051d8cac2788015e60e0cf#file-iooscatalog_csw_oldtemporalextent-ipynb

The issue illustrated there cannot be the result of a single web service getting old, because the records come from multiple web services and providers. The only logical explanation I can think of is that the CSW records (or at least their temporal extents) are stale by about two weeks (based on other tests).
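To make the reasoning concrete, here is a minimal sketch of why a frozen temporal extent would produce this symptom. It does not query the catalog; the record extents, the reference date, and the function names `overlaps_window` and `count_matches` are all hypothetical, chosen only to illustrate the overlap test that a temporal filter implies.

```python
from datetime import datetime, timedelta, timezone


def overlaps_window(record_start, record_end, window_start, window_end):
    """True if the record's temporal extent intersects the query window."""
    return record_start <= window_end and record_end >= window_start


def count_matches(extents, now, days):
    """Count records whose extent overlaps the last `days` days before `now`."""
    window_start = now - timedelta(days=days)
    return sum(overlaps_window(s, e, window_start, now) for s, e in extents)


# Hypothetical reference time and records (not taken from the catalog).
now = datetime(2018, 8, 20, tzinfo=timezone.utc)
true_extents = [(now - timedelta(days=30), now - timedelta(days=1))] * 3

# Assumption: the catalog copy stopped updating 14 days ago, so each
# record's end time is frozen two weeks earlier than the real one.
stale_extents = [(s, e - timedelta(days=14)) for s, e in true_extents]

fresh_5 = count_matches(true_extents, now, 5)    # up-to-date extents, 5-day window
stale_5 = count_matches(stale_extents, now, 5)   # stale extents, 5-day window
stale_20 = count_matches(stale_extents, now, 20) # stale extents, wider window
```

Under these assumptions the stale copies drop out of the 5-day window and only reappear once the window is wide enough to reach back past the frozen end time, which matches the behavior described above.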

Hopefully this is something that's very easy to fix, b/c it'd be great if it could be addressed by Tuesday or so for my IOOS data access tutorial at https://oceanhackweek.github.io

I pointed out this problem over a week ago at https://github.com/emiliom/ohw2018_tutorials/issues/2, though at the time it didn't seem as clear-cut as it does now.

benjwadams commented 6 years ago

@emiliom, I addressed some issues with the harvester for the catalog last week. I just looked at the CKAN-PyCSW sync code and it had encountered an error upon hitting a malformed record, which I've since removed. I just re-ran the sync and it completed, so please see if the data is updated. I'm also trying to run your notebook but have to compile a few dependencies.

emiliom commented 6 years ago

YES!! I re-ran my notebook and the lag seems to be gone now. Awesome! Thanks so much.

I can close the issue if you'd like, or let you close it.

benjwadams commented 6 years ago

Glad I could help! I'll close it out.

emiliom commented 6 years ago

This problem is back.

Running the notebook at https://gist.github.com/emiliom/ea4ae7e6fe051d8cac2788015e60e0cf#file-iooscatalog_csw_oldtemporalextent-ipynb (the gist I mentioned in my first comment on this issue) today returns only 1 record for a temporal window of the last 5 days. Expanding the window to 10 days results in 24 records, which is closer to what I expect based on running this query many times over the last few weeks.
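One rough way to turn this observation into a lag estimate is to widen the window until the record count reaches the expected level. The sketch below assumes a caller-supplied function that returns the record count for a given window length; `smallest_window` and `fake_count` are hypothetical names, and `fake_count` is only a stand-in that reproduces the two counts reported here (1 record at 5 days, 24 at 10 days), with every other value assumed.

```python
def smallest_window(count_for_days, expected, max_days=30):
    """Return the smallest window (in days) whose count reaches `expected`.

    `count_for_days` is any callable mapping a window length in days to a
    record count. Returns None if no window up to `max_days` is enough.
    """
    for days in range(1, max_days + 1):
        if count_for_days(days) >= expected:
            return days
    return None


def fake_count(days):
    """Stand-in for a real CSW query; values between the two reported
    data points are assumptions, not measurements."""
    return 24 if days >= 10 else 1


window = smallest_window(fake_count, expected=20)
```

In practice the callable would wrap the actual CSW query from the notebook, and the gap between the window found this way and the intended 5 days would give a rough idea of how stale the records are.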

benjwadams commented 6 years ago

The CSW sync job had some issues that needed fixing. Those have been addressed now, and I also have a job that should alert me when the metadata modification time of the CSW records drifts too far from that of the CKAN records (the threshold is presently set to 1.5 days).
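For readers curious what such a drift check might look like, here is a minimal sketch. It is not the actual monitoring job: the function name `drifted_records`, the dictionary inputs keyed by record id, and the sample timestamps are assumptions for illustration. Only the 1.5-day threshold comes from the comment above.

```python
from datetime import datetime, timedelta

# Threshold mentioned in the thread; everything else here is assumed.
MAX_DRIFT = timedelta(days=1.5)


def drifted_records(ckan_modified, csw_modified, max_drift=MAX_DRIFT):
    """Return sorted ids whose CSW copy is missing or lags CKAN by more
    than `max_drift`.

    Both arguments are dicts mapping a record id to its metadata
    modification time (a datetime).
    """
    stale = []
    for record_id, ckan_time in ckan_modified.items():
        csw_time = csw_modified.get(record_id)
        if csw_time is None or ckan_time - csw_time > max_drift:
            stale.append(record_id)
    return sorted(stale)


# Hypothetical sample data.
ckan = {
    "a": datetime(2018, 9, 10, 12, 0),
    "b": datetime(2018, 9, 10, 12, 0),
    "c": datetime(2018, 9, 10, 12, 0),
}
csw = {
    "a": datetime(2018, 9, 10, 6, 0),  # 6 hours behind: within threshold
    "b": datetime(2018, 9, 7, 12, 0),  # 3 days behind: drifted
    # "c" is missing from CSW entirely
}

alerts = drifted_records(ckan, csw)
```

A non-empty result would be the trigger for an alert; how the alert is delivered is left out of this sketch.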

benjwadams commented 5 years ago

I ran the notebook with minimal modifications (I could not run ioos_tools due to a dependency on iris, which did not appear to have a Python 3.6 package for pip to install). The record count is up to 47 records in the last 5 days. I also am now monitoring the metadata modification times of the CKAN datasets compared with the CSW records harvested. If there's too large a discrepancy, I receive alerts. I'm also monitoring harvests on the CKAN end and they are completing daily. I'm going to close as the results seem up to date with respect to their counterparts in CKAN, short of inspecting all the possible metadata records.