Closed mwengren closed 4 years ago
There was a security breach on gcoos5, and we had to re-initialize the server and rebuild it. We have a complete backup of the data and we should be back very soon (working on it as I write this). Sorry for the inconvenience.
The ERDDAP server for GCOOS is still down; we are working on getting it back up and running.
@fgayanilo @leilabbb We're still seeing ~6000 records in the IOOS Harvest Registry for this GCOOS WAF.
Is there an issue with the code that's generating these files? I think this is going to present a problem for the IOOS Harvest Registry (https://registry.ioos.us/) to read all of these files. I can see that there's a harvest job running at present that isn't able to complete.
Can you re-investigate those processes? Are there actually 6000 XML files that are supposed to be getting harvested by the IOOS Registry/Catalog?
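One way to check how many XML files the WAF actually exposes is to count the `.xml` links in its directory listing. The following is only a minimal sketch: it assumes the WAF serves a plain HTML index with one anchor per file, and the file names in the sample are hypothetical, not taken from the GCOOS server.

```python
from html.parser import HTMLParser


class XmlLinkCounter(HTMLParser):
    """Collect href targets ending in .xml from an HTML directory listing."""

    def __init__(self):
        super().__init__()
        self.xml_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".xml"):
                self.xml_links.append(value)


def count_xml_links(listing_html):
    """Return the number of distinct .xml links found in the listing."""
    parser = XmlLinkCounter()
    parser.feed(listing_html)
    return len(set(parser.xml_links))


# Hypothetical listing, for illustration only.
sample = """
<html><body>
<a href="../">Parent Directory</a>
<a href="dataset_a_iso19115.xml">dataset_a_iso19115.xml</a>
<a href="dataset_b_iso19115.xml">dataset_b_iso19115.xml</a>
</body></html>
"""
print(count_xml_links(sample))  # prints 2
```

Comparing that count against the number of datasets the ERDDAP instance is expected to serve would show whether the ~6000 figure comes from the WAF itself or from the harvest.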
GCOOS' current dataset count in the Catalog is ~800, which are probably coming from the other two sources:
@leilabbb is still working on getting our primary ERDDAP server for oceanographic and atmospheric data up and running. It should have about 5K records in it. The ISO contains 475 records and gcoos4 ERDDAP (for biological records) has 304.
Ok, please keep us updated and keep in mind whether this WAF URL registered in the Harvest Registry is going to be the right approach to use:
http://gcoos5.geos.tamu.edu:6060/erddap/metadata/iso19115/xml/
I can't even get a directory listing from it when I try to browse there, so it's unlikely the Registry will be able to harvest those records. You may need to use a secondary, more powerful web server to serve those.
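A quick way to tell whether a WAF URL is reachable from a given location is to request it with a timeout and report what comes back. This is a rough sketch using only the Python standard library; the `probe` function and its `opener` parameter are illustrative names, and this is not how the Registry itself harvests.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return a short status string for url instead of raising on failure.

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return f"ok {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 504.
        return f"http error {err.code}"
    except OSError as err:
        # Covers URLError, timeouts, refused connections, DNS failures.
        return f"unreachable ({err.__class__.__name__})"
```

Calling `probe("http://gcoos5.geos.tamu.edu:6060/erddap/metadata/iso19115/xml/")` from the harvester's network would distinguish a server that answers slowly from one that cannot be reached at all, which is the difference that matters when deciding whether to put a separate web server in front of it.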
@mwengren the gcoos5 alternate server (http://erddap.gcoos.org:8080/erddap) is up and ready to serve, but it looks like registry.ioos.us is returning a 504.
Ok, pinging @benjwadams on the Registry status.
Server was having issues, had to restart. It looks like we're now harvesting from the contents of the ERDDAP server. Closing.
I want to keep this open for monitoring purposes. We're at 26,000 datasets in the Catalog now and counting. It looks like the sheer number of datasets in this ERDDAP is going to stretch our infrastructure a bit for Catalog. @fgayanilo what will be the total count of datasets you plan to serve in this ERDDAP instance? We're well above the average RA already.
I didn't look through them all of course, but I take it there's no feasible way to aggregate some of these? A lot look to be historical, at least on the first page.
That number is big and wrong @mwengren. There should only be about 7K+ files. We were waiting until the registry returned so we could update. Just checked and it seems to be working again. I changed the URL to the correct ERDDAP instance and initiated a reharvest (see above, erddap.gcoos.org/erddap). Our old ERDDAP returned to service but it includes a lot of other junk that should not be there. We are still in the process of syncing the instances, but the new server should be correct.
Ok, we may have to do a manual clear of that GCOOS WAF. Catalog harvesting seems to be 'stuck' at the moment. @fgayanilo and @benjwadams can you work directly on harvesting the correct GCOOS WAF and clearing hung harvesting jobs/manually clearing the GCOOS harvest if necessary?
Data from the current GCOOS WAF CKAN harvest (currently running) below. CKAN still shows ~13,000 datasets, so it may not be clearing out all the former records properly.
https://data.ioos.us/harvest/gcoos-waf
Id | c6ca030e-e120-41c3-abfa-3fe84c1a8537 |
---|---|
Created | March 23, 2020, 2:01 AM (UTC-04:00) |
Started | March 23, 2020, 2:01 AM (UTC-04:00) |
Finished | |
Status | Running |
@benjwadams let me know what I can do from my end.
@fgayanilo This appears to have begun harvesting again. I'm not entirely sure what occurred, but it looks like the metadata is getting to the catalog now.
@benjwadams that's great!
Going to close this out since the harvest appears to be working again.
@fgayanilo We're having some harvesting problems with GCOOS' WAF(s).
I think the primary issue is with this one: http://gcoos5.geos.tamu.edu:6060/erddap/metadata/iso19115/xml/, which now shows ~6000 records in the Harvest Registry.
Second, I can't connect to this one from my location, which means the harvesting scripts likely can't either: http://gcoos4.tamu.edu:8080/erddap/metadata/iso19115/xml/.
Can you see about setting up port forwarding to a web server, and also see what's up with the 6000 records?
GCOOS now has the most records in the Catalog by far: https://data.ioos.us/dataset?_gcmd_keywords_limit=0&organization=gcoos.