Closed by benjwadams 9 years ago
Bingo. Increasing the redis queue timeout appears to do the trick for AOOS. Haven't tested against CeNCOOS, but I suspect it's a similar story.
CeNCOOS is timing out for a slightly different reason: the network:all offering is extremely slow. Still chugging away after 8+ minutes.
On the other hand, AOOS's network:all does load.
Haven't been able to get the response size for CeNCOOS. It's quite large, whatever it is. Edit: ~9.5 MB, and it takes a long time to load as well. Fundamentally, there are two timeouts: a timeout for OWSLib to grab the data and a timeout for the queue to kill processing of a particular job after a certain amount of time has elapsed.
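The two-timeout split described above could be sketched roughly like this. This is a hypothetical helper, not the catalog's actual code; the service names, threshold values, and function names are all assumptions:

```python
# Sketch of the two independent timeouts on a harvest job:
#   1. a fetch timeout: how long the OWSLib/HTTP request waits for the SOS response
#   2. a queue timeout: how long rq lets the job run before killing it
# LARGE_SERVICES and the numeric thresholds are illustrative assumptions.

LARGE_SERVICES = {"AOOS", "CeNCOOS"}  # hypothetical registry of known-slow instances

def fetch_timeout(service_name):
    """Seconds to wait for a single DescribeSensor/GetCapabilities response."""
    return 600 if service_name in LARGE_SERVICES else 120

def queue_timeout(service_name):
    """Seconds before the rq worker kills the whole harvest job."""
    return 3600 if service_name in LARGE_SERVICES else 600
```

With rq, the queue timeout would be supplied at enqueue time, e.g. `queue.enqueue(harvest, service, timeout=queue_timeout(service))` (newer rq versions name this parameter `job_timeout`).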
Ok, #361 should help the situation for AOOS. Unfortunately, if we're adding a new service which contains a lot of datasets which don't have a previous harvest, this fix won't apply. I think we ought to queue datasets rather than services and set the timeout there, but that would require a bit of retooling. Also I'm still not sure what to do with the CeNCOOS "network:all" DescribeSensor
response. I've tried introducing some code to handle network datasets by introducing a large timeout, but the response is so large and intensive to process that I actually got a server timeout several times when trying to select from it.
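Queueing per-dataset rather than per-service, as proposed above, might look something like this sketch. The `harvest_station` job name, the argument shape, and the tuple format are assumptions for illustration, not the catalog's real API:

```python
def plan_dataset_jobs(service_name, station_urns, per_job_timeout=600):
    """Build one queue entry per station so each dataset harvest gets its
    own timeout, instead of one big timeout covering the whole service.

    Returns (job_name, args, timeout) tuples that a real rq Queue would
    consume via queue.enqueue(...). Names and shapes are illustrative.
    """
    return [("harvest_station", (service_name, urn), per_job_timeout)
            for urn in station_urns]
```

A slow or unresponsive station then only kills its own job; the rest of the service's datasets still harvest.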
So we've improved the rate at which we can successfully harvest AOOS: it was 0% and is now about 50%.
I've made a breakthrough by removing the 'all' offering from the harvesting process. CeNCOOS is massive; it's been harvesting on my dev machine for close to 30 minutes now.
Still going....
--------------------------------------------------------------------------------
INFO in harvest [/Users/lcampbell/Documents/Dev/code/catalog/ioos_catalog/tasks/harvest.py:349]:
process_station: urn:ioos:station:gov.usda.nrcs.wcc.snotel:319
--------------------------------------------------------------------------------
Timeout 600
--------------------------------------------------------------------------------
INFO in harvest [/Users/lcampbell/Documents/Dev/code/catalog/ioos_catalog/tasks/harvest.py:349]:
process_station: urn:ioos:station:gov.usda.nrcs.wcc.snotel:320
--------------------------------------------------------------------------------
Timeout 600
--------------------------------------------------------------------------------
INFO in harvest [/Users/lcampbell/Documents/Dev/code/catalog/ioos_catalog/tasks/harvest.py:349]:
process_station: urn:ioos:station:gov.usda.nrcs.wcc.snotel:321
--------------------------------------------------------------------------------
Mark the time: it's done.
Successfully harvested from cencoos.
Hooray!!!
To summarize, the solution was to increase the timeout to roughly an hour or two for known large data services, and to skip the "all" offering, since that single request takes about as long as the rest of the dataset combined.
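The offering skip from the summary could be sketched as below. This assumes the network-wide offering's ID ends in a final `all` token (e.g. something like `urn:ioos:network:cencoos:all`); the exact URN shape is an assumption:

```python
def offerings_to_harvest(offering_ids):
    """Drop the network-wide 'all' offering from a harvest run; that single
    request takes about as long as every other offering combined.

    Assumes the 'all' offering's ID has 'all' as its last colon-separated
    token (the exact URN layout is a guess for illustration).
    """
    return [o for o in offering_ids if o.split(":")[-1] != "all"]
```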
I still have an issue with AOOS where one of two things is happening:
:+1:
Validated fix. AOOS has harvested 19/20 times, CeNCOOS 16/20.
Harvesting failures were all either:
a) URLError: <urlopen error [Errno -5] No address associated with hostname>
b) Service Ping Timeout: HTTPConnectionPool(host='sos.cencoos.org', port=80): Read timed out. (read timeout=60)
As mentioned in #318, AOOS and CeNCOOS 52North instances appear to be timing out on harvest attempts.