Closed anjackson closed 1 year ago
The document harvester seems to be hanging when making connections to the web, causing the jobs to lock up in a way that the Airflow system is not finding easy to deal with.
Suggest adding timeouts to the metadata fetching code.
Seems I missed a timeout on HEAD requests:
HEAD
https://github.com/ukwa/ukwa-manage/blob/00f84867cb1f55d7ff398894f8a838cfa95ce45b/lib/docharvester/document_mdex.py#L206
Tagging this as 2.3.2.
Rolled that out. Looking good but will review later before calling this done.
Okay, looks good! Will reopen if necessary.
The document harvester seems to be hanging when making connections to the web, causing the jobs to lock up in a way that the Airflow system is not finding easy to deal with.
Suggest adding timeouts to the metadata fetching code.