One of the ingest attempts in the production environment on Monday, Sep. 23, 2024 (during the 2024.9 Release), failed with the following stack trace:
2024-09-23T22:02:41.417394528Z File "/app/nmdc_server/ingest/all.py", line 74, in load
2024-09-23T22:02:41.417432289Z study.load(db, mongodb["study_set"].find())
2024-09-23T22:02:41.417442439Z File "/app/nmdc_server/ingest/study.py", line 70, in load
2024-09-23T22:02:41.417489381Z obj["principal_investigator_id"] = get_or_create_pi(db, pi_name, pi_url, pi_orcid)
2024-09-23T22:02:41.417501311Z File "/app/nmdc_server/ingest/study.py", line 26, in get_or_create_pi
2024-09-23T22:02:41.417511631Z r = requests.get(url)
2024-09-23T22:02:41.417522212Z File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 73, in get
2024-09-23T22:02:41.417584873Z return request("get", url, params=params, **kwargs)
2024-09-23T22:02:41.417596034Z File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 59, in request
2024-09-23T22:02:41.417625014Z return session.request(method=method, url=url, **kwargs)
2024-09-23T22:02:41.417634824Z File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
2024-09-23T22:02:41.417763928Z resp = self.send(prep, **send_kwargs)
2024-09-23T22:02:41.417774698Z File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
2024-09-23T22:02:41.417911502Z r = adapter.send(request, **kwargs)
2024-09-23T22:02:41.417923382Z File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 622, in send
2024-09-23T22:02:41.418052915Z raise ConnectionError(e, request=request)
2024-09-23T22:02:41.418063416Z requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.ornl.gov', port=443): Max retries exceeded with url: /sites/default/files/styles/staff_profile_image_style/public/2023-12/IMG_4627.JPG?h=d6f62329&itok=pChv5-U7 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f2991f0b130>: Failed to resolve 'www.ornl.gov' ([Errno -3] Temporary failure in name resolution)"))
Based on the final line of that error message ("Temporary failure in name resolution"), I think the system running the ingest code (i.e., a container running on NERSC's Spin platform) failed to resolve www.ornl.gov to an IP address.
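The message "Temporary failure in name resolution" (errno -3, `EAI_AGAIN` on Linux) is what the system resolver's `getaddrinfo` reports when a DNS lookup cannot be completed right now. It is distinct from a definitive "Name or service not known" answer, which fits the intermittent behavior described below. urllib3 calls `socket.getaddrinfo` before opening a connection and wraps the resulting `socket.gaierror` in the `NameResolutionError` seen in the trace. A minimal, standard-library-only sketch of a check one might run from inside the container (`can_resolve` is a hypothetical helper, not part of nmdc-server):

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if `host` resolves to at least one address, else False.

    Hypothetical diagnostic helper; not part of nmdc-server.
    """
    try:
        # The same kind of lookup urllib3 performs before opening a TCP connection.
        return len(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)) > 0
    except socket.gaierror as exc:
        # e.g. [Errno -3] Temporary failure in name resolution (EAI_AGAIN)
        print(f"Failed to resolve {host!r}: {exc}")
        return False
```

For example, `can_resolve("www.ornl.gov")` run inside the Spin container at the time of the failure would presumably have returned `False`.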
A few minutes after that error occurred, I visited http://www.ornl.gov/sites/default/files/styles/staff_profile_image_style/public/2023-12/IMG_4627.JPG?h=d6f62329&itok=pChv5-U7 in my local web browser, and the image loaded OK (although I noticed that the URL I ended up at was different from the one I entered). In addition, a few minutes after that error occurred, my teammate ran curl against the URL from NERSC's dtn01.nersc.gov server and did, indeed, get the image of the PI. Based on those two observations, he and I concluded that this was an intermittent failure.
Indeed, when we re-ran the ingest a few minutes later, it fetched this PI image successfully.
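Because the failure was transient and a re-run a few minutes later succeeded, retrying the fetch a few times with backoff would likely have absorbed it. Despite the wording "Max retries exceeded", `requests` does not retry failed connections by default (its `HTTPAdapter` uses `max_retries=0`). A minimal, standard-library-only sketch of a retry helper; `retry_with_backoff` and its parameters are hypothetical. It catches `OSError` by default, which also covers `requests.exceptions.ConnectionError`, since the `requests` exception hierarchy derives from `IOError` (an alias of `OSError` in Python 3):

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retriable: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` until it succeeds or `attempts` calls have failed.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception if every attempt fails. Hypothetical helper.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return func()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise AssertionError("unreachable")  # the loop always returns or raises
```

In the ingest this might be used as `r = retry_with_backoff(lambda: requests.get(url, timeout=10))`. Alternatively, `requests` can retry connection errors itself if a session mounts an `HTTPAdapter` whose `max_retries` is a `urllib3.util.retry.Retry` object with a `backoff_factor`.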
Task
Make it so the ingest, as a whole, does not fail when it cannot fetch a PI image
Also, consider keeping our own cached copies of PI images (e.g., on GitHub, on the NERSC filesystem, or as a data URL in some database) to fall back on when a later fetch of the same URL fails
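Both task items could be addressed together along the lines sketched below: wrap the image fetch so that a network error is logged rather than raised, save each successfully fetched image to a cache directory, and fall back to the cached copy when a later fetch fails. This is a minimal sketch under assumptions: `fetch_pi_image`, the cache layout (files named by the SHA-256 of the URL), and the injectable `fetch` callable are all hypothetical, not the nmdc-server code. The cache directory could, for instance, live on the NERSC filesystem.

```python
import hashlib
import logging
import urllib.request
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def _default_fetch(url: str) -> bytes:
    # Standard-library fetch with a timeout; raises urllib.error.URLError
    # (an OSError subclass) on DNS or connection failures.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_pi_image(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes] = _default_fetch,
) -> Optional[bytes]:
    """Return the image bytes for `url`, or None if it is unavailable.

    Fetch errors are logged, not raised, so one bad URL cannot abort the ingest.
    Hypothetical helper; not part of nmdc-server.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the cache file after a hash of the URL so any URL maps to a safe filename.
    cache_file = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    try:
        data = fetch(url)
    except OSError as exc:  # covers URLError, socket errors, and requests' exceptions
        if cache_file.exists():
            logger.warning("Fetch of %s failed (%s); using cached copy", url, exc)
            return cache_file.read_bytes()
        logger.warning("Fetch of %s failed (%s); no cached copy, skipping image", url, exc)
        return None
    cache_file.write_bytes(data)  # refresh the cache on every successful fetch
    return data
```

To keep using `requests` (as `get_or_create_pi` does in the trace), `fetch` could be a small function that calls `requests.get(url, timeout=10)`, then `raise_for_status()`, and returns `.content`; because the `requests` exceptions subclass `OSError`, the same `except` clause catches them. The caller would then store no image for the PI when the result is `None`, instead of letting the exception abort the ingest.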
Progress
Looks like there is a fix or workaround being developed here: https://github.com/microbiomedata/nmdc-server/pull/1262