microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

Ingest sometimes fails to fetch PI image #1397

Status: Closed (eecavanna closed this issue 2 months ago)

eecavanna commented 2 months ago

Background

One of the ingest attempts in the production environment on Monday, Sep. 23, 2024 (during the 2024.9 Release), failed with the following stack trace:

2024-09-23T22:02:41.417394528Z   File "/app/nmdc_server/ingest/all.py", line 74, in load
2024-09-23T22:02:41.417432289Z     study.load(db, mongodb["study_set"].find())
2024-09-23T22:02:41.417442439Z   File "/app/nmdc_server/ingest/study.py", line 70, in load
2024-09-23T22:02:41.417489381Z     obj["principal_investigator_id"] = get_or_create_pi(db, pi_name, pi_url, pi_orcid)
2024-09-23T22:02:41.417501311Z   File "/app/nmdc_server/ingest/study.py", line 26, in get_or_create_pi
2024-09-23T22:02:41.417511631Z     r = requests.get(url)
2024-09-23T22:02:41.417522212Z   File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 73, in get
2024-09-23T22:02:41.417584873Z     return request("get", url, params=params, **kwargs)
2024-09-23T22:02:41.417596034Z   File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 59, in request
2024-09-23T22:02:41.417625014Z     return session.request(method=method, url=url, **kwargs)
2024-09-23T22:02:41.417634824Z   File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
2024-09-23T22:02:41.417763928Z     resp = self.send(prep, **send_kwargs)
2024-09-23T22:02:41.417774698Z   File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
2024-09-23T22:02:41.417911502Z     r = adapter.send(request, **kwargs)
2024-09-23T22:02:41.417923382Z   File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 622, in send
2024-09-23T22:02:41.418052915Z     raise ConnectionError(e, request=request)
2024-09-23T22:02:41.418063416Z requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.ornl.gov', port=443): Max retries exceeded with url: /sites/default/files/styles/staff_profile_image_style/public/2023-12/IMG_4627.JPG?h=d6f62329&itok=pChv5-U7 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f2991f0b130>: Failed to resolve 'www.ornl.gov' ([Errno -3] Temporary failure in name resolution)"))

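Based on the final line of that error message ("NameResolutionError ... Failed to resolve 'www.ornl.gov' ([Errno -3] Temporary failure in name resolution)"), I think the system running the ingest code (i.e., a container running on NERSC's Spin platform) failed to resolve www.ornl.gov to an IP address.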
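One way to check the same thing from inside the container, without involving requests at all, is to ask the resolver directly. The sketch below is a hypothetical diagnostic helper (resolve_host is not part of nmdc-server) that uses only the standard library's socket.getaddrinfo, which raises socket.gaierror when name resolution fails. On glibc-based Linux, the "[Errno -3]" in the log corresponds to EAI_AGAIN, the resolver's code for a temporary failure.

```python
import socket
from typing import List, Optional


def resolve_host(host: str, port: int = 443) -> Optional[List[str]]:
    """Return the sorted IP addresses `host` resolves to, or None on failure.

    Hypothetical diagnostic helper; not part of nmdc-server.
    """
    try:
        # getaddrinfo is the same kind of lookup urllib3 performs before
        # opening a connection; it raises socket.gaierror if resolution fails.
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # On glibc-based Linux, errno -3 (socket.EAI_AGAIN) means
        # "Temporary failure in name resolution", so a retry may succeed.
        transient = exc.errno == getattr(socket, "EAI_AGAIN", None)
        print(f"Failed to resolve {host!r}: {exc} (transient={transient})")
        return None
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string (IPv4 or IPv6).
    return sorted({info[4][0] for info in infos})
```

Running resolve_host("www.ornl.gov") in the Spin container while the problem is occurring would show whether DNS itself is failing there, independently of the HTTP client.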

A few minutes after that error occurred, I tried visiting http://www.ornl.gov/sites/default/files/styles/staff_profile_image_style/public/2023-12/IMG_4627.JPG?h=d6f62329&itok=pChv5-U7 in my local web browser, and the image loaded OK (although I noticed that the URL I ended up at was different from the one I entered). Also a few minutes after the error, my teammate confirmed that when he curl-ed the URL from NERSC's dtn01.nersc.gov server, he did, indeed, get the image of the PI. Based on those two observations, he and I concluded that this was an intermittent problem.

Indeed, when we re-ran the ingest a few minutes later, it did not fail to fetch this PI image.
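Because the failure was transient and a rerun a few minutes later succeeded, one option is to retry the fetch a few times with growing delays before giving up. The sketch below is a generic, standard-library-only helper; the name retry_with_backoff and its parameters are hypothetical and not taken from nmdc-server. By default it retries on OSError, which covers socket.gaierror and also requests.exceptions.ConnectionError, since requests' exception hierarchy derives from IOError (an alias of OSError in Python 3).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` until it succeeds or `attempts` calls have failed.

    Delays grow exponentially (base_delay, 2 * base_delay, 4 * base_delay, ...)
    plus up to 10% random jitter. If every attempt fails, the last exception
    is re-raised. Hypothetical helper; not part of nmdc-server.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")  # loop always returns or raises
```

In get_or_create_pi, the call r = requests.get(url) could then become something like r = retry_with_backoff(lambda: requests.get(url, timeout=10)); the timeout value there is an arbitrary example. The sleep parameter exists only so tests can skip the real delays.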

Task

Progress

Looks like there is a fix or workaround being developed here: https://github.com/microbiomedata/nmdc-server/pull/1262
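Independently of whatever PR #1262 does (this sketch is not taken from it), another safeguard is to treat the PI image as optional, so that a failed fetch is logged and skipped instead of aborting the whole ingest. The helper below is hypothetical: fetch_optional is an illustrative name, and its fetch argument stands in for whatever actually downloads the image (for example, a small wrapper around requests.get).

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def fetch_optional(url: str, fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Return the bytes at `url`, or None if fetching raises an OSError.

    Hypothetical helper; not part of nmdc-server. The resource is treated as
    optional so that one unreachable image does not abort the whole ingest.
    """
    try:
        return fetch(url)
    except OSError as exc:  # covers socket.gaierror and requests' exceptions
        logger.warning("Skipping optional resource %s: %s", url, exc)
        return None
```

The trade-off is that a study could be ingested without its PI image, so the warning in the log is the only signal that the image needs to be fetched again later.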

marySalvi commented 2 months ago

closed by #1262