Checked for duplicates
Yes - I've already checked
Describe the bug
During a 0.5x load test on the INT cluster, we saw jobs fail with SoftTimeLimitExceeded and MaxRetryError.
SoftTimeLimitExceeded
tags: trigger-SCIFLO_L3_DSWx_HLS_S30
status: job-failed
resource: job
index: job_status-current
ID: ec65aeee-ec5c-4287-809d-b2ad4f34d245
payload_id: ec65aeee-ec5c-4287-809d-b2ad4f34d245
timestamp: 2022-07-12T23:01:56.136Z
job: SCIFLO_L3_DSWx_HLS__1.0.0-rc.1.0-HLS.S30.T37SFU.2022022T081241.v2.0_state_config-20220712T215330.59494Z
node: 100.104.40.97
queue: opera-job_worker-sciflo-l3_dswx_hls
time queued: 2022-07-12T21:53:30.059511Z | start: 2022-07-12T22:01:07.629814Z | end: 2022-07-12T23:01:12.463573Z
duration: 3604.833759s
Traceback (most recent call last):
  File "/home/ops/verdi/ops/hysds-1.1.5/hysds/job_worker.py", line 1193, in run_job
    monitoredRunner.join()
  File "/home/ops/verdi/lib/python3.9/site-packages/billiard/process.py", line 148, in join
    res = self._popen.wait(timeout)
  File "/home/ops/verdi/lib/python3.9/site-packages/billiard/popen_fork.py", line 57, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/ops/verdi/lib/python3.9/site-packages/billiard/popen_fork.py", line 33, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/home/ops/verdi/lib/python3.9/site-packages/billiard/pool.py", line 229, in soft_timeout_sighandler
    raise SoftTimeLimitExceeded()
billiard.exceptions.SoftTimeLimitExceeded: SoftTimeLimitExceeded()
Triaged products:
http://opera-dev-triage-fwd-pyoon.s3-website-us-west-2.amazonaws.com/triaged_job-SCIFLO_L3_DSWx_HLS__1.0.0-rc.1.0-HLS.S30.T37SFU.2022022T081241.v2.0_state_config-20220712T215330.59494Z_task-6b51fc23-98d5-4bab-aecd-bd50e2a8a558
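As a quick sanity check on the job record above, the timestamps can be turned into a queue wait and a run time. This is only a sketch using the values from this report. The computed run time matches the reported duration of 3604.833759s, which is consistent with a soft time limit of about one hour, though the configured limit is not stated here and that reading is an assumption.

```python
from datetime import datetime

# Timestamp layout used in the job record above.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Values copied from the failed job's record.
queued = datetime.strptime("2022-07-12T21:53:30.059511Z", FMT)
start = datetime.strptime("2022-07-12T22:01:07.629814Z", FMT)
end = datetime.strptime("2022-07-12T23:01:12.463573Z", FMT)

# Time spent waiting in the queue and time spent running, in seconds.
queue_wait_s = (start - queued).total_seconds()
run_time_s = (end - start).total_seconds()

print(f"queue wait: {queue_wait_s:.6f}s")
print(f"run time:   {run_time_s:.6f}s")
```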
MaxRetryError
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/opt/conda/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/opt/conda/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/opt/conda/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/opt/conda/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
    conn.connect()
  File "/opt/conda/lib/python3.9/site-packages/urllib3/connection.py", line 358, in connect
    conn = self._new_conn()
  File "/opt/conda/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f3a899a85b0>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/opt/conda/lib/python3.9/site-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/opt/conda/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='data.lpdaac.earthdatacloud.nasa.gov', port=443): Max retries exceeded with url: /s3credentials (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f3a899a85b0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
What did you expect?
SoftTimeLimitExceeded and MaxRetryError are common errors in failed jobs, and we can retry those jobs automatically.
Number of retries for SoftTimeLimitExceeded: 2
Number of retries for MaxRetryError: 5
The generic trigger rules for handling common failed jobs are documented here:
https://hysds-core.atlassian.net/wiki/spaces/HYS/pages/199885482/Generic+Trigger+Rules+for+Mozart+failed+jobs
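To make the expected behavior concrete, here is a minimal sketch of the retry decision in plain Python. It is not the HySDS trigger rule format or API. The names `MAX_RETRIES`, `classify_error`, and `should_retry` are hypothetical, and the sketch assumes the final line of a job's traceback names the exception, as in the two tracebacks in this report.

```python
import re
from typing import Optional

# Retry budgets requested in this report (hypothetical constant name).
MAX_RETRIES = {
    "SoftTimeLimitExceeded": 2,
    "MaxRetryError": 5,
}


def classify_error(traceback_text: str) -> Optional[str]:
    """Return the retryable error named on the traceback's last line, if any."""
    lines = [ln.strip() for ln in traceback_text.splitlines() if ln.strip()]
    if not lines:
        return None
    # The last line looks like "package.module.ErrorName: message".
    exc_path = re.split(r"[:(]", lines[-1], maxsplit=1)[0]
    exc_name = exc_path.rsplit(".", 1)[-1].strip()
    return exc_name if exc_name in MAX_RETRIES else None


def should_retry(traceback_text: str, retry_count: int) -> bool:
    """Retry only known errors that have not used up their retry budget."""
    error = classify_error(traceback_text)
    return error is not None and retry_count < MAX_RETRIES[error]


soft_tb = (
    "Traceback (most recent call last):\n"
    "billiard.exceptions.SoftTimeLimitExceeded: SoftTimeLimitExceeded()"
)
print(should_retry(soft_tb, retry_count=1))
print(should_retry(soft_tb, retry_count=2))
```

Keying on the final exception keeps the check simple: in the chained MaxRetryError traceback, the earlier TimeoutError and NewConnectionError are ignored and only the outermost error decides the retry budget.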
Reproducible steps
No response
Environment