sergii-mamedov closed this issue 2 years ago
@kpavel can you handle it? :)
As far as I understand, the code responsible for the retry is here, and I need to explicitly set the connection_retries value in the code engine config.
Hello @sergii-mamedov, is this the full stack trace? Are there no other traces from the code_engine level?
Regarding connection_retries: yes, you need to specify the number of retries in the config file, e.g. 10.
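For illustration, here is a minimal sketch of setting that option in a config dict. It assumes the option sits under a `code_engine` section of the Lithops config; the helper `with_connection_retries` and the `base_config` contents are hypothetical and not part of Lithops.

```python
import copy


def with_connection_retries(config, retries=10):
    """Return a copy of a Lithops-style config dict with connection_retries set.

    Assumption: the option belongs to the 'code_engine' section.
    This helper is hypothetical and only illustrates the config shape.
    """
    if retries < 1:
        raise ValueError("retries must be a positive integer")
    updated = copy.deepcopy(config)
    # Create the section if it is missing, then set the retry count.
    updated.setdefault("code_engine", {})["connection_retries"] = retries
    return updated


# Hypothetical starting config; real configs carry more settings.
base_config = {"lithops": {"backend": "code_engine"}}
config = with_connection_retries(base_config, retries=10)
print(config["code_engine"])
```

The same value can be written directly in the config file; the helper only makes the expected shape explicit.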
The issue here is that we currently retry only when a 409 exception has the reason "AlreadyExists", while the one you get has the reason "Conflict". We can add that reason to the filter so that it is retried as well.
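A minimal sketch of such a reason filter is shown below. It is not the actual Lithops code: the function name `is_retryable_conflict` and the constant `RETRYABLE_409_REASONS` are hypothetical, and the check relies only on the `reason` field that appears in the response body in this thread.

```python
import json

# Reasons treated as retryable; "Conflict" is the addition discussed above.
RETRYABLE_409_REASONS = {"AlreadyExists", "Conflict"}


def is_retryable_conflict(status, body):
    """Return True when an HTTP 409 response carries a retryable reason."""
    if status != 409:
        return False
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Missing or non-JSON body: do not treat it as retryable.
        return False
    return isinstance(payload, dict) and payload.get("reason") in RETRYABLE_409_REASONS


# Body shaped like the Kubernetes Status object from the stack trace.
body = json.dumps({"kind": "Status", "status": "Failure", "reason": "Conflict", "code": 409})
print(is_retryable_conflict(409, body))
```

Keeping the accepted reasons in one set makes it easy to add another reason later without touching the retry loop.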
@kpavel
Traceback (most recent call last):
File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 325, in run
futures = executor.map(
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/executors.py", line 277, in map
futures = self.invoker.run_job(job)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 269, in run_job
futures = self._run_job(job)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 208, in _run_job
raise e
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 205, in _run_job
self._invoke_job(job)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 252, in _invoke_job
activation_id = self.compute_handler.invoke(payload)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/serverless/serverless.py", line 59, in invoke
return self.backend.invoke(runtime_name, runtime_memory, job_payload)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/serverless/backends/code_engine/code_engine.py", line 406, in invoke
self._run_job(jobrun_res)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/serverless/backends/code_engine/code_engine.py", line 48, in decorated_func
return func(*args, **kwargs)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/serverless/backends/code_engine/code_engine.py", line 414, in _run_job
self.custom_api.create_namespaced_custom_object(
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 225, in create_namespaced_custom_object
return self.create_namespaced_custom_object_with_http_info(group, version, namespace, plural, body, **kwargs) # noqa: E501
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 344, in create_namespaced_custom_object_with_http_info
return self.api_client.call_api(
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 391, in request
return self.rest_client.POST(url,
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/rest.py", line 274, in POST
return self.request("POST", url,
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/rest.py", line 233, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'audit-id': 'a082f7b4-70d3-4708-a29e-8ec22f965220', 'cache-control': 'no-cache, private', 'content-length': '326', 'content-type': 'application/json', 'date': 'Sat, 03 Sep 2022 03:19:13 GMT', 'x-kubernetes-pf-flowschema-uid': '05dc5f92-6c95-416b-b707-da00a57e0adc', 'x-kubernetes-pf-prioritylevel-uid': '8d40cc3d-a5c4-47ee-8978-ec5f7a197920'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on resourcequotas \"5s4f5qcqf4f\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","details":{"name":"5s4f5qcqf4f","kind":"resourcequotas"},"code":409}
@sergii-mamedov Does #995 solve your issue?
@JosepSampe Sorry for the late reply. I deployed these changes to production. I will get back to you in a couple of days with an answer.
@kpavel @JosepSampe Unfortunately, the patch didn't change the situation (which is a bit strange to me). I added some logging; the stack trace is below:
2022-10-13 18:36:00,920 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:172 - ExecutorID 77ece4-14 | JobID M005 - Starting function invocation: process_centr_segment() - Total: 277 activations
2022-10-13 18:36:01,360 - INFO - lithops.serverless.backends.code_engine.code_engine[Thread-1-ex] - code_engine.py:54 - connection_retries=10
2022-10-13 18:36:01,361 - INFO - lithops.serverless.backends.code_engine.code_engine[Thread-1-ex] - code_engine.py:58 - HTTP409
2022-10-13 18:36:11,376 - ERROR - engine.lithops-wrapper[Thread-1] - executor.py:286 - process_centr_segment raised an exception. Failed activation(s): [] ID(s): []
Traceback (most recent call last):
File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 325, in run
futures = executor.map(
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/executors.py", line 288, in map
futures = self.invoker.run_job(job)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 266, in run_job
futures = self._run_job(job)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 205, in _run_job
raise e
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 202, in _run_job
self._invoke_job(job)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 249, in _invoke_job
activation_id = self.compute_handler.invoke(payload)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/serverless/serverless.py", line 59, in invoke
return self.backend.invoke(runtime_name, runtime_memory, job_payload)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/serverless/backends/code_engine/code_engine.py", line 396, in invoke
self._run_job(jobrun_res)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/serverless/backends/code_engine/code_engine.py", line 67, in decorated_func
raise e
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/serverless/backends/code_engine/code_engine.py", line 52, in decorated_func
return func(*args, **kwargs)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/serverless/backends/code_engine/code_engine.py", line 404, in _run_job
self.custom_api.create_namespaced_custom_object(
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 225, in create_namespaced_custom_object
return self.create_namespaced_custom_object_with_http_info(group, version, namespace, plural, body, **kwargs) # noqa: E501
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 344, in create_namespaced_custom_object_with_http_info
return self.api_client.call_api(
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 391, in request
return self.rest_client.POST(url,
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/rest.py", line 274, in POST
return self.request("POST", url,
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/kubernetes/client/rest.py", line 233, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'audit-id': 'd4735826-f7ab-4244-91ec-43b59d77b169', 'cache-control': 'no-cache, private', 'content-length': '326', 'content-type': 'application/json', 'date': 'Thu, 13 Oct 2022 16:36:01 GMT', 'x-kubernetes-pf-flowschema-uid': '05dc5f92-6c95-4$
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on resourcequotas \"5s4f5qcqf4f\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","det$
@sergii-mamedov Can you please either change the lithops logging level to DEBUG or modify this line in your code to write warning-level logs?
E.g.
logger.warning("Encountered conflict error {}, ignoring".format(body.get('message')))
It could be that we are actually running out of retries but don't notice, because this message is logged at debug level. I'll also change this log call to warning level in the code later.
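To illustrate why the message can go unnoticed, here is a small sketch using only the standard `logging` module. The logger name `example.code_engine` is hypothetical and stands in for the backend's module logger; how Lithops itself configures logging is not shown here.

```python
import io
import logging

# Hypothetical logger name, standing in for the backend's module logger.
logger = logging.getLogger("example.code_engine")
logger.propagate = False

# Capture output in memory so the effect of the level is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.setLevel(logging.INFO)
logger.debug("Encountered conflict error, ignoring")    # dropped at INFO level
logger.warning("Encountered conflict error, ignoring")  # recorded

captured = stream.getvalue().splitlines()
print(captured)
```

With the level at INFO, only the warning reaches the handler, which is why a retry logged at debug level leaves no trace in the output shown earlier.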
@kpavel Did it, I will let you know when this problem occurs again.
@kpavel I found the problem. After the HTTP 409 block runs, execution falls through to the next `if`, whose `else` branch re-raises the exception. You should either add `continue` to the 409 block:
    if e.status == 409:
        some_logic
        continue
    if e.status == 500:
        some_logic
    else:
        raise e
or, perhaps better, make it a single if - elif - else block:
    if e.status == 409:
        some_logic
    elif e.status == 500:
        some_logic
    else:
        raise e
I tested the first option and it works well.
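For reference, the second option can be sketched as a self-contained retry decorator. This is an illustration under assumptions, not the Lithops implementation: `FakeApiException` is a hypothetical stand-in for the Kubernetes client's `ApiException`, and the decorator's `retries` and `delay` parameters are invented for the example.

```python
import time


class FakeApiException(Exception):
    """Hypothetical stand-in for the Kubernetes client's ApiException."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def retry_on_except(retries, delay=0.0):
    """Retry the wrapped call on HTTP 409 and 500; re-raise anything else."""
    if retries < 1:
        raise ValueError("retries must be a positive integer")

    def decorator(func):
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(retries):
                try:
                    return func(*args, **kwargs)
                except FakeApiException as e:
                    if e.status == 409:
                        last_error = e  # conflict: retry
                    elif e.status == 500:
                        last_error = e  # server error: retry
                    else:
                        raise  # any other status is not retryable
                    time.sleep(delay)
            # All attempts failed with a retryable status.
            raise last_error

        return wrapper

    return decorator


calls = {"n": 0}


@retry_on_except(retries=3)
def flaky_create():
    """Fail twice with 409, then succeed (simulated)."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeApiException(409)
    return "created"


result = flaky_create()
print(result, calls["n"])
```

Because 409 and 500 sit in one if - elif - else chain, a 409 can no longer reach the `else` branch that re-raises.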
@sergii-mamedov Fixed
@JosepSampe Thanks, I will be grateful for the release of a new version when possible.
@sergii-mamedov @JosepSampe I will draft a new release.
@sergii-mamedov We released Lithops v2.7.1
thanks
Recently, I have been seeing frequent HTTP 409 errors when processing data in bulk. I spoke with IBM customer support to clarify where the problem originates and how it can be fixed. So far, all I've gotten from them is this response:
The original stacktrace I get:
Is it possible to add handling of such exceptions to lithops? I use lithops 2.7.0.