Closed — cisaacstern closed this 1 month ago
Hi @cisaacstern, good catch! Yes, as you stated, the Dockerfile template is missing the `--timeout $TIMEOUT` flag in the `gunicorn` command.
The Lithops default template does include it, so feel free to open a PR that adds it to the `runtime/` templates as well.
First of all, I just wanted to say how much I appreciate this project! It is truly incredible and a joy to use.
Lately I've been experimenting with the GCP Cloud Run backend and encountered a situation where, despite using the default GCP Cloud Run `runtime_timeout` of 300 s, function calls were being killed by `gunicorn` at the 30-second mark. From the invoker/client standpoint, this manifests as an `HTTP 500 Internal Server Error`, and in the Cloud Run logs it looks like this:

Full traceback
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1498, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/lithops/lithopsproxy.py", line 58, in run
    function_handler(message)
  File "/lithops/lithops/worker/handler.py", line 83, in function_handler
    python_queue_consumer(0, work_queue, )
  File "/lithops/lithops/worker/handler.py", line 135, in python_queue_consumer
    prepare_and_run_task(task)
  File "/lithops/lithops/worker/handler.py", line 163, in prepare_and_run_task
    run_task(task)
  File "/lithops/lithops/worker/handler.py", line 214, in run_task
    jrp.join(task.execution_timeout)
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/usr/local/lib/python3.10/multiprocessing/popen_fork.py", line 40, in wait
    if not wait([self.sentinel], timeout):
  File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/usr/local/lib/python3.10/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/base.py", line 203, in handle_abort
    sys.exit(1)
SystemExit: 1
```

After some
head-scratching, I eventually realized that this was because `gunicorn` was using its default `--timeout 30` and therefore killing workers after 30 seconds. In the custom container I am using, setting `--timeout 300` resolved this issue for me.

In terms of a possible solution, I did notice that in the Knative backend's default image, `--timeout $TIMEOUT` appears to be propagated through to `gunicorn`, but for GCP Cloud Run, while that variable appears to be set, it is not passed through to `gunicorn --timeout`: https://github.com/lithops-cloud/lithops/blob/41f24cfed6beb996547f1b1546913e7e6116dcde/runtime/gcp_cloudrun/Dockerfile#L50

Would it be correct to guess that passing `--timeout $TIMEOUT` here would resolve this issue for the default GCP Cloud Run container (on which my custom container is based)? If so, or if another solution is preferable, I am happy to contribute a PR. Thanks again for all of your work on this! Hopefully I can show my appreciation by making some useful contributions.
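To make the guess concrete, here is a sketch of what the change might look like at the end of the Cloud Run Dockerfile. Only the addition of `--timeout $TIMEOUT` is the actual proposal; the `ENV` values, the other `gunicorn` flags, and the `lithopsproxy:proxy` entry point are illustrative assumptions, not copied from the real template:

```dockerfile
# Hypothetical tail of runtime/gcp_cloudrun/Dockerfile (values assumed).
ENV PORT 8080
ENV TIMEOUT 600

# Pass the timeout through so gunicorn's worker timeout matches the
# runtime timeout, rather than falling back to its 30-second default:
CMD exec gunicorn --bind :$PORT --timeout $TIMEOUT lithopsproxy:proxy
```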
xref https://github.com/lithops-cloud/lithops/issues/1362#issuecomment-2137112180 as (thematically, if not directly) related