mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0

[BUG] Prompt Engineering request from UI to Deployments Server: Connection Timeout #12068

Open s-natsubori opened 2 months ago

s-natsubori commented 2 months ago

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

Describe the problem

In the Prompt Engineering UI, selecting GPT-4 or GPT-4-turbo as the LLM model and setting Max tokens above 1024 causes the MLflow deployment to return the following error: "INTERNAL_ERROR". In fact, this error occurs in every case where the LLM model cannot respond within 30 seconds.

I checked the logs on the API side: the API returned a response normally (it just takes more than 30 seconds). However, the tracking server seems to treat this as a connection timeout.

LLM inference can be very slow, so this timeout should be configurable. (30 seconds is too short!)

Tracking information

System information: Linux #1 SMP PREEMPT_DYNAMIC Mon Apr  8 19:19:48 UTC 2024
Python version: 3.10.14
MLflow version: 2.12.2
MLflow module location: /usr/local/lib/python3.10/site-packages/mlflow/__init__.py
Tracking URI: file:///mlruns
Registry URI: file:///mlruns
MLflow environment variables: 
  MLFLOW_DEPLOYMENTS_TARGET: http://api:5000
  MLFLOW_SCORING_SERVER_REQUEST_TIMEOUT: 120
MLflow dependencies: 
  Flask: 3.0.3
  Jinja2: 3.1.4
  aiohttp: 3.9.5
  alembic: 1.13.1
  boto3: 1.34.108
  botocore: 1.34.108
  click: 8.1.7
  cloudpickle: 3.0.0
  docker: 7.0.0
  entrypoints: 0.4
  fastapi: 0.111.0
  gitpython: 3.1.43
  graphene: 3.3
  gunicorn: 22.0.0
  importlib-metadata: 7.1.0
  markdown: 3.6
  matplotlib: 3.8.4
  numpy: 1.26.4
  packaging: 24.0
  pandas: 2.2.2
  protobuf: 4.25.3
  pyarrow: 15.0.2
  pydantic: 2.7.1
  pytz: 2024.1
  pyyaml: 6.0.1
  querystring-parser: 1.2.4
  requests: 2.31.0
  scikit-learn: 1.4.2
  scipy: 1.13.0
  slowapi: 0.1.9
  sqlalchemy: 2.0.30
  sqlparse: 0.5.0
  tiktoken: 0.7.0
  uvicorn: 0.29.0
  watchfiles: 0.21.0

Code to reproduce issue

docker-compose.yaml

services: 
  tracking-server:
    build: 
      context: .
    env_file: 
      - .env
    environment:
      MLFLOW_DEPLOYMENTS_TARGET: http://api:5000
      MLFLOW_SCORING_SERVER_REQUEST_TIMEOUT: 120
    ports:
      - "5000:5000"
    command: mlflow server --host 0.0.0.0

  api:
    build: 
      context: .
    volumes: 
      - ./conf:/conf
    env_file: 
      - .env

    sysctls:
      net.ipv4.tcp_syn_retries: 10
    command: mlflow deployments start-server --host 0.0.0.0 --config-path /conf/deploy_config.yaml

Dockerfile

FROM ghcr.io/mlflow/mlflow:v2.12.2

RUN pip install mlflow[genai] psutil

COPY ./conf /conf

Stack trace

Tracking server Log

mlflow_prompt-tracking-server-1  | [2024-05-20 18:31:59 +0900] [103] [INFO] Worker exiting (pid: 103)
mlflow_prompt-tracking-server-1  | [2024-05-20 18:31:59 +0900] [119] [INFO] Booting worker with pid: 119
mlflow_prompt-tracking-server-1  | [2024-05-20 18:47:23 +0900] [22] [CRITICAL] WORKER TIMEOUT (pid:87)
mlflow_prompt-tracking-server-1  | [2024-05-20 18:47:23 +0900] [87] [ERROR] Error handling request /mlflow/ajax-api/2.0/mlflow/gateway-proxy
mlflow_prompt-tracking-server-1  | Traceback (most recent call last):
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 135, in handle
mlflow_prompt-tracking-server-1  |     self.handle_request(listener, req, client, addr)
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
mlflow_prompt-tracking-server-1  |     respiter = self.wsgi(environ, resp.start_response)
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1498, in __call__
mlflow_prompt-tracking-server-1  |     return self.wsgi_app(environ, start_response)
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1473, in wsgi_app
mlflow_prompt-tracking-server-1  |     response = self.full_dispatch_request()
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
mlflow_prompt-tracking-server-1  |     rv = self.dispatch_request()
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
mlflow_prompt-tracking-server-1  |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/mlflow/server/__init__.py", line 108, in serve_gateway_proxy
mlflow_prompt-tracking-server-1  |     return gateway_proxy_handler()
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 510, in wrapper
mlflow_prompt-tracking-server-1  |     return func(*args, **kwargs)
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/mlflow/server/handlers.py", line 1304, in gateway_proxy_handler
mlflow_prompt-tracking-server-1  |     response = requests.request(request_type, f"{target_uri}/{gateway_path}", json=json_data)
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/requests/api.py", line 59, in request
mlflow_prompt-tracking-server-1  |     return session.request(method=method, url=url, **kwargs)
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
mlflow_prompt-tracking-server-1  |     resp = self.send(prep, **send_kwargs)
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
mlflow_prompt-tracking-server-1  |     r = adapter.send(request, **kwargs)
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
mlflow_prompt-tracking-server-1  |     resp = conn.urlopen(
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
mlflow_prompt-tracking-server-1  |     response = self._make_request(
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 537, in _make_request
mlflow_prompt-tracking-server-1  |     response = conn.getresponse()
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 466, in getresponse
mlflow_prompt-tracking-server-1  |     httplib_response = super().getresponse()
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/http/client.py", line 1375, in getresponse
mlflow_prompt-tracking-server-1  |     response.begin()
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
mlflow_prompt-tracking-server-1  |     version, status, reason = self._read_status()
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/http/client.py", line 279, in _read_status
mlflow_prompt-tracking-server-1  |     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/socket.py", line 705, in readinto
mlflow_prompt-tracking-server-1  |     return self._sock.recv_into(b)
mlflow_prompt-tracking-server-1  |   File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/base.py", line 203, in handle_abort
mlflow_prompt-tracking-server-1  |     sys.exit(1)
mlflow_prompt-tracking-server-1  | SystemExit: 1
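
Note: the CRITICAL "WORKER TIMEOUT" line above is gunicorn aborting the tracking server's sync worker after its default 30-second timeout (the handle_abort frame at the bottom of the traceback is that abort), which matches the 30-second limit described above. Assuming that worker timeout is the limit being hit, one possible workaround is to forward a longer timeout to gunicorn via mlflow server's --gunicorn-opts flag, e.g. in docker-compose.yaml:

command: mlflow server --host 0.0.0.0 --gunicorn-opts "--timeout 120"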

serena-ruan commented 2 months ago

Thanks for reporting this @s-natsubori! It sounds reasonable to add a --timeout argument to the mlflow server command. We'll discuss internally and get back to you. Would you like to contribute, if possible?

s-natsubori commented 2 months ago

Thanks for checking. I think it can be solved by setting a timeout on the requests call:

https://github.com/mlflow/mlflow/blob/e17693820cafeee3fccfdb14310c7406d5ae61dd/mlflow/server/handlers.py#L1304

 response = requests.request(request_type, f"{target_uri}/{gateway_path}", json=json_data, timeout=REQUEST_TIMEOUT)
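
For concreteness, a minimal sketch of what that change could look like, assuming a hypothetical MLFLOW_GATEWAY_PROXY_REQUEST_TIMEOUT environment variable (the name is illustrative, not an existing MLflow setting); the argument names mirror the existing handler code:

import os
import requests

# Hypothetical setting: read the proxy timeout from the environment,
# falling back to a generous default for slow LLM responses.
GATEWAY_PROXY_REQUEST_TIMEOUT = int(
    os.environ.get("MLFLOW_GATEWAY_PROXY_REQUEST_TIMEOUT", "120")
)

def proxy_request(request_type, target_uri, gateway_path, json_data):
    # Forward the UI request to the deployments server with an explicit
    # timeout (by default, requests waits indefinitely).
    return requests.request(
        request_type,
        f"{target_uri}/{gateway_path}",
        json=json_data,
        timeout=GATEWAY_PROXY_REQUEST_TIMEOUT,
    )
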
serena-ruan commented 2 months ago

Could you raise a PR to add an environment variable and support passing it here? You already have the context to test it then 😄

s-natsubori commented 1 month ago

Sorry, I misunderstood. The requests module is not the cause of the error.

I tried setting the timeout parameter on the requests call, but the situation did not change at all. Next, I tried controlling aiohttp.ClientTimeout, but this also had no effect. (I'm new to aiohttp, so my settings may be wrong.) https://github.com/mlflow/mlflow/blob/2a3ee6caf38ecdfe43b068ab6d7fabf07b198625/mlflow/gateway/providers/utils.py#L16
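
For reference, a minimal sketch of the kind of aiohttp timeout configuration I mean (aiohttp.ClientTimeout is the standard aiohttp API; where exactly the gateway constructs its ClientSession is an assumption here):

import aiohttp

# Raise aiohttp's overall request timeout (in seconds) to accommodate
# slow LLM backends; total=None would disable it entirely.
TIMEOUT = aiohttp.ClientTimeout(total=300)

async def post_with_timeout(url: str, payload: dict) -> dict:
    async with aiohttp.ClientSession(timeout=TIMEOUT) as session:
        async with session.post(url, json=payload) as resp:
            return await resp.json()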

From the stack trace, it also appears that the FastAPI process timeout is the cause. Any ideas?

github-actions[bot] commented 1 month ago

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.