npuichigo / openai_trtllm

OpenAI compatible API for TensorRT LLM triton backend
MIT License

Unable to connect to Triton #39

Closed tapansstardog closed 5 months ago

tapansstardog commented 5 months ago

Hi @npuichigo , I am facing some issues in connecting to my Triton server running locally. Here are the steps that I followed. Please correct me if any of the steps is incorrect.

  1. Start the Triton server in a Docker container:

sudo docker run --gpus=1 --rm -d -p 8000:8000 -p 8001:8001 -p 8002:8002 -v <path_to_configpb_files>:/models -v <path_to_engine files>:/engines tritonserver --model-repository=/models

  2. Go to the openai_trtllm checkout directory and run: sudo docker compose up --build

I am getting this error:

Error response from daemon: driver failed programming external connectivity on endpoint openai_trtllm-openai_trtllm-1 (97471c25c1d3fe60e8b43c97ecbc4b7ed9ed7775aec79327ba3126a269ec6a34): Bind for failed: port is already allocated
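
Docker reports "port is already allocated" when a host port it is asked to publish is already held by another container or process. A minimal sketch for checking this before running docker compose, assuming the ports of interest are the ones published in step 1 (8000, 8001, 8002) plus 3000; the helper name `port_is_free` is hypothetical and not part of openai_trtllm:

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if host:port can be bound, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process or container has the port.
            return False
        return True


if __name__ == "__main__":
    # Ports taken from the docker run command above, plus 3000 (assumed).
    for candidate in (8000, 8001, 8002, 3000):
        state = "free" if port_is_free(candidate) else "already in use"
        print(f"{candidate}: {state}")
```

If a port shows as already in use while the Triton container is running, publishing the same host port from a second container will fail with the error above.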

In case I do not run the triton server as mentioned in Step 1 and directly try to run openai_trtllm, I get this error:

{"timestamp":"2024-04-16T06:08:41.955297Z","level":"INFO","message":"Connecting to triton endpoint: http://tensorrtllm_backend:8001","target":"openai_trtllm::startup"}

Can you suggest the exact steps that I can execute?

Regards Tapan

tapansstardog commented 5 months ago

I think I got it. I need to modify docker-compose.yml as per my needs. Thanks!

tapansstardog commented 5 months ago

Sorry for reopening this, @npuichigo. Is it necessary to use your Docker image for the Triton server, as referenced in docker-compose.yml? When I use that image, the container does not start and I get the following error in the logs: exec /opt/nvidia/ exec format error

Because of this, the openai_trtllm container cannot connect, and I get:

{"timestamp":"2024-04-16T07:08:33.823905Z","level":"INFO","message":"Connecting to triton endpoint: http://tensorrtllm_backend:8001","target":"openai_trtllm::startup"}
Error: failed to connect triton endpoint

Also, I got another warning while launching the container: ! tensorrtllm_backend The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
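
An "exec format error" generally indicates a binary built for a different CPU architecture than the one running it, which is consistent with the platform warning. A rough sketch for comparing an image platform string with the host, assuming the usual `platform.machine()` values; the mapping table and function names are illustrative, not a Docker API:

```python
import platform

# Map common platform.machine() values to Docker architecture names.
_MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def host_docker_arch(machine=None):
    """Return the Docker-style architecture name for this host."""
    machine = (machine or platform.machine()).lower()
    return _MACHINE_TO_DOCKER_ARCH.get(machine, machine)


def image_matches_host(image_platform, machine=None):
    """Compare the arch part of e.g. 'linux/amd64' with the host arch."""
    parts = image_platform.split("/")
    # "linux/amd64" -> "amd64"; "linux/arm64/v8" -> "arm64"
    image_arch = parts[1] if len(parts) > 1 else parts[0]
    return image_arch == host_docker_arch(machine)
```

For the warning above, `image_matches_host("linux/amd64", "arm64")` is False, which is the mismatch Docker is reporting.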

Can I run my own already running triton server and then connect to it?

npuichigo commented 5 months ago

It's fine and recommended to use the official NVIDIA NGC Docker image like

The warning seems related to your platform (maybe you're using a Mac? I'm not sure how you are using an NVIDIA GPU there).

tapansstardog commented 5 months ago

I am using the following docker-compose.yml to connect to my triton server.

version: "3"

services:
  openai_trtllm:
    image: openai_trtllm
    build:
      context: .
      dockerfile: Dockerfile
    command:
      - "--host"
      - ""
      - "--port"
      - "3000"
      - "--triton-endpoint"
      - "grpc://"
    ports:
      - "3000:3000"
      - "8001:8001"
    restart: on-failure

Getting the error: Error response from daemon: driver failed programming external connectivity on endpoint openai_trtllm-openai_trtllm-1 (3f7959a66bf5632f19674354775af8d13ddcdc8908f2aba79b5080241137aba8): Bind for failed: port is already allocated

UPDATE: If I do not map port 8001 in docker-compose.yml, the following error is returned:

openai_trtllm-1  | {"timestamp":"2024-04-16T14:45:27.776369Z","level":"INFO","message":"Connecting to triton endpoint: grpc://","target":"openai_trtllm::startup"}
openai_trtllm-1  | Error: failed to connect triton endpoint
openai_trtllm-1  | 
openai_trtllm-1  | Caused by:
openai_trtllm-1  |     0: transport error
openai_trtllm-1  |     1: error trying to connect: tcp connect error: Connection refused (os error 111)
openai_trtllm-1  |     2: tcp connect error: Connection refused (os error 111)
openai_trtllm-1  |     3: Connection refused (os error 111)
openai_trtllm-1 exited with code 0

npuichigo commented 5 months ago

A process inside Docker cannot connect to a process outside Docker through localhost. Please refer to
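
"Connection refused (os error 111)" means nothing was listening at the address as seen from inside the container, where localhost refers to the container itself rather than the host. A small probe that can be run from inside the container to test a candidate address; the function name is hypothetical, and any host name passed to it (for example `host.docker.internal` on Docker Desktop) is an assumption about the setup, not something confirmed in this thread:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Example (host name is an assumption, adjust to the actual setup):
# print(can_connect("host.docker.internal", 8001))
```

If the probe fails for an address, openai_trtllm will fail to connect to that address as well, so this separates network reachability from gRPC-level problems.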

tapansstardog commented 5 months ago

Thank you very much @npuichigo . This is from container logs:

Attaching to openai_trtllm-1
openai_trtllm-1  | {"timestamp":"2024-04-16T15:36:55.132920Z","level":"INFO","message":"Connecting to triton endpoint: grpc://","target":"openai_trtllm::startup"}
openai_trtllm-1  | {"timestamp":"2024-04-16T15:36:55.133327Z","level":"INFO","message":"Starting server at","target":"openai_trtllm::startup"}

I just want to confirm: I believe openai_trtllm is now properly set up and connected to the Triton server using the gRPC protocol. Is that correct?

npuichigo commented 5 months ago

I think it’s correct

tapansstardog commented 5 months ago

Thanks @npuichigo. I have made some progress.

I am connecting to openai_trtllm using Postman as client and I think model_stream_infer failed here.

let mut stream = client
        .context("failed to call triton grpc method model_stream_infer")?

My POST request from Postman:

{
    "model": "ensemble",
    "prompt": ["In python, write a function for binary searching an element in an integer array."],
    "max_tokens": 200,
    "stop": ["--"],
    "temperature": 0.0
}

INFO and ERROR logs at openai_trtllm end:

openai_trtllm-1  | {"timestamp":"2024-04-17T10:04:12.973172Z","level":"INFO","message":"request: Json(CompletionCreateParams { model: \"ensemble\", prompt: [\"In python, write a function for binary searching an element in an integer array.\"], best_of: 1, echo: false, frequency_penalty: 0.0, logit_bias: None, logprobs: None, max_tokens: 200, n: 1, presence_penalty: 0.0, seed: None, stop: Some([\"B. London\"]), stream: false, suffix: None, temperature: 0.0, top_p: 1.0, user: None })","target":"openai_trtllm::routes::completions","span":{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"completions"},"spans":[{"http.request.method":"POST","http.route":"/v1/completions","network.protocol.version":"1.1","otel.kind":"Server","":"POST /v1/completions","server.address":"localhost:3000","span.type":"web","url.path":"/v1/completions","url.scheme":"","user_agent.original":"PostmanRuntime/7.37.3","name":"HTTP request"},{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"completions"}]}
openai_trtllm-1  | {"timestamp":"2024-04-17T10:04:12.973745Z","level":"ERROR","error":"AppError(failed to call triton grpc method model_stream_infer\n\nCaused by:\n    status: Unknown, message: \"Bad :scheme header\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\"} })","target":"openai_trtllm::routes::completions","span":{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"non-streaming completions"},"spans":[{"http.request.method":"POST","http.route":"/v1/completions","network.protocol.version":"1.1","otel.kind":"Server","":"POST /v1/completions","server.address":"localhost:3000","span.type":"web","url.path":"/v1/completions","url.scheme":"","user_agent.original":"PostmanRuntime/7.37.3","name":"HTTP request"},{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"completions"},{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"non-streaming completions"}]}
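
The openai_trtllm lines above are JSON records prefixed with the compose service name, so the relevant fields can be pulled out rather than read by eye. A sketch under that assumption; the helper name is hypothetical, and non-JSON lines such as "Error: failed to connect triton endpoint" are simply skipped:

```python
import json


def parse_compose_log_line(line: str):
    """Split 'service | {json}' into (service, record); None if not JSON."""
    service, sep, payload = line.partition("|")
    if not sep:
        return None
    try:
        record = json.loads(payload.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    return service.strip(), record


def summarize(line: str):
    """Return 'LEVEL: text' using the 'error' field if present, else 'message'."""
    parsed = parse_compose_log_line(line)
    if parsed is None:
        return None
    _, record = parsed
    text = record.get("error") or record.get("message", "")
    return f"{record.get('level', '?')}: {text}"
```

Applied to the ERROR record above, `summarize` would surface the `Bad :scheme header` cause without the repeated span and header fields.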

At Triton server end:

I0416 07:36:30.688923 1] Started GRPCInferenceService at
I0416 07:36:30.689069 1] Started HTTPService at
I0416 07:36:30.729902 1] Started Metrics Service at
E0417 07:52:12.717523050     557]       Error parsing metadata: error=invalid value key=:scheme value=grpc
E0417 08:41:05.601946513     557]       Error parsing metadata: error=invalid value key=:scheme value=grpc

Please note that I am providing triton-endpoint in docker-compose.yml as "grpc://". Is that the right way?

tapansstardog commented 5 months ago

@npuichigo, thanks for being patient :-). I am now able to run everything end to end. The Triton endpoint should have used the http scheme instead of grpc. It worked fine! I am closing this issue. Thanks again!
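
The Triton log earlier in the thread shows the server rejecting the request because the `:scheme` value was `grpc`, and the fix reported here was to write the endpoint with `http`. A sketch of a guard that rewrites such an endpoint before it is used; this helper is hypothetical and not part of openai_trtllm, and the host name `triton` in the usage note is a placeholder:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_triton_endpoint(endpoint: str) -> str:
    """Rewrite a grpc:// endpoint to http://, leaving other schemes as-is."""
    parts = urlsplit(endpoint)
    if parts.scheme == "grpc":
        # Only the URL scheme changes; host, port and path are untouched.
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)
```

For example, `normalize_triton_endpoint("grpc://triton:8001")` gives `http://triton:8001`, while an endpoint already written with `http://` is returned unchanged.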