Closed tapansstardog closed 7 months ago
I think I got it. I need to modify docker-compose.yml as per my needs. Thanks!
Sorry for reopening this, @npuichigo. Is it necessary to use your Docker image for the Triton server, as mentioned in docker-compose.yml?
When I use that image, I am not able to run it and get the error below in the logs:
exec /opt/nvidia/nvidia_entrypoint.sh: exec format error
Because of this, the openai_trtllm container is unable to connect, and I get:
{"timestamp":"2024-04-16T07:08:33.823905Z","level":"INFO","message":"Connecting to triton endpoint: http://tensorrtllm_backend:8001","target":"openai_trtllm::startup"}
Error: failed to connect triton endpoint
I also got another warning while launching the container:
! tensorrtllm_backend The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Can I run my own Triton server separately and then connect to it?
It's fine, and recommended, to use an official NVIDIA NGC Docker image like nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3.
The warning seems related to your platform (maybe you're using a Mac? I'm not sure how you would use an NVIDIA GPU there).
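For reference, here is a minimal sketch of what a tensorrtllm_backend service could look like with the official NGC image; the volume path and model repository layout below are assumptions for illustration, not something this repo prescribes:

services:
  tensorrtllm_backend:
    image: nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3
    command: tritonserver --model-repository=/models
    volumes:
      - ./models:/models   # hypothetical host path to your Triton model repository
    ports:
      - "8001:8001"        # gRPC inference endpoint
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]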
I am using the following docker-compose.yml to connect to my Triton server:
version: "3"
services:
  openai_trtllm:
    image: openai_trtllm
    build:
      context: .
      dockerfile: Dockerfile
    command:
      - "--host"
      - "0.0.0.0"
      - "--port"
      - "3000"
      - "--triton-endpoint"
      - "grpc://0.0.0.0:8001"
    ports:
      - "3000:3000"
      - "8001:8001"
    restart: on-failure
Getting the error:
Error response from daemon: driver failed programming external connectivity on endpoint openai_trtllm-openai_trtllm-1 (3f7959a66bf5632f19674354775af8d13ddcdc8908f2aba79b5080241137aba8): Bind for 0.0.0.0:8001 failed: port is already allocated
UPDATE:
If I do not map port 8001 in docker-compose.yml, the following error is returned:
openai_trtllm-1 | {"timestamp":"2024-04-16T14:45:27.776369Z","level":"INFO","message":"Connecting to triton endpoint: grpc://0.0.0.0:8001","target":"openai_trtllm::startup"}
openai_trtllm-1 | Error: failed to connect triton endpoint
openai_trtllm-1 |
openai_trtllm-1 | Caused by:
openai_trtllm-1 | 0: transport error
openai_trtllm-1 | 1: error trying to connect: tcp connect error: Connection refused (os error 111)
openai_trtllm-1 | 2: tcp connect error: Connection refused (os error 111)
openai_trtllm-1 | 3: Connection refused (os error 111)
openai_trtllm-1 exited with code 0
A process inside Docker cannot connect to a process outside Docker this way. Please refer to https://www.squash.io/tutorial-host-network-in-docker-compose/
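For example, two common fixes from that tutorial, sketched against this compose file (treat the exact service layout as an assumption):

# Option 1: share the host's network stack (Linux only; "ports:" is then ignored)
services:
  openai_trtllm:
    network_mode: host

# Option 2: stay on the default bridge network and address the host explicitly
services:
  openai_trtllm:
    extra_hosts:
      - "host.docker.internal:host-gateway"   # makes the host reachable on Linux
    command:
      - "--triton-endpoint"
      - "grpc://host.docker.internal:8001"    # instead of 0.0.0.0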
Thank you very much, @npuichigo. This is from the container logs:
Attaching to openai_trtllm-1
openai_trtllm-1 | {"timestamp":"2024-04-16T15:36:55.132920Z","level":"INFO","message":"Connecting to triton endpoint: grpc://0.0.0.0:8001","target":"openai_trtllm::startup"}
openai_trtllm-1 | {"timestamp":"2024-04-16T15:36:55.133327Z","level":"INFO","message":"Starting server at 0.0.0.0:3000","target":"openai_trtllm::startup"}
Just to confirm: I believe openai_trtllm is now properly set up and connected to the Triton server over the gRPC protocol. Is that correct?
I think it’s correct
Thanks @npuichigo. I have made progress.
I am connecting to openai_trtllm using Postman as the client, and I think model_stream_infer fails here:
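// ModelStreamInfer is Triton's bidirectional streaming-inference RPC; tonic
// surfaces transport and :scheme errors at this call, before any response arrives.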
let mut stream = client
    .model_stream_infer(request)
    .await
    .context("failed to call triton grpc method model_stream_infer")?
    .into_inner();
My POST request from Postman:
http://localhost:3000/v1/completions
{
  "model": "ensemble",
  "prompt": ["In python, write a function for binary searching an element in an integer array."],
  "max_tokens": 200,
  "stop": ["--"],
  "temperature": 0.0
}
INFO and ERROR logs at the openai_trtllm end:
openai_trtllm-1 | {"timestamp":"2024-04-17T10:04:12.973172Z","level":"INFO","message":"request: Json(CompletionCreateParams { model: \"ensemble\", prompt: [\"In python, write a function for binary searching an element in an integer array.\"], best_of: 1, echo: false, frequency_penalty: 0.0, logit_bias: None, logprobs: None, max_tokens: 200, n: 1, presence_penalty: 0.0, seed: None, stop: Some([\"B. London\"]), stream: false, suffix: None, temperature: 0.0, top_p: 1.0, user: None })","target":"openai_trtllm::routes::completions","span":{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"completions"},"spans":[{"http.request.method":"POST","http.route":"/v1/completions","network.protocol.version":"1.1","otel.kind":"Server","otel.name":"POST /v1/completions","server.address":"localhost:3000","span.type":"web","url.path":"/v1/completions","url.scheme":"","user_agent.original":"PostmanRuntime/7.37.3","name":"HTTP request"},{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"completions"}]}
openai_trtllm-1 | {"timestamp":"2024-04-17T10:04:12.973745Z","level":"ERROR","error":"AppError(failed to call triton grpc method model_stream_infer\n\nCaused by:\n status: Unknown, message: \"Bad :scheme header\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\"} })","target":"openai_trtllm::routes::completions","span":{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"non-streaming completions"},"spans":[{"http.request.method":"POST","http.route":"/v1/completions","network.protocol.version":"1.1","otel.kind":"Server","otel.name":"POST /v1/completions","server.address":"localhost:3000","span.type":"web","url.path":"/v1/completions","url.scheme":"","user_agent.original":"PostmanRuntime/7.37.3","name":"HTTP request"},{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"completions"},{"headers":"{\"content-type\": \"application/json\", \"user-agent\": \"PostmanRuntime/7.37.3\", \"accept\": \"*/*\", \"postman-token\": \"2fb36acd-53dc-4689-a7be-166a21f8d631\", \"host\": \"localhost:3000\", \"accept-encoding\": \"gzip, deflate, br\", \"connection\": \"keep-alive\", \"content-length\": \"201\"}","name":"non-streaming completions"}]}
At the Triton server end:
I0416 07:36:30.688923 1 grpc_server.cc:2519] Started GRPCInferenceService at 0.0.0.0:8001
I0416 07:36:30.689069 1 http_server.cc:4623] Started HTTPService at 0.0.0.0:8000
I0416 07:36:30.729902 1 http_server.cc:315] Started Metrics Service at 0.0.0.0:8002
E0417 07:52:12.717523050 557 hpack_parser.cc:1235] Error parsing metadata: error=invalid value key=:scheme value=grpc
E0417 08:41:05.601946513 557 hpack_parser.cc:1235] Error parsing metadata: error=invalid value key=:scheme value=grpc
Please note that I am providing the triton-endpoint in docker-compose.yml as "grpc://0.0.0.0:8001". Is that the right way?
@npuichigo, thanks for being patient :-). I am able to run end to end now. The Triton endpoint should have used the http scheme instead of grpc. It worked fine! I am closing this issue. Thanks again!
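For anyone who hits the same "Bad :scheme header" error: the port is still Triton's gRPC port, only the URI scheme changes. A minimal sketch of the corrected args (the host name is whatever resolves to your Triton server; host.docker.internal is an assumption):

command:
  - "--triton-endpoint"
  - "http://host.docker.internal:8001"   # http:// scheme, still speaking gRPC to port 8001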
Hi @npuichigo, I am facing some issues connecting to my Triton server running locally. Here are the steps I followed; please correct me if any of them is incorrect.
1. Start the Triton server:
sudo docker run --gpus=1 --rm -d -p 8000:8000 -p 8001:8001 -p 8002:8002 -v <path_to_configpb_files>:/models -v <path_to_engine_files>:/engines nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3 tritonserver --model-repository=/models
2. From the openai_trtllm code checkout directory, run the command below:
sudo docker compose up --build
I am getting this error:
Error response from daemon: driver failed programming external connectivity on endpoint openai_trtllm-openai_trtllm-1 (97471c25c1d3fe60e8b43c97ecbc4b7ed9ed7775aec79327ba3126a269ec6a34): Bind for 0.0.0.0:3000 failed: port is already allocated
If I do not run the Triton server as mentioned in step 1 and directly try to run openai_trtllm, I get this error:
{"timestamp":"2024-04-16T06:08:41.955297Z","level":"INFO","message":"Connecting to triton endpoint: http://tensorrtllm_backend:8001","target":"openai_trtllm::startup"}
Can you suggest the exact steps that I should execute?
Regards,
Tapan