paguilomanas closed this issue 1 month ago:
I found a more up-to-date version of graph.pbtxt in #2688, which is working:
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"
node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "LLM_NODE_RESOURCES:llm"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  input_stream_info: {
    tag_index: "LOOPBACK:0",
    back_edge: true
  }
  node_options: {
    [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
      models_path: "./",
      plugin_config: '{"KV_CACHE_PRECISION": "u8", "DYNAMIC_QUANTIZATION_GROUP_SIZE": "32"}',
      enable_prefix_caching: false
      cache_size: 10
    }
  }
  input_stream_handler {
    input_stream_handler: "SyncSetInputStreamHandler",
    options {
      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {
        sync_set {
          tag_index: "LOOPBACK:0"
        }
      }
    }
  }
}
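A quick way to verify the graph is actually serving once it loads is an OpenAI-style chat completions request. This is only a sketch: it assumes the REST port is 8000, the served graph name is meta-llama/Meta-Llama-3-8B-Instruct, and the OpenAI-compatible endpoint is exposed at /v3/chat/completions as in the continuous batching demo (the path may differ by OVMS version):
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'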
Describe the bug:
Hi everyone, I wanted to serve an optimized LLM model using OVMS. I have tried to follow the continuous batching demo, but when I run the container and check the model status, it is stuck loading with errors indicating the deployment failed, meaning the model endpoint is not ready and I cannot make client requests.
To Reproduce:
I have followed the demo steps:
docker pull openvino/model_server:latest
An important detail to point out: since I already had the HF model downloaded in a specific path, I have set the env variable
export HF_HOME="/mnt/shared_models/huggingface/cache"
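For context, the export step from the demo that produces the OpenVINO IR files, roughly (a sketch assuming optimum-intel with the OpenVINO backend is installed and int8 weight compression is wanted; exact flags may differ by version):
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B-Instruct --weight-format int8 Meta-Llama-3-8B-Instruct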
Then I copied the provided graph file into the model folder:
cp graph.pbtxt Meta-Llama-3-8B-Instruct/graph.pbtxt
I have all the expected files inside the model folder, and the config.json file as provided.
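For reference, my understanding of the config.json from the demo (a sketch assuming the graph directory is named Meta-Llama-3-8B-Instruct and the served name is meta-llama/Meta-Llama-3-8B-Instruct):
{
    "model_config_list": [],
    "mediapipe_config_list": [
        {
            "name": "meta-llama/Meta-Llama-3-8B-Instruct",
            "base_path": "Meta-Llama-3-8B-Instruct"
        }
    ]
}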
Then I ran the container:
docker run -d --rm -p 8000:8000 -v $(pwd)/:/workspace:ro openvino/model_server:latest --port 9000 --rest_port 8000 --config_path /workspace/config.json
And when running the command
curl http://localhost:8000/v1/config
to check the served model status, I get this output:
When I was expecting this other output:
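For reference, a healthy /v1/config response reports the graph as AVAILABLE; a rough sketch of that shape (not my exact output):
{
  "meta-llama/Meta-Llama-3-8B-Instruct": {
    "model_version_status": [
      {
        "version": "1",
        "state": "AVAILABLE",
        "status": {
          "error_code": "OK",
          "error_message": "OK"
        }
      }
    ]
  }
}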
Logs:
To debug, I have tried to run the container and start the REST endpoint from inside it. First I run the container without the --rest_port flag.
When I get inside the container with
docker exec -it <container_id> bash
and try to deploy the model myself from the inside:
ovms/bin/ovms --rest_port 8000 --config_path /workspace/config.json --log_level DEBUG
I get this output trace (I show just the relevant part so this is not too long); the most relevant error description is:
Error parsing text-format mediapipe.CalculatorGraphConfig: 20:26: Expected string, got: {
[2024-09-20 16:48:23.593][460][modelmanager][error][mediapipegraphdefinition.cpp:95] Trying to parse mediapipe graph definition: meta-llama/Meta-Llama-3-8B-Instruct failed
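Since the parser error points at line 20, column 26 of the graph file, a quick way to see the offending lines (assuming the graph sits at Meta-Llama-3-8B-Instruct/graph.pbtxt under the workspace) is:
sed -n '18,22p' Meta-Llama-3-8B-Instruct/graph.pbtxt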
Configuration:
meta-llama/Meta-Llama-3-8B-Instruct
Additional context:
I have tried to change the model repository structure by adding the model files inside workspace/Meta-Llama-3-8B-Instruct/1/, but it hasn't worked either.
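For reference, the layout I understand the demo expects (a sketch; the exact exported file names may vary):
workspace/
├── config.json
└── Meta-Llama-3-8B-Instruct/
    ├── graph.pbtxt
    ├── openvino_model.xml
    ├── openvino_model.bin
    ├── openvino_tokenizer.xml
    ├── openvino_detokenizer.xml
    └── tokenizer_config.json (plus the other tokenizer/config JSON files)
That is, config.json points at the graph directory itself; as far as I understand, MediaPipe graph deployments do not use the numbered version subdirectory that classic single-model serving requires.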