Hi,

I am trying to serve two LLMs concurrently with the TensorRT-LLM backend. The folder structure of the two Triton model repositories is the following:

I am running the command

tritonserver --model-repository=path_to_triton_models/gemma2 --model-repository=path_to_triton_models/llama3 --model-namespacing=true

All the models are loaded correctly, as confirmed by the logs.
At this point I want to send a query to a model. In a single-model deployment scenario, I would use the following curl command:
curl -X POST \
-s localhost:8000/v2/models/tensorrt_llm_bls/generate \
-d '{
"text_input": "What is machine learning?",
"max_tokens": 512
}'
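In the single-model case I can also verify that the model is ready before sending the request, using the standard KServe v2 health endpoint (the -w flag just prints the HTTP status code; 200 means the model is ready):

curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/tensorrt_llm_bls/ready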
However, if I use the same endpoint (localhost:8000/v2/models/tensorrt_llm_bls/generate) in the two-model deployment scenario, I get, as expected, the following error:
{"error":"There are 2 identifiers of model 'tensorrt_llm_bls' in global map, model namespace must be provided to resolve ambiguity."}
The problem is that I don't know how I should change the target endpoint with --model-namespacing enabled. I have tried many things, but none of them worked, and there seems to be no documentation about this.
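In case it is useful for debugging, Triton's model repository index extension can list everything the server has loaded, although I am not sure whether its output surfaces the namespace of each entry:

curl -s -X POST localhost:8000/v2/repository/index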
Can you help me out? Thanks in advance. Tagging @rmccorm4 for support.