Open · geraldstanje opened 2 months ago
Hi,

I'm trying to use tensorrt-llm with the Triton server, but it cannot find my model. Any idea why? It looks like my file is invalid:

/tensorrtllm_backend/triton_model_repo/tensorrt_llm_bls/config.pbtxt

Here is the config.pbtxt file:

In tensorrt-llm:

Then in tensorrtllm_backend I run:
You need to set some runtime parameters, such as triton_max_batch_size, max_beam_width, and so on (the ${xxx} placeholders in the template config.pbtxt files), before Triton can load the model.
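For context, the config.pbtxt templates shipped under all_models/inflight_batcher_llm/ contain literal ${...} placeholders. A rough excerpt of an unfilled tensorrt_llm_bls/config.pbtxt (field names based on the upstream template; your version may differ):

```
name: "tensorrt_llm_bls"
backend: "python"
# ${...} values are placeholders; Triton cannot parse them as-is
max_batch_size: ${triton_max_batch_size}

model_transaction_policy {
  decoupled: ${decoupled_mode}
}
```

Triton's protobuf parser cannot read ${triton_max_batch_size} as a number, so it rejects the file, which typically surfaces as the model failing to load or not being found.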
Here is the documentation: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/gemma.md#end-to-end-workflow-to-run-sp-model
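The usual way to substitute the placeholders is the tools/fill_template.py helper in the tensorrtllm_backend repo. A minimal sketch, assuming the model repository path from the question; the parameter names and values here are illustrative, and each model directory has its own required set (see the linked doc for the full list):

```bash
cd tensorrtllm_backend

# Fill the BLS model's placeholders in place (-i edits the file directly).
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm_bls/config.pbtxt \
    triton_max_batch_size:64,decoupled_mode:False,bls_instance_count:1,accumulate_tokens:False

# The other models (preprocessing, postprocessing, tensorrt_llm, ensemble)
# need the same treatment with their own parameters, e.g.:
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt \
    triton_max_batch_size:64,decoupled_mode:False,max_beam_width:1,engine_dir:/path/to/engines,batching_strategy:inflight_fused_batching
```

Once no ${...} placeholders remain in any config.pbtxt, relaunch the server (for example with scripts/launch_triton_server.py --model_repo triton_model_repo) and the model should be found.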