triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so' #6708

Closed lyc728 closed 8 months ago

lyc728 commented 8 months ago

UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so'
UNAVAILABLE: Invalid argument: unable to find 'libtriton_onnxruntime.so'

I pulled the llm_trt image (nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3), but the image does not contain the onnxruntime and pytorch backends.

(screenshot of the error output attached)
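As a quick check, one way to confirm which backends an image actually ships is to list the backend directory inside it. This is only a sketch, assuming the default backend location used by the Triton containers (/opt/tritonserver/backends):

docker run --rm nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3 \
    ls /opt/tritonserver/backends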
oandreeva-nv commented 8 months ago

Hi @lyc728, the trt_llm container ships exclusively with the TensorRT-LLM backend. If you would like to use the PyTorch and ONNX Runtime backends, please use our base container: nvcr.io/nvidia/tritonserver:23.11-py3
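For reference, a minimal sketch of serving a model repository with the base container; the host path /path/to/model_repository is a placeholder:

docker pull nvcr.io/nvidia/tritonserver:23.11-py3
docker run --gpus=all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /path/to/model_repository:/models \
    nvcr.io/nvidia/tritonserver:23.11-py3 \
    tritonserver --model-repository=/models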

oandreeva-nv commented 8 months ago

Alternatively, you can build a container that includes the TensorRT-LLM backend together with the other backends you need by passing them all to build.py:

--backend=tensorrtllm --backend=python --backend=onnxruntime --backend=pytorch

Please, refer to documentation here: https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/build.md#building-with-docker
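Put together, a minimal sketch of such a build might look like the following (note that --backend=pytorch must be listed explicitly if you want the PyTorch backend; other feature flags are kept to the ones already shown in this thread):

./build.py -v --enable-gpu --enable-logging --enable-stats \
    --endpoint=http --endpoint=grpc \
    --backend=ensemble --backend=tensorrtllm --backend=python \
    --backend=onnxruntime --backend=pytorch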

I will close this issue for now, feel free to reach out with any questions.

lyc728 commented 8 months ago

--backend=onnxruntime --backend=pytorch

tritonserver --model-repository=/models/liuyuanchao/tensorrtllm_backend/triton_model_repo --backend-config=onnxruntime --backend-config=pytorch

This gives an error:

(screenshot of the error output attached)
oandreeva-nv commented 8 months ago

Please use the --backend flag, not --backend-config.
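To make the distinction concrete, here is a sketch (paths are placeholders): --backend selects which backends are compiled into the image when you invoke build.py, while at run time tritonserver takes no backend flags and simply loads the backends already installed in the container:

# build time: choose the backends to build into the image
./build.py --enable-gpu --backend=tensorrtllm --backend=python \
           --backend=onnxruntime --backend=pytorch

# run time (inside the built container): no backend flags needed,
# the server discovers the installed backends automatically
tritonserver --model-repository=/path/to/triton_model_repo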

oandreeva-nv commented 8 months ago

Please also use the build steps provided in the docs I linked. To build the container, you need to invoke the build.py script outside of any container. For example:

./build.py -v --no-container-interactive --enable-logging --enable-stats --enable-tracing \
              --enable-metrics --enable-gpu-metrics --enable-cpu-metrics \
              --filesystem=gcs --filesystem=s3 --filesystem=azure_storage \
              --endpoint=http --endpoint=grpc --endpoint=sagemaker --endpoint=vertex-ai \
              --backend=ensemble --enable-gpu \
              --backend=tensorrtllm \
              --backend=python --backend=onnxruntime

Note that this is different from running tritonserver. The steps above build a custom container that includes all of the specified backends (selected via the --backend flags).
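Once the build finishes, the resulting image can be run like the stock NGC containers. This sketch assumes the Docker-based build produced an image tagged tritonserver (the default) and uses a placeholder model repository path:

docker run --gpus=all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /path/to/triton_model_repo:/models \
    tritonserver \
    tritonserver --model-repository=/models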