triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

floating point exception with Triton version 24.07 when loading tensorrt_llm backend models #7556

Closed. janpetrov closed this issue 2 months ago.

janpetrov commented 3 months ago

Please see https://github.com/triton-inference-server/tensorrtllm_backend/issues/579

The issue seems to be specific to the tensorrt_llm backend and Triton Inference Server version 24.07.
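
Since the crash appears tied to a specific release, it may help to confirm which Triton version the running server actually reports before trying to reproduce. A minimal sketch using the tritonclient HTTP API, assuming the server is reachable on localhost:8000 (the default HTTP port):

```python
# Requires: pip install tritonclient[http]
# Assumes a Triton server listening on localhost:8000 (default HTTP port).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# The server metadata response includes the core Triton version string,
# which can be matched against the container release notes.
metadata = client.get_server_metadata()
print("Triton server version:", metadata["version"])
```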

Tabrizian commented 2 months ago

Please try 24.08 and let us know if you still run into this issue.
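
After moving to the 24.08 container, a quick readiness check can confirm that the TensorRT-LLM model actually loads instead of crashing the server. A minimal sketch, assuming the default HTTP port and a model named tensorrt_llm (a placeholder; use the name from your model repository):

```python
# Requires: pip install tritonclient[http]
# Assumes Triton 24.08 serving on localhost:8000; "tensorrt_llm" is a
# placeholder model name, replace it with the name in your repository.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# If the 24.07 floating point exception is gone, the server should come up
# live and report the TensorRT-LLM model as ready.
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready: ", client.is_model_ready("tensorrt_llm"))
```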