triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

docs: Add support matrix for model parallelism in OpenAI Frontend #7715

Closed · rmccorm4 closed this 1 month ago

rmccorm4 commented 1 month ago

Add a support matrix (and known limitations) for multi-GPU models with the vLLM and TRT-LLM backends, as sketched below.
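
For context, multi-GPU execution with the vLLM backend is typically configured through the engine arguments in the model's `model.json`, which the requested support matrix would need to cover. Below is a minimal sketch of such a config enabling tensor parallelism across two GPUs; the model name and field values are illustrative assumptions, not taken from this issue.

```json
{
  "model": "meta-llama/Meta-Llama-3-8B-Instruct",
  "tensor_parallel_size": 2,
  "gpu_memory_utilization": 0.85
}
```

Presumably the matrix would then state, per backend, which of these parallelism settings the OpenAI frontend supports and which combinations are known limitations.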