The Triton Inference Server provides an optimized cloud and edge inferencing solution.
License: BSD 3-Clause "New" or "Revised" License · 8.38k stars · 1.49k forks
docs: Add support matrix for model parallelism in OpenAI Frontend #7715
Closed · rmccorm4 closed this issue 1 month ago
Add a support matrix (and known limitations) for multi-GPU model parallelism in the OpenAI Frontend with the vLLM and TensorRT-LLM (TRT-LLM) backends.
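
For context, a minimal sketch of how a multi-GPU (tensor-parallel) model is typically configured for the vLLM backend, using its `model.json` engine-args format; the model name and parallelism degree below are illustrative examples, not values taken from this issue:

```json
{
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "tensor_parallel_size": 2,
  "gpu_memory_utilization": 0.9
}
```

A model configured this way would then be served through the OpenAI Frontend's standard endpoints (e.g. `/v1/chat/completions`); the support matrix requested here would document which parallelism modes are known to work through that path for each backend, and which are not.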