triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

docs: Add support matrix for model parallelism in OpenAI Frontend #7715

Closed · rmccorm4 closed this 1 month ago

rmccorm4 commented 1 month ago

Add a support matrix (and known limitations) for multi-GPU models with the vLLM and TRT-LLM backends, as sketched below.
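
For context, multi-GPU execution with the vLLM backend is typically configured through the engine arguments in the model's `model.json`, which the requested support matrix would need to cover. Below is a minimal sketch of such a config enabling tensor parallelism across two GPUs; the model name and field values are illustrative assumptions, not taken from this issue.

```json
{
  "model": "meta-llama/Meta-Llama-3-8B-Instruct",
  "tensor_parallel_size": 2,
  "gpu_memory_utilization": 0.85
}
```

Presumably the matrix would then state, per backend, which of these parallelism settings the OpenAI frontend supports and which combinations are known limitations.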