Triton Python, C++ and Java client libraries, and GRPC-generated client examples for go, java and scala.
BSD 3-Clause "New" or "Revised" License
Update GAP tutorial of vllm backend #743
Closed
AndyDai-nv closed 1 week ago
Specifying `--gpus=1` when running Docker for the vLLM backend avoids the out-of-memory issue seen during the tutorial CI tests. See https://gitlab-master.nvidia.com/dl/triton/perf_analyzer_ci/-/pipelines/16443689
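As a rough illustration of the change, the sketch below assembles a `docker run` command that limits the container to one GPU with `--gpus=1`. The image tag placeholder (`YY.MM`), the `TRITON_TAG` and `DOCKER_CMD` variable names, and the extra flags (`--rm`, `-it`, `--net=host`) are assumptions for illustration and are not taken from this PR; the command is only printed here, not executed.

```shell
# Hypothetical placeholder: set TRITON_TAG to the Triton release you are testing.
TRITON_TAG="${TRITON_TAG:-YY.MM}"
IMAGE="nvcr.io/nvidia/tritonserver:${TRITON_TAG}-vllm-python-py3"

# --gpus=1 exposes a single GPU to the container rather than every GPU on the host.
DOCKER_CMD="docker run --rm -it --net=host --gpus=1 ${IMAGE}"

# Print the command for review before running it.
echo "$DOCKER_CMD"
```

To launch the container, set `TRITON_TAG` to a real release tag and run the printed command.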