triton-inference-server / client

Triton Python, C++ and Java client libraries, and GRPC-generated client examples for go, java and scala.
BSD 3-Clause "New" or "Revised" License

Update GAP tutorial of vllm backend #743

Closed: AndyDai-nv closed this pull request 1 week ago

AndyDai-nv commented 2 weeks ago

Specifying `--gpus=1` when running Docker for the vLLM backend avoids the out-of-memory issue during the tutorial CI tests.

See https://gitlab-master.nvidia.com/dl/triton/perf_analyzer_ci/-/pipelines/16443689
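For illustration, the flag change described above would look roughly like this when launching the tutorial container. This is a hedged sketch, not the exact command from the PR: the image tag is a placeholder, and the mounted model path is an assumption.

```shell
# Sketch: restrict the container to a single GPU via --gpus=1
# (instead of exposing all GPUs with --gpus=all), which is the
# change described in this PR to avoid OOM in the tutorial CI.
# <xx.yy> and the model path below are placeholders, not from the PR.
docker run --gpus=1 --rm -it \
    -v "$(pwd)/model_repository:/models" \
    nvcr.io/nvidia/tritonserver:<xx.yy>-vllm-python-py3 \
    tritonserver --model-repository=/models
```

With `--gpus=1`, Docker exposes only one GPU to the container, so the vLLM backend's memory footprint stays within a single device's capacity during the CI run.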