triton-inference-server / client

Triton Python, C++ and Java client libraries, and GRPC-generated client examples for go, java and scala.
BSD 3-Clause "New" or "Revised" License

Update GAP tutorial of vllm backend #743

Closed: AndyDai-nv closed this pull request 1 week ago

AndyDai-nv commented 2 weeks ago

Specifying `--gpus=1` when running Docker for the vLLM backend avoids the out-of-memory issue during the tutorial CI tests.

See https://gitlab-master.nvidia.com/dl/triton/perf_analyzer_ci/-/pipelines/16443689
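For illustration, the flag change described above would look roughly like this when launching the tutorial container. This is a hedged sketch, not the exact command from the PR: the image tag is a placeholder, and the mounted model path is an assumption.

```shell
# Sketch: restrict the container to a single GPU via --gpus=1
# (instead of exposing all GPUs with --gpus=all), which is the
# change described in this PR to avoid OOM in the tutorial CI.
# <xx.yy> and the model path below are placeholders, not from the PR.
docker run --gpus=1 --rm -it \
    -v "$(pwd)/model_repository:/models" \
    nvcr.io/nvidia/tritonserver:<xx.yy>-vllm-python-py3 \
    tritonserver --model-repository=/models
```

With `--gpus=1`, Docker exposes only one GPU to the container, so the vLLM backend's memory footprint stays within a single device's capacity during the CI run.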