triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

fix: Support sampling parameters of type List for vLLM backend (stop words) #7682

Closed: rmccorm4 closed this 1 month ago

rmccorm4 commented 1 month ago

Adds support, and some smoke testing, for sampling parameters of type List, such as stop: [".", ","]. This is done by passing sampling_parameters to the vLLM backend as a serialized JSON string input rather than through the previous TRITONSERVER_Parameter approach, which did not support Lists.

Resolves the error:

"Unsupported Value. Can't convert <class 'list'> to Request Parameter"