triton-inference-server / client

Triton Python, C++ and Java client libraries, and GRPC-generated client examples for go, java and scala.
BSD 3-Clause "New" or "Revised" License
520 stars 225 forks source link

Patch vLLM for missing content entry #671

Closed IzzyPutterman closed 1 month ago

IzzyPutterman commented 1 month ago

vLLM does not include content in the first response from the server. Now we filter for the content entry missing before preprocessing merged requests.