sgl-project / sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Apache License 2.0

Fix tp worker only checking req[0] for stream #546

Closed Qubitium closed 2 weeks ago

Qubitium commented 2 weeks ago

One condition for exiting the decoding loop early (the loop is fixed at 10 iterations) is whether the batch contains streaming requests.

The current code in tp_worker checks only the first request, self.running_batch.reqs[0].stream. This PR fixes that by adding a Batch.has_stream(self) -> bool helper, which returns True if any request in the batch has stream == True, and using it in place of the reqs[0] check.
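
To illustrate the difference between the two checks, here is a minimal sketch. The `Req` and `Batch` classes below are simplified, hypothetical stand-ins and not the actual sglang definitions; only the `reqs` and `stream` attributes and the `has_stream` name come from the description above, and the `rid` field is an assumption added for readability.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Req:
    # Hypothetical stand-in for a request; only `stream` matters here.
    rid: str
    stream: bool = False


@dataclass
class Batch:
    # Hypothetical stand-in for the running batch.
    reqs: List[Req] = field(default_factory=list)

    def has_stream(self) -> bool:
        # True if at least one request in the batch is streaming.
        return any(req.stream for req in self.reqs)


# A batch whose first request is non-streaming but whose second one streams.
batch = Batch(reqs=[Req("a"), Req("b", stream=True)])

old_check = batch.reqs[0].stream  # looks at the first request only
new_check = batch.has_stream()    # looks at every request in the batch
```

In this example the old check misses the streaming request because it sits at index 1, while the helper finds it. The helper also returns False for an empty batch rather than raising an IndexError, which indexing reqs[0] would do.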

TESTS: