🚀 The feature, motivation and pitch
Currently, the llm.chat() API only supports one conversation per inference call. This means we cannot use this API to fully leverage vLLM's batched inference for efficient offline processing.
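For reference, today's single-conversation usage looks roughly like this (the API shown here already exists; the model name is just an example):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # example model
conversation = [{"role": "user", "content": "Hello!"}]

# Only one conversation per call today; batching many chats requires
# dropping down to llm.generate() and building the inputs by hand.
outputs = llm.chat(conversation, sampling_params=SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```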
Alternatives
No response
Additional context
Implementation should be rather straightforward (see the sketch after this list):
1. At the API level, llm.chat() should also accept a list of conversations.
2. When llm.chat() is invoked with a list, each conversation is parsed into a prompt, and all multimodal data items are retrieved and loaded into the format that llm.generate() accepts.
3. The resulting list of {prompt: xxx, multi_modal_data: xxx} inputs is sent to llm.generate().
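A minimal sketch of how these steps could fit together, written against the existing API. Only the batched llm.chat(conversations) signature is the proposal; LLM, SamplingParams, get_tokenizer(), apply_chat_template(), and the dict-input form of llm.generate() already exist. The model name and text-only conversations are illustrative.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # example model
params = SamplingParams(max_tokens=128)

conversations = [
    [{"role": "user", "content": "Write a haiku about the ocean."}],
    [{"role": "user", "content": "Summarize the plot of Hamlet in one line."}],
]

# Step 1 (proposed): outputs = llm.chat(conversations, sampling_params=params)
# Steps 2-3, as they might look inside llm.chat():
tokenizer = llm.get_tokenizer()
inputs = []
for conv in conversations:
    prompt = tokenizer.apply_chat_template(
        conv, tokenize=False, add_generation_prompt=True
    )
    # For multimodal conversations, the loaded items would be attached here,
    # e.g. {"prompt": prompt, "multi_modal_data": {"image": image}}.
    inputs.append({"prompt": prompt})

outputs = llm.generate(inputs, sampling_params=params)
for out in outputs:
    print(out.outputs[0].text)
```

With this in place, callers get a list-in/list-out interface and vLLM can batch all conversations in a single generate() call.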
Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.