vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai

[Feature]: Multimodal prefix-caching support #10510

Closed · justzhanghong closed this issue 1 week ago

justzhanghong commented 1 week ago

🚀 The feature, motivation and pitch

In v0.6.4.post1, the engine logs: "--enable-prefix-caching is currently not supported for multimodal models and has been disabled." When will prefix caching be supported for multimodal models?
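For context, a minimal sketch of how this is hit from the offline Python API (the model name is only an illustrative multimodal model; any multimodal model supported by vLLM behaves the same way on this version):

```python
# Minimal sketch: requesting prefix caching with a multimodal model on v0.6.4.post1.
# The engine logs the warning quoted above and silently disables prefix caching.
from vllm import LLM

llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",  # assumed example of a multimodal model
    enable_prefix_caching=True,        # ignored for multimodal models in v0.6.4.post1
)
```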

Alternatives

No response

Additional context

No response


robertgshaw2-neuralmagic commented 1 week ago

This is being actively worked on as part of the vLLM V1 project. There is no definite timeline, but I would expect it within a month.

justzhanghong commented 1 week ago

Thanks.