vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai

[Feature]: Multimodal prefix-caching support #10510

Closed · justzhanghong closed this issue 1 week ago

justzhanghong commented 1 week ago

🚀 The feature, motivation and pitch

In v0.6.4.post1, the engine logs: "--enable-prefix-caching is currently not supported for multimodal models and has been disabled." When will prefix caching be supported for multimodal models?
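For context, a minimal sketch of how this is hit from the offline Python API (the model name is only an illustrative multimodal model; any multimodal model supported by vLLM behaves the same way on this version):

```python
# Minimal sketch: requesting prefix caching with a multimodal model on v0.6.4.post1.
# The engine logs the warning quoted above and silently disables prefix caching.
from vllm import LLM

llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",  # assumed example of a multimodal model
    enable_prefix_caching=True,        # ignored for multimodal models in v0.6.4.post1
)
```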

Alternatives

No response

Additional context

No response


robertgshaw2-neuralmagic commented 1 week ago

This is being actively worked on as part of the vLLM V1 project. There is no definite timeline, but I would expect it within a month.

justzhanghong commented 1 week ago

Thanks.