vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Can vLLM support Medusa heads? #1023

Closed. MichaelJayW closed this issue 2 months ago.

MichaelJayW commented 1 year ago

https://sites.google.com/view/medusa-llm
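
For context: Medusa augments a base LLM with a few lightweight extra decoding heads, where the k-th head predicts the token k steps ahead from the last hidden state, and the base model verifies those guesses in a single forward pass. A minimal PyTorch sketch of that idea; the layer shapes and sizes here are illustrative, not taken from the Medusa repo:

```python
import torch
import torch.nn as nn

class MedusaHead(nn.Module):
    """One extra decoding head: a residual block followed by an LM head.

    A minimal sketch of the Medusa idea; the real architecture and
    hyperparameters are defined by the Medusa project, not here.
    """

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection, then project to vocabulary logits.
        hidden = hidden + self.act(self.proj(hidden))
        return self.lm_head(hidden)

# K heads each guess one token further into the future; the base model's
# own LM head still produces the accepted next token, and the guesses
# are verified together in one forward pass.
hidden_size, vocab_size, num_heads = 4096, 32000, 4
heads = nn.ModuleList(
    MedusaHead(hidden_size, vocab_size) for _ in range(num_heads)
)

last_hidden = torch.randn(1, hidden_size)  # hidden state of the last token
draft_logits = [head(last_hidden) for head in heads]  # logits for t+1 .. t+K
```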

hmellor commented 6 months ago

@simon-mo is this a feature you'd like to see implemented?

chizhang118 commented 6 months ago

Is there any plan to implement this feature? Will it be on the Q2 roadmap?

simon-mo commented 6 months ago

Yes, this is planned. It will land after the speculative decoding framework is in.
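
For readers following along: once a speculative decoding framework exists, Medusa heads would plug in as the draft (proposer) model, with the base model acting as the verifier. A hypothetical sketch of what such a configuration could look like through vLLM's offline `LLM` API; the `speculative_model` and `num_speculative_tokens` parameter names and the model paths are assumptions for illustration, not confirmed anywhere in this thread:

```python
from vllm import LLM, SamplingParams

# Hypothetical configuration: assumes the speculative decoding framework
# exposes a draft model and a draft length on the engine. Parameter names
# and paths are illustrative, not taken from this thread.
llm = LLM(
    model="lmsys/vicuna-7b-v1.3",              # target (verifier) model
    speculative_model="path/to/medusa-heads",  # Medusa heads as the draft
    num_speculative_tokens=4,                  # tokens proposed per step
)

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```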