vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Can vLLM support Medusa heads? #1023

Closed. MichaelJayW closed this issue 2 months ago.

MichaelJayW commented 1 year ago

https://sites.google.com/view/medusa-llm
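
For context: Medusa augments a base LLM with a few lightweight extra decoding heads, where the k-th head predicts the token k steps ahead from the last hidden state, and the base model verifies those guesses in a single forward pass. A minimal PyTorch sketch of that idea; the layer shapes and sizes here are illustrative, not taken from the Medusa repo:

```python
import torch
import torch.nn as nn

class MedusaHead(nn.Module):
    """One extra decoding head: a residual block followed by an LM head.

    A minimal sketch of the Medusa idea; the real architecture and
    hyperparameters are defined by the Medusa project, not here.
    """

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection, then project to vocabulary logits.
        hidden = hidden + self.act(self.proj(hidden))
        return self.lm_head(hidden)

# K heads each guess one token further into the future; the base model's
# own LM head still produces the accepted next token, and the guesses
# are verified together in one forward pass.
hidden_size, vocab_size, num_heads = 4096, 32000, 4
heads = nn.ModuleList(
    MedusaHead(hidden_size, vocab_size) for _ in range(num_heads)
)

last_hidden = torch.randn(1, hidden_size)  # hidden state of the last token
draft_logits = [head(last_hidden) for head in heads]  # logits for t+1 .. t+K
```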

hmellor commented 6 months ago

@simon-mo is this a feature you'd like to see implemented?

chizhang118 commented 6 months ago

Is there any plan to implement this feature? Will it be on the Q2 roadmap?

simon-mo commented 6 months ago

Yes, this is planned. It will land after the speculative decoding framework is in.
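
For readers following along: once a speculative decoding framework exists, Medusa heads would plug in as the draft (proposer) model, with the base model acting as the verifier. A hypothetical sketch of what such a configuration could look like through vLLM's offline `LLM` API; the `speculative_model` and `num_speculative_tokens` parameter names and the model paths are assumptions for illustration, not confirmed anywhere in this thread:

```python
from vllm import LLM, SamplingParams

# Hypothetical configuration: assumes the speculative decoding framework
# exposes a draft model and a draft length on the engine. Parameter names
# and paths are illustrative, not taken from this thread.
llm = LLM(
    model="lmsys/vicuna-7b-v1.3",              # target (verifier) model
    speculative_model="path/to/medusa-heads",  # Medusa heads as the draft
    num_speculative_tokens=4,                  # tokens proposed per step
)

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```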