vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding #3398

Closed · tchaton closed this issue 1 day ago

tchaton commented 8 months ago

This paper might be of interest: https://arxiv.org/pdf/2402.12374.pdf

rkooo567 commented 8 months ago

cc @cadedaniel

cadedaniel commented 4 months ago

Thanks for creating the issue. For us to implement Sequoia, we need both https://github.com/vllm-project/vllm/issues/3960 and https://github.com/vllm-project/vllm/issues/4565.
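
For reference, a minimal sketch of the tree-style speculative decoding loop that Sequoia builds on: a small draft model proposes a token tree, and the target model verifies it, keeping the longest accepted chain. The functions and acceptance rule below are hypothetical stand-ins for illustration only, not vLLM's or Sequoia's actual APIs.

```python
# Minimal sketch of tree-style speculative decoding (the family Sequoia belongs to).
# All interfaces here are hypothetical stand-ins, not vLLM or Sequoia code.
import random

def draft_token_tree(prefix, depth=3, branch=2):
    """Hypothetical draft model: propose a small tree of candidate continuations."""
    if depth == 0:
        return []
    children = []
    for _ in range(branch):
        token = random.randint(0, 31999)  # stand-in for a drafted token id
        children.append((token, draft_token_tree(prefix + [token], depth - 1, branch)))
    return children

def target_accepts(prefix, token):
    """Hypothetical target-model check; real systems compare draft vs. target probabilities."""
    return random.random() < 0.7

def verify_longest_path(prefix, tree):
    """Walk the draft tree, keeping the longest chain of tokens the target model accepts."""
    best = []
    for token, children in tree:
        if target_accepts(prefix, token):
            accepted = [token] + verify_longest_path(prefix + [token], children)
            if len(accepted) > len(best):
                best = accepted
    return best

if __name__ == "__main__":
    prefix = [1, 2, 3]
    tree = draft_token_tree(prefix)
    print("accepted speculative tokens:", verify_longest_path(prefix, tree))
```

Sequoia's contribution (per the linked paper) is choosing the tree shape dynamically and in a hardware-aware way, which is why tree verification support in vLLM (the two issues above) is a prerequisite.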

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] commented 1 day ago

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!