vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding #3398

Open · tchaton opened this issue 3 months ago

tchaton commented 3 months ago

This paper might be of interest: "Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding" (https://arxiv.org/pdf/2402.12374.pdf).
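
For context, Sequoia builds on standard speculative decoding: a small draft model proposes several tokens ahead, and the large target model verifies them, keeping the longest accepted prefix. Sequoia's contribution is organizing the draft tokens into a tree whose shape is chosen by dynamic programming and tuned to the hardware. The snippet below is only a minimal, self-contained sketch of the basic chain-style draft-and-verify loop that the paper generalizes; the names (`speculate_step`, `draft_next`, `target_next`, `k`) are illustrative and do not correspond to any vLLM API.

```python
# Minimal sketch of chain-style speculative decoding (draft-then-verify).
# The "models" are plain callables over toy integer tokens; this does not
# reflect vLLM internals or Sequoia's tree-structured algorithm.

from typing import Callable, List

Token = int


def speculate_step(
    draft_next: Callable[[List[Token]], Token],   # cheap draft model (greedy)
    target_next: Callable[[List[Token]], Token],  # expensive target model (greedy)
    context: List[Token],
    k: int = 4,                                   # speculation depth (a chain, not a tree)
) -> List[Token]:
    """Propose k tokens with the draft model, then keep the longest prefix the
    target model agrees with, plus one corrected token from the target."""
    # 1) Draft phase: roll the cheap model forward k steps.
    proposal: List[Token] = []
    ctx = list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2) Verify phase: a real engine does this in ONE batched target forward
    #    pass over all k positions; here we just query position by position.
    accepted: List[Token] = []
    ctx = list(context)
    for t in proposal:
        correct = target_next(ctx)
        if correct == t:
            accepted.append(t)        # draft matched the target: keep it
            ctx.append(t)
        else:
            accepted.append(correct)  # first mismatch: take the target's token, stop
            break
    else:
        # All k drafts accepted: the target pass also yields one bonus token.
        accepted.append(target_next(ctx))
    return accepted


if __name__ == "__main__":
    # Toy models over integer tokens: the draft agrees with the target except
    # when the context length is a multiple of 3, so some proposals get rejected.
    def target(ctx: List[Token]) -> Token:
        return (len(ctx) * 7) % 13

    def draft(ctx: List[Token]) -> Token:
        return target(ctx) if len(ctx) % 3 else (target(ctx) + 1) % 13

    seq: List[Token] = [0]
    for _ in range(5):
        seq += speculate_step(draft, target, seq, k=4)
    print(seq)
```

In a real engine the verify phase is a single batched forward pass over all proposed positions, which is where the speedup comes from; Sequoia's tree structure lets that one pass verify many alternative continuations at once.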

rkooo567 commented 3 months ago

cc @cadedaniel