rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
https://docs.rs/llm/latest/llm/
Apache License 2.0
6.06k stars, 350 forks

Medusa Speculative Decoding #423

Open someone13574 opened 9 months ago

someone13574 commented 9 months ago

A project called Medusa was released recently. It trains additional `lm_head`s that, instead of predicting the next token, predict tokens n+2, n+3, and n+4. It then builds a tree of candidate continuations from the top-k predictions of each head, evaluates all of the candidates at once with some clever attention masking, and selects one of the best. They report a ~2x speedup, and it looks like they are planning to integrate it into llama.cpp, so I thought it would be a good fit for this project as well.
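To make the tree and masking idea a bit more concrete, here is a rough Rust sketch. It is not taken from the Medusa code or from `llm`; the names `Node`, `build_tree`, `tree_attention_mask`, and `path_to` are hypothetical, and it assumes a dense tree where every candidate at one depth gets a child for every top-k token of the next head. The real project may prune the tree or lay it out differently.

```rust
/// One node of the candidate tree: a proposed token id and the index of its
/// parent node (None for candidates at the first speculative position).
#[derive(Debug, Clone, PartialEq)]
struct Node {
    token: u32,
    parent: Option<usize>,
}

/// Build a dense candidate tree from per-head proposals.
/// `heads[d]` holds the top-k token ids proposed by the head for depth `d`.
/// Assumption: every node at depth d gets one child per token in heads[d + 1].
fn build_tree(heads: &[Vec<u32>]) -> Vec<Node> {
    let mut nodes = Vec::new();
    // Parents for the level currently being built; the first level has none.
    let mut frontier: Vec<Option<usize>> = vec![None];
    for head in heads {
        let mut next = Vec::new();
        for &parent in &frontier {
            for &token in head {
                nodes.push(Node { token, parent });
                next.push(Some(nodes.len() - 1));
            }
        }
        frontier = next;
    }
    nodes
}

/// mask[i][j] is true when node i may attend to node j, i.e. when j is
/// node i itself or one of its ancestors. Candidates on different branches
/// never see each other, so all of them can share one forward pass.
fn tree_attention_mask(nodes: &[Node]) -> Vec<Vec<bool>> {
    let n = nodes.len();
    let mut mask = vec![vec![false; n]; n];
    for i in 0..n {
        let mut cur = Some(i);
        while let Some(j) = cur {
            mask[i][j] = true;
            cur = nodes[j].parent;
        }
    }
    mask
}

/// Recover the token sequence that ends at `leaf` by walking up to the root.
fn path_to(nodes: &[Node], leaf: usize) -> Vec<u32> {
    let mut path = Vec::new();
    let mut cur = Some(leaf);
    while let Some(i) = cur {
        path.push(nodes[i].token);
        cur = nodes[i].parent;
    }
    path.reverse();
    path
}
```

With two heads proposing `[10, 11]` and `[20, 21]`, `build_tree` yields six nodes (two at the first depth, four at the second), and each row of the mask only enables the node itself and its ancestors. The verification and acceptance step, which compares the candidates against the base model's own predictions, is left out here.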

Links: Blog, Implementation, Models

someone13574 commented 9 months ago

Ref to llama.cpp issue https://github.com/ggerganov/llama.cpp/issues/3137