adityakotha03 opened this issue 2 weeks ago
Sure. We will have a separate repository for that! @itsdaniele is preparing it, so you may want to follow up with @itsdaniele.
Hi @itsdaniele, I would like to know if there is any update on speculative decoding with Mamba. Thanks.
Thank you for the amazing work on "The Mamba in the Llama: Distilling and Accelerating Hybrid Models." I am particularly interested in the hardware-aware speculative decoding algorithm described in the paper, especially in how it can be applied to Mamba models during inference.
I would appreciate it if you could provide a more detailed explanation or the steps needed to replicate it.