Request for Implementation Guidance on Hardware-Aware Speculative Decoding in Mamba Models

jxiw / MambaInLlama

Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models

https://arxiv.org/abs/2408.15237

Apache License 2.0

Request for Implementation Guidance on Hardware-Aware Speculative Decoding in Mamba Models #3

Open adityakotha03 opened 2 weeks ago

adityakotha03 commented 2 weeks ago

Thank you for the amazing work on "The Mamba in the Llama: Distilling and Accelerating Hybrid Models." I am particularly interested in the hardware-aware speculative decoding algorithm described in the paper, especially how it can be applied to the Mamba models during inference.

I would appreciate it if you could provide a more detailed explanation of the algorithm, or the steps needed to reproduce it.

jxiw commented 2 weeks ago

Sure. We will have a separate repository for that, which @itsdaniele is preparing. Please follow up with @itsdaniele on this.

adityakotha03 commented 4 days ago

Hi @itsdaniele, I would like to know if there is any update regarding speculative decoding with Mamba. Thanks.