Closed jiqing-feng closed 10 months ago
Hi @zhisbug @Viol2000
The llama model needs to catch up with HF, since HF now uses a KV cache in its llama model.
Related issue: https://github.com/hao-ai-lab/LookaheadDecoding/issues/35
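For context, HF's llama implementation carries cached key/value tensors between decoding steps so attention only processes the newly generated token instead of re-encoding the whole prefix. Below is a minimal, framework-free sketch of that idea; the class and function names are illustrative placeholders, not HF's actual API.

```python
class KVCache:
    """Toy key/value cache playing the role of HF's cached
    past key/values (illustrative only, not the real API)."""
    def __init__(self):
        self.keys = []    # one entry appended per decoded token
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)


def decode_step(token, cache):
    # In a real model, k and v would be projections of the token's
    # hidden state; here we just store the token itself.
    k = v = token
    cache.append(k, v)
    # Attention would read the full cache rather than recomputing
    # keys/values for the entire prefix at every step.
    return list(cache.keys)


cache = KVCache()
for t in ["a", "b", "c"]:
    context = decode_step(t, cache)
# The cache grows by one entry per step, so only the new token's
# keys/values are computed each iteration.
```

A model that does not thread this cache through its forward pass (as this PR fixes for the repo's llama implementation) would silently diverge from HF's interface.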
Thanks for your efforts. I will merge the code soon.
Hi @Viol2000 . Would you please merge this PR? Thx!