hao-ai-lab / LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
https://arxiv.org/abs/2402.02057
Apache License 2.0

fix llama kv cache #38

Closed jiqing-feng closed 10 months ago

jiqing-feng commented 11 months ago

Hi @zhisbug @Viol2000

The llama model implementation needs to catch up with HF, since HF now uses a KV cache in its llama model.

Related issue: https://github.com/hao-ai-lab/LookaheadDecoding/issues/35
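
For context, a minimal sketch of the kind of cache-format bridging involved, assuming HF transformers' `DynamicCache` API (available in recent transformers releases); the helper names here are illustrative and not part of this PR:

```python
# Rough sketch (not the actual PR diff): newer HF transformers wraps the llama
# KV cache in a Cache object, while older integration code passes tuples of
# (key, value) tensors per layer. A shim like this converts between the two.
from transformers.cache_utils import Cache, DynamicCache

def as_cache(past_key_values):
    """Accept either a legacy tuple-of-tuples KV cache or an HF Cache object."""
    if past_key_values is None or isinstance(past_key_values, Cache):
        return past_key_values
    # Legacy format: tuple of (key, value) tensor pairs, one per decoder layer.
    return DynamicCache.from_legacy_cache(past_key_values)

def as_legacy(past_key_values):
    """Convert back to the legacy tuple format when downstream code expects it."""
    if isinstance(past_key_values, DynamicCache):
        return past_key_values.to_legacy_cache()
    return past_key_values
```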

Viol2000 commented 11 months ago

Thanks for your efforts. I will merge the code soon.

jiqing-feng commented 10 months ago

Hi @Viol2000. Would you please merge this PR? Thanks!