huggingface / huggingface-llama-recipes


Fix: Discard KV cache for last token before reusing prompt cache for prompt + suffix #79

Open sannat17 opened 3 weeks ago

sannat17 commented 3 weeks ago

fix #78

sannat17 commented 3 weeks ago

I should note that a more robust way to handle this would be to compare the tokens of INITIAL_PROMPT against the first few tokens of new_inputs, and discard the cache entries from the first mismatched token onward.

However, in the case of the Llama tokenizers this mismatch can only occur at the last token, so simply discarding that token's cache entry as a rule seems like an easier fix (sketched below).
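For reference, here is a minimal sketch of the idea, assuming the recipe's usual `DynamicCache` setup; the model id, prompt text, and variable names below are placeholders, and `DynamicCache.crop()` is used to drop the last cached token before reuse:

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

INITIAL_PROMPT = "You are a helpful assistant. "
inputs_initial = tokenizer(INITIAL_PROMPT, return_tensors="pt").to(model.device)

# Pre-fill the cache once for the shared prefix.
prompt_cache = DynamicCache()
with torch.no_grad():
    prompt_cache = model(**inputs_initial, past_key_values=prompt_cache).past_key_values

# Appending a suffix can change how the boundary is tokenized (e.g. a trailing
# space merging into the first suffix token), so the cached entry for the last
# prefix token may no longer be valid.
prompt = INITIAL_PROMPT + "What is the capital of France?"
new_inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Drop the last cached token before reuse; crop() truncates the cache in place.
past_key_values = copy.deepcopy(prompt_cache)
past_key_values.crop(inputs_initial.input_ids.shape[1] - 1)

outputs = model.generate(**new_inputs, past_key_values=past_key_values, max_new_tokens=20)
print(tokenizer.batch_decode(outputs)[0])
```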

sannat17 commented 2 weeks ago

@ArthurZucker I'm wondering what your thoughts are on this fix, since you introduced the prompt_reuse recipe.