sannat17 opened this pull request 3 weeks ago
I must note that a more robust way to handle this would have been to compare the tokens of `INITIAL_PROMPT` against the first few tokens of `new_inputs` and discard the cache for every token from the first mismatch onward.
However, with Llama tokenizers the mismatch can only occur at the last token of the cached prompt, so always discarding just that last token's cache entry seemed like an easier fix (see the sketch below).
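For reference, a minimal sketch of what the more robust prefix comparison could look like, next to the simpler last-token rule this PR applies. The helper names and the `crop(...)` call at the end are illustrative assumptions (recent `transformers` versions expose a `DynamicCache.crop` method, but this is not part of the current recipe):

```python
# Sketch only: decide how many cached prompt tokens can safely be reused
# before generating with new inputs.
from typing import List

def reusable_prefix_length(cached_ids: List[int], new_ids: List[int]) -> int:
    """Length of the longest common prefix between the cached prompt's token
    ids and the new input's token ids; cache entries past this point would
    be discarded under the more robust approach."""
    n = 0
    for cached_tok, new_tok in zip(cached_ids, new_ids):
        if cached_tok != new_tok:
            break
        n += 1
    return n

def llama_reusable_prefix_length(cached_ids: List[int]) -> int:
    """Simpler rule used here: for Llama tokenizers only the last cached
    token can tokenize differently once new text is appended, so always
    drop just that one entry."""
    return max(len(cached_ids) - 1, 0)

# Hypothetical usage, assuming a cache object with a `crop(max_length)`
# method (e.g. transformers' DynamicCache in recent versions):
# keep = reusable_prefix_length(initial_prompt_ids, new_input_ids)
# prompt_cache.crop(keep)
```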
@ArthurZucker I'm wondering what your thoughts are on this fix, since you introduced the `prompt_reuse` recipe.
fix #78