microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

Question about past_key_values #18

Closed · mirth closed 9 months ago

mirth commented 9 months ago

Are the manipulations of past_key_values there only to increase speed? Can I remove the code that operates on past_key_values without lowering performance?

iofu728 commented 9 months ago

Yes, the past_key_values parameter is used to reduce redundant computation during iterative token-level prompt compression: the KV cache for tokens that have already been scored is reused, so each step only needs to run the model over the new tokens.
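
To illustrate the idea, here is a minimal sketch (not LLMLingua's actual code) of how passing past_key_values between calls avoids re-encoding the prefix when scoring successive segments; the model name and segment list are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.eval()

segments = ["The quick brown fox", " jumps over", " the lazy dog"]
past = None  # holds the KV cache of everything scored so far
with torch.no_grad():
    for seg in segments:
        ids = tokenizer(seg, return_tensors="pt").input_ids
        # Only the new tokens are fed in; attention over earlier tokens
        # is served from `past` instead of being recomputed.
        out = model(ids, past_key_values=past, use_cache=True)
        past = out.past_key_values
```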

If you're directly invoking llm_lingua.get_ppl, there's no need to use this parameter at all. If you don't want to use the KV cache during compression, you can try removing the relevant code.
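
For comparison, a hedged sketch of what cache-free scoring looks like, reusing the model and tokenizer from the snippet above: each call re-encodes the full text, so there is no past_key_values bookkeeping, at the cost of redundant computation. The helper below is illustrative, not LLMLingua's API:

```python
import torch
import torch.nn.functional as F

def perplexity(model, tokenizer, text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # No past_key_values: the whole sequence is recomputed each call.
        logits = model(ids).logits
    # Shift so each position predicts the next token.
    loss = F.cross_entropy(logits[:, :-1].transpose(1, 2), ids[:, 1:])
    return torch.exp(loss).item()
```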

I hope this answers your question.