microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Feature Request]: Lingua2 could discard tokens based on a probability threshold #150

Open Meguminnnnnnnn opened 2 months ago

Meguminnnnnnnn commented 2 months ago

Is your feature request related to a problem? Please describe.

Lingua2 struggles to achieve ideal compression with a fixed discard ratio or target length, because the compression ratio that preserves all valid tokens and discards all redundant ones varies across texts. I think setting a probability threshold instead of a ratio or target length could solve this problem: a token would be discarded if the probability of its 'discard' label exceeds the threshold.
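A minimal sketch of the proposed behavior, assuming the compressor already exposes per-token probabilities for the 'discard' label (the function name, tokens, and probabilities below are hypothetical, not part of the LLMLingua API):

```python
# Hypothetical sketch of threshold-based token dropping for an
# LLMLingua-2-style token classifier. Instead of keeping a fixed
# ratio of tokens, drop every token whose 'discard' probability
# exceeds a user-chosen threshold.

def compress_by_threshold(tokens, discard_probs, threshold=0.5):
    """Keep tokens whose 'discard' probability is at or below the threshold."""
    return [tok for tok, p in zip(tokens, discard_probs) if p <= threshold]

# Example with made-up probabilities (a real classifier would produce these):
tokens = ["Please", "kindly", "summarize", "the", "following", "report", "."]
discard_probs = [0.7, 0.9, 0.1, 0.6, 0.4, 0.05, 0.2]

print(compress_by_threshold(tokens, discard_probs, threshold=0.5))
# → ['summarize', 'following', 'report', '.']
```

With a threshold, the output length adapts to each text: a prompt with little redundancy keeps nearly everything, while a verbose one is trimmed aggressively, which a fixed ratio cannot do.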

Describe the solution you'd like

No response

Additional context

No response

iofu728 commented 2 months ago

Hi @Meguminnnnnnnn, thank you for your suggestion. We will enhance the related features in future iterations.