Is your feature request related to a problem? Please describe.
Lingua2 struggles to achieve ideal compression when given a fixed discard ratio or target length, because the ideal compression ratio, the one that preserves all informative tokens and discards all redundant ones, varies from text to text.
I think setting a probability threshold instead of a ratio or target length could solve this problem: a token would be discarded if the probability of its 'discard' label exceeds the threshold. A sketch of the idea follows below.
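For illustration, here is a minimal sketch of threshold-based filtering. It is not Lingua2's actual API; the function name, tokens, and probabilities are all hypothetical, and it only assumes that a per-token 'discard' probability is already available from the classifier:

```python
from typing import List

def compress_by_threshold(tokens: List[str],
                          discard_probs: List[float],
                          threshold: float = 0.5) -> List[str]:
    """Keep a token only if its 'discard' probability does not exceed the
    threshold, so the effective compression ratio adapts to each text."""
    return [tok for tok, p in zip(tokens, discard_probs) if p <= threshold]

# Hypothetical example: the compression ratio falls out of the content
# instead of being fixed up front.
tokens = ["Please", "kindly", "note", "that", "the", "meeting", "is", "at", "3pm"]
probs  = [0.30, 0.92, 0.20, 0.85, 0.80, 0.05, 0.60, 0.10, 0.02]
print(compress_by_threshold(tokens, probs, threshold=0.5))
# -> ['Please', 'note', 'meeting', 'at', '3pm']
```

With this interface, a verbose text would naturally be compressed more aggressively than a dense one, since the number of tokens above the threshold is determined by the text itself.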
Describe the solution you'd like
No response
Additional context
No response