microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Bug]: Calculate `n_original_tokens` Correctly in `compress_prompt_llmlingua2` #145

Closed by WaelKarkoub 2 months ago

WaelKarkoub commented 2 months ago

What does this PR do?

Closes #144
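For context, `n_original_tokens` is the count of tokens in the uncompressed input that `compress_prompt_llmlingua2` computes; it surfaces as the `origin_tokens` field of the result returned by the public `compress_prompt` API, from which the reported compression `ratio` is derived. Below is a minimal sketch of where the corrected count can be observed, using the model name from the project's documented LLMLingua-2 examples; the sample prompt and rate are placeholder values, not part of this PR.

```python
from llmlingua import PromptCompressor

# LLMLingua-2 compressor; with use_llmlingua2=True, compression is handled
# by the llmlingua2 path that this PR's token counting fix targets.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

prompt = "A long prompt to be compressed ..."  # hypothetical placeholder input

result = compressor.compress_prompt(prompt, rate=0.33)

# `origin_tokens` reflects the original-token count whose calculation this PR
# corrects; `ratio` compares it against `compressed_tokens`.
print(result["origin_tokens"], result["compressed_tokens"], result["ratio"])
```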

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.