microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Bug]: Calculate `n_original_tokens` Correctly in `compress_prompt_llmlingua2` #145

Closed by WaelKarkoub 2 months ago

WaelKarkoub commented 2 months ago

What does this PR do?

Closes #144
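For context, `n_original_tokens` is the count of tokens in the uncompressed input that `compress_prompt_llmlingua2` computes; it surfaces as the `origin_tokens` field of the result returned by the public `compress_prompt` API, from which the reported compression `ratio` is derived. Below is a minimal sketch of where the corrected count can be observed, using the model name from the project's documented LLMLingua-2 examples; the sample prompt and rate are placeholder values, not part of this PR.

```python
from llmlingua import PromptCompressor

# LLMLingua-2 compressor; with use_llmlingua2=True, compression is handled
# by the llmlingua2 path that this PR's token counting fix targets.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

prompt = "A long prompt to be compressed ..."  # hypothetical placeholder input

result = compressor.compress_prompt(prompt, rate=0.33)

# `origin_tokens` reflects the original-token count whose calculation this PR
# corrects; `ratio` compares it against `compressed_tokens`.
print(result["origin_tokens"], result["compressed_tokens"], result["ratio"])
```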

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.