Describe the issue
Thank you for your work.
I tried to reproduce LongLLMLingua on the LongBench dataset.
https://github.com/microsoft/LLMLingua/blob/main/examples/Code.ipynb. It appears to be code for reproducing one of the LongBench datasets, RepoBench-P.
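For reference, the compression call in that notebook looks roughly like the sketch below. I am paraphrasing from the notebook and the README, so the placeholder inputs and the exact parameter values (`target_token`, `dynamic_context_compression_ratio`, etc.) are assumptions and may differ from the actual notebook settings:

```python
from llmlingua import PromptCompressor

# Placeholder inputs; in the notebook these come from the LongBench data.
instruction = "Please complete the code given below."
question = "Next line of code:"
context = ["<code chunk 1>", "<code chunk 2>"]

llm_lingua = PromptCompressor()

compressed = llm_lingua.compress_prompt(
    context,                                  # list of context strings
    instruction=instruction,
    question=question,
    target_token=500,                         # token budget (value assumed)
    rank_method="longllmlingua",
    condition_compare=True,
    condition_in_question="after_condition",
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.3,    # value assumed
    reorder_context="sort",                   # the reordering strategy I am asking about
)
print(compressed["compressed_prompt"])
```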
I have two questions:

1. In the paper, you say that you did not use the reordering strategy, but this notebook appears to enable it. When reproducing your results, should I enable the reordering strategy for each dataset?
2. Can I apply the same parameters as in this notebook to all of the LongBench datasets? If each dataset uses different parameters, could you share the parameters for each dataset?
Thank you!