Describe the issue
Thank you for your work.
I tried to reproduce LongLLMLingua on the LongBench dataset.
https://github.com/microsoft/LLMLingua/blob/main/examples/Code.ipynb. It appears to be code for reproducing one of the LongBench datasets, RepoBench-P.
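For reference, the compression call in that notebook looks roughly like the sketch below. I am paraphrasing from the notebook and the README, so the placeholder inputs and the exact parameter values (`target_token`, `dynamic_context_compression_ratio`, etc.) are assumptions and may differ from the actual notebook settings:

```python
from llmlingua import PromptCompressor

# Placeholder inputs; in the notebook these come from the LongBench data.
instruction = "Please complete the code given below."
question = "Next line of code:"
context = ["<code chunk 1>", "<code chunk 2>"]

llm_lingua = PromptCompressor()

compressed = llm_lingua.compress_prompt(
    context,                                  # list of context strings
    instruction=instruction,
    question=question,
    target_token=500,                         # token budget (value assumed)
    rank_method="longllmlingua",
    condition_compare=True,
    condition_in_question="after_condition",
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.3,    # value assumed
    reorder_context="sort",                   # the reordering strategy I am asking about
)
print(compressed["compressed_prompt"])
```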
I have two questions:

1. In the paper, you say that you did not use the reordering strategy, but this notebook appears to enable it. When reproducing your results, should I enable the reordering strategy for each dataset?
2. Can I apply the same parameters as in this notebook to all of the LongBench datasets? If each dataset uses different parameters, could you share the parameters for each dataset?
Thank you!