microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Question]: reproducing LongLLMLingua on the LongBench dataset. #127

Open junepark1 opened 6 months ago

junepark1 commented 6 months ago

Describe the issue

Thank you for your work.

I tried to reproduce LongLLMLingua on the LongBench dataset.

https://github.com/microsoft/LLMLingua/blob/main/examples/Code.ipynb appears to be code for reproducing one of the LongBench datasets, RepoBench-P.

I have two questions.

  1. In the paper, you said you did not use the reordering strategy, but this notebook does seem to use it. When reproducing the results, should I apply the reordering strategy for each dataset?
  2. Can I apply the same parameters as in this notebook to all LongBench datasets? If each dataset needs different parameters, could you share the parameters for each one?

Thank you!

iofu728 commented 6 months ago

Hi @junepark1, apologies for the late response,

  1. Yes, you need to disable the reordering (reranker) for now. We will update the results with the reranker enabled in the future.
  2. You can use the same parameters for all tasks. For other LongBench-related logic, you can refer to https://github.com/microsoft/LLMLingua/blob/main/experiments/llmlingua2/evaluation/eval_longbench.py
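For concreteness, here is a minimal sketch of a compression configuration with reordering disabled. The parameter names follow the style of the `compress_prompt` call in `examples/Code.ipynb`, but the specific names and values below are assumptions for illustration, not a verified configuration for any particular LongBench task:

```python
# Sketch of LongLLMLingua settings with document reordering disabled.
# Parameter names/values are illustrative assumptions based on the
# examples/Code.ipynb notebook, not an official configuration.
compress_kwargs = {
    "rank_method": "longllmlingua",  # coarse-grained context ranking
    "reorder_context": "original",   # keep the original document order,
                                     # i.e. disable the reordering strategy
    "condition_compare": True,
    "condition_in_question": "after",
    "dynamic_context_compression_ratio": 0.3,
    "context_budget": "+100",
}

# Usage sketch (requires `pip install llmlingua` and a local model):
# from llmlingua import PromptCompressor
# llm_lingua = PromptCompressor()
# result = llm_lingua.compress_prompt(
#     context, question=question, target_token=2000, **compress_kwargs
# )
```

The key point is `reorder_context`: setting it to keep the original order, rather than sorting contexts by relevance, should match the no-reordering setup described in the paper.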