microsoft / LLMLingua

[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Question]: Reproduction of Big Bench Hard with LLMLingua-2 #191

Closed · cornzz closed this 1 day ago

cornzz commented 4 weeks ago

Describe the issue

Since there is no example of compressing BBH prompts in the experiments folder, I wanted to ask which parameters were used for compressing the CoT prompts for Big Bench Hard. Specifically, which values for the parameters

  1. compression_rate / target_token
  2. force_tokens
  3. force_reserve_digit

were used, and was use_context_level_filter set to True? Furthermore, there are 3 CoT example prompts for each task. How was the compression done: were the prompts passed as a single string, or as a list with each CoT prompt as a separate item, as is done for the GSM8K CoT prompts here?

iofu728 commented 3 weeks ago

Hi @pzs19, could you help to answer this question?

pzs19 commented 3 weeks ago

Hi @cornzz, thanks for your feedback.

The parameters used for compressing the BIG-Bench-Hard CoT prompts are:

  1. "target_token" is described in "Tokens" column of the Table 3 in our paper.
  2. "force_tokens" is set to "\n,!,?,.,Q:,A:,So the answer is".
  3. "force_reserve_digit" is False.
  4. "use_context_level_filter" is True, and "context_level_target_token" is set to twice of "target_token", which reserves approximately 2 or 1 of the cot examples. The 3 CoT example prompts are passed in as a list, so that the use_context_level_filter can take effect.
cornzz commented 3 weeks ago

Thank you!

cornzz commented 2 weeks ago

@pzs19 Sorry, I do have a follow-up question regarding your response:

1. "target_token" is described in "Tokens" column of the Table 3 in our [paper](https://aclanthology.org/2024.findings-acl.57.pdf).

I was under the impression that the "Tokens" column shows the actual achieved token count, not the target_token parameter. Otherwise it would be the same value for every compression method under the 1-shot and half-shot constraints, respectively, but in reality it differs per method.

Additionally, when I set target_token to, say, 269 (the "Tokens" value for LLMLingua-2 under the 1-shot constraint), the achieved compressed token count does not correspond to that value but is a lot lower; to actually land around 269 tokens I have to set target_token closer to 300. Also, when I use ratio instead of target_token and set it to 0.33 (to achieve the 3x compression reported in Table 3), the achieved ratio is even higher, over 4x.

Hence my confusion about how to reproduce these results, as I still do not understand what "1-/half-shot constraint" means and how to derive compression targets from it.

pzs19 commented 2 days ago

Yes, the "Tokens" column is the actual achieved token count, which may vary slightly from the "target_token" you set, so your observation is entirely reasonable. You may need to set "target_token" slightly higher than the exact number of tokens you want to retain.
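In case it helps with reproduction, a rough sketch of one way to nudge "target_token" upward until the achieved count reaches the budget. Counting tokens with a GPT-3.5 tokenizer is an assumption here, the "compressed_prompt" result key follows the repo's README output, and compress_to_budget is just a hypothetical helper:

```python
import tiktoken

# Assumption: token counts are measured with a GPT-3.5 tokenizer.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def compress_to_budget(compressor, contexts, budget, max_tries=5, step=0.1):
    """Raise target_token until the compressed prompt reaches ~budget tokens."""
    target = budget
    result = None
    for _ in range(max_tries):
        result = compressor.compress_prompt(
            contexts,
            target_token=target,
            use_context_level_filter=True,
            context_level_target_token=2 * target,
            force_tokens=["\n", "!", "?", ".", "Q:", "A:", "So the answer is"],
            force_reserve_digit=False,
        )
        achieved = len(enc.encode(result["compressed_prompt"]))
        if achieved >= 0.95 * budget:  # close enough to the reported token count
            break
        target = int(target * (1 + step))  # compressor undershoots, so raise the target
    return result

# Usage (with the compressor and cot_examples from the earlier snippet):
# result = compress_to_budget(compressor, cot_examples, budget=269)
```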

pzs19 commented 2 days ago

The meaning of "1-/half-shot constraint" is explained here.

cornzz commented 1 day ago

Thank you!