[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
I was evaluating how well (Long)LLMLingua is able to achieve the requested compression rate (focusing on the `rate` parameter, not `target_token`) and came to these conclusions:
- For smaller prompts (< 150 tokens), barely any compression is achieved, if any at all.
- The requested compression rate is matched most closely for prompts of around 2,000 tokens.
- For longer prompts (> 5,000 tokens), the requested rate is overshot (or undershot).
More detailed results are below.
My question is: am I doing something wrong when invoking LLMLingua, or is this behaviour normal?
I adhered to the usage examples in README.md:
Code snippet
```python
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-hf",  # or "openai-community/gpt2"
    device_map="balanced",
)
...
# `longllmlingua` is a boolean flag set elsewhere in my script.
def compress(prompt, rate, question=""):
    if longllmlingua:
        # LongLLMLingua: question-aware, coarse-to-fine compression
        res = compressor.compress_prompt(
            [prompt],
            question=question,
            rate=rate,
            condition_in_question="after_condition",
            reorder_context="sort",
            dynamic_context_compression_ratio=0.3,
            condition_compare=True,
            rank_method="longllmlingua",
        )
    else:
        # Plain LLMLingua, only the requested rate
        res = compressor.compress_prompt(prompt, rate=rate)
    return res
```
I tested with the default Llama 2 7B as well as with GPT-2. It seems that the deviation is overall smaller with the smaller model than with the bigger one.
(Prompt lengths measured using the GPT-3.5 tokenizer)
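For reference, the achieved rate per run can be measured with something like the following (a minimal sketch: it assumes the compressed text is returned under `res["compressed_prompt"]` and uses tiktoken's `cl100k_base` encoding, i.e. the GPT-3.5 tokenizer; the 0.5 rate is just an example):

```python
# Sketch: compare the requested vs. the achieved compression rate,
# counting tokens with the GPT-3.5 tokenizer (tiktoken's cl100k_base).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def achieved_rate(original: str, compressed: str) -> float:
    # rate = compressed tokens / original tokens (lower = more compression)
    return len(enc.encode(compressed)) / len(enc.encode(original))

res = compress(prompt, rate=0.5)  # `compress` from the snippet above
print("requested 0.5, achieved", achieved_rate(prompt, res["compressed_prompt"]))
```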
LLMLingua with Llama 2
![Image](https://github.com/user-attachments/assets/68f6e291-088d-4c5b-a38a-19744d43faac)
LLMLingua with GPT-2
![Image](https://github.com/user-attachments/assets/875c159f-2a2f-4068-add5-0b67ce0faa2c)
LongLLMLingua with Llama 2
![Image](https://github.com/user-attachments/assets/ff9b9a56-fd76-4d58-b9e8-f3703147454f)
LongLLMLingua with GPT-2
![Image](https://github.com/user-attachments/assets/c47fde82-de09-4b67-8eb7-9a03a231e571)
In contrast, LLMLingua-2 adheres to the requested compression rate quite well, only slightly overshooting the requested rate:
LLMLingua-2
![Image](https://github.com/user-attachments/assets/2ec11aed-b05e-40d4-8f20-b41c770acd1e)
The prompts I used are truncated from the longest prompt in the LongBench GovReport task (link).
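For reference, the LLMLingua-2 path is enabled via a different compressor setup, as shown in the README (a minimal sketch; the model name and `use_llmlingua2` flag are as documented there, and the rate value is illustrative):

```python
# Sketch of the LLMLingua-2 setup from the README; the same rate sweep applies.
from llmlingua import PromptCompressor

llmlingua2 = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # switch PromptCompressor to the LLMLingua-2 model
)
res = llmlingua2.compress_prompt(prompt, rate=0.5)  # illustrative rate
```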