microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

No improvement when applying LongLLMLingua after retrieval. #39

Closed ZhexuanZhou closed 8 months ago

ZhexuanZhou commented 8 months ago

In my situation, I have a retrieved list, and each item in the list contains one positive context and 19 negative contexts. After obtaining the list, I want to use LongLLMLingua for reranking. However, I don't see any improvement: the MRR@n and recall@n remain the same. Could you give some advice on improving the reranking performance?
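For reference, here is a minimal sketch (not from the LLMLingua codebase) of the recall@k and MRR@k metrics discussed in this thread, useful for checking whether a reranker actually changes the ordering; the function names and the convention that the positive context has id 0 are assumptions for illustration:

```python
# Each entry of ranked_lists is one query's ranked context ids, best first.
# We assume the single positive context has id 0 in every list.

def recall_at_k(ranked_lists, k):
    """Fraction of queries whose positive context appears in the top k."""
    hits = sum(1 for ranks in ranked_lists if 0 in ranks[:k])
    return hits / len(ranked_lists)

def mrr_at_k(ranked_lists, k):
    """Mean reciprocal rank of the positive context, counting only top-k hits."""
    total = 0.0
    for ranks in ranked_lists:
        if 0 in ranks[:k]:
            total += 1.0 / (ranks.index(0) + 1)
    return total / len(ranked_lists)
```

If both metrics are identical before and after reranking for every k, the reranker's ordering matches the retriever's ordering exactly.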

iofu728 commented 8 months ago

Hi @ZhexuanZhou,

Thank you for your support and interest in the LongLLMLingua project.

LongLLMLingua addresses the overconfidence that can arise when using LLMs as retrieval or reranking models. It proposes using a restrictive prompt for guidance and mitigation. I suggest adjusting the corresponding restrictive sentence to the specific task, as demonstrated here: LongLLMLingua Prompt Compressor.

Additionally, you may want to experiment with different small LMs, such as Mistral.

However, I am curious if the recall@1/5 after reranking is also identical to the original retrieval results.

ZhexuanZhou commented 8 months ago

@iofu728 Thank you for your reply.

  1. Regarding your question, the recall@1/5 after reranking remains the same as well.

  2. By "adjusting the corresponding restrictive sentence," do you mean choosing one of "none", "before", and "after" for the parameter named "condition_in_question"? If so, the best performance we obtained was with condition_in_question set to "after" and rank_method set to "longllmlingua".

  3. I am confused about the distribution alignment step. For example, the input data X is {x_instruction, x_demonstration, x_query}; we take X as input to the target LLM to select the "important" tokens, say X_token. After that, we take X as input and X_token as the target to fine-tune the small LLM. So my questions are:

    • Since the distribution alignment does not affect the reranking result, does this mean we can use any small LLM as a reranker?
    • Is it necessary to fine-tune the small LLM when we switch to a different target LLM?
iofu728 commented 8 months ago
  1. Thank you for the information. However, I still find it unusual that the recall@n remains unchanged, which implies that the LongLLMLingua ranking is completely aligned with the retrieval ranking.

  2. Apologies for the confusion; I was referring to the restrictive prompt at #L1134. You might want to try different restrictive prompts. For 'condition_in_question', I recommend using 'after'.

  3. a) Our current understanding is that any LM can be used to estimate the importance distribution of tokens. However, we believe that the higher the compression ratio of the LM itself (LM as compressor), the more accurate the estimation will be, particularly because of exposure to more tokens during pre-training.

    b) There might be some impact, but it is minimal: about 1-2 points in our previous experiments.

ZhexuanZhou commented 8 months ago

I appreciate your time, thank you.