microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License
4.27k stars 228 forks

[Question]: How to only compress documents in the RAG setting? #105

Closed (Hannibal046 closed this issue 4 months ago)

Hannibal046 commented 4 months ago

Describe the issue

Hi, team! Thanks so much for this inspiring work! I want to confirm whether this is the correct way to compress only the documents in the RAG setting:

from llmlingua import PromptCompressor
from nltk.tokenize import sent_tokenize

document = """
    1972 |  program: Eugene Cernan is the last person to walk on the Moon, after he and Harrison Schmitt complete the third and final Extra-vehicular activity (EVA) of Apollo 17. \
    This is currently the last manned mission to the Moon. ; December 15 ; The Commonwealth of Australia ordains equal pay for women. ; The United Nations Environment Programme is \
    established as a specialized agency of the United Nations. ; December 16 ; The Constitution of Bangladesh comes into effect. ; The Portuguese army kills 400 Africans in Tete, Mozambique. \
    ; December 19 – Apollo program: Apollo 17 returns to Earth, concluding the program of lunar exploration. ; December 21
"""
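# Instantiate the compressor with its default settings.
llm_lingua = PromptCompressor()
# Pass the document as a list of sentences; instruction and question stay empty
# so that only the document text is compressed.
compressed_prompt = llm_lingua.compress_prompt(
    sent_tokenize(document),
    instruction="",
    question="",
    target_token=10,
)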

Should I use sent_tokenize to get a List[str]? How can I precisely control the size of the compressed documents? In the example above, I got 19 tokens even though I set target_token=10.

I look forward to your response!

iofu728 commented 4 months ago

Hi @Hannibal046, thank you for your interest in LLMLingua.

  1. The best practice is to divide the context along semantic boundaries, for example one entry per document; however, using sent_tokenize(document) is also viable.
  2. Setting the compression target to 10 will be quite challenging. Additionally, our current method cannot guarantee complete adherence to the compression target, but aims to approach it as closely as possible. You can adjust the coefficients to get closer to your target.

For more principles, you can refer to this document.
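
To make the two points above concrete, here is a minimal sketch (not part of LLMLingua) of one way to approach a token budget when the compressor cannot guarantee it: pass one context entry per retrieved document, measure the size of the result, and lower the requested target in proportion to the overshoot. The names compress_to_budget, stub_compress, and retrieved_docs are hypothetical. The stub stands in for a real compressor so the example runs without a model, and the result keys "compressed_prompt" and "compressed_tokens" are assumed to match what your compressor returns.

```python
from typing import Callable, Dict, List

# Assumed shape of a compressor call: (context, target_token) -> result dict that
# contains at least "compressed_prompt" and "compressed_tokens".
CompressFn = Callable[[List[str], int], Dict]


def compress_to_budget(
    compress_fn: CompressFn,
    context: List[str],
    budget: int,
    max_rounds: int = 5,
) -> Dict:
    """Hypothetical helper: lower the requested target until the result fits the budget.

    Returns the smallest result seen, which may still exceed the budget.
    """
    target = budget
    best = None
    for _ in range(max_rounds):
        result = compress_fn(context, target)
        if best is None or result["compressed_tokens"] < best["compressed_tokens"]:
            best = result
        if result["compressed_tokens"] <= budget:
            break
        # Shrink the request in proportion to the observed overshoot,
        # and always by at least one token so the loop makes progress.
        next_target = min(target - 1, target * budget // result["compressed_tokens"])
        if next_target < 1:
            break
        target = next_target
    return best


def stub_compress(context: List[str], target_token: int) -> Dict:
    """Stand-in compressor so the sketch runs without a model: keeps 9 words too many."""
    words = " ".join(context).split()
    kept = words[: target_token + 9]
    return {"compressed_prompt": " ".join(kept), "compressed_tokens": len(kept)}


# One context entry per retrieved document (document-level split).
retrieved_docs = [
    " ".join(f"a{i}" for i in range(20)),
    " ".join(f"b{i}" for i in range(20)),
]
result = compress_to_budget(stub_compress, retrieved_docs, budget=10)
print(result["compressed_tokens"])
```

To try this with the real compressor, one could wrap the call from the question, for example lambda ctx, t: llm_lingua.compress_prompt(ctx, instruction="", question="", target_token=t), assuming the returned dict exposes a "compressed_tokens" entry. Each round re-runs the compression model, so max_rounds should stay small.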