[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
Hi, team!
Thanks so much for this inspiring work! I want to confirm whether this is the correct way to compress only the documents in the RAG setting:
```python
from llmlingua import PromptCompressor
from nltk.tokenize import sent_tokenize

document = """
1972 | Apollo program: Eugene Cernan is the last person to walk on the Moon, after he and Harrison Schmitt complete the third and final extra-vehicular activity (EVA) of Apollo 17. \
This is currently the last manned mission to the Moon. ; December 15 ; The Commonwealth of Australia ordains equal pay for women. ; The United Nations Environment Programme is \
established as a specialized agency of the United Nations. ; December 16 ; The Constitution of Bangladesh comes into effect. ; The Portuguese army kills 400 Africans in Tete, Mozambique. \
; December 19 – Apollo program: Apollo 17 returns to Earth, concluding the program of lunar exploration. ; December 21
"""

llm_lingua = PromptCompressor()
compressed_prompt = llm_lingua.compress_prompt(
    sent_tokenize(document),
    instruction="",
    question="",
    target_token=10,
)
```
Should I use `sent_tokenize` to get a `List[str]`? And how can I precisely control the size of the compressed documents? In the example above, I got 19 tokens even though I set `target_token=10`. Looking forward to your response!
Hi @Hannibal046, thank you for your interest in LLMLingua.
The best practice is to divide the context semantically, e.g. at the document level; however, using `sent_tokenize(document)` is also viable.
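To illustrate the semantic splitting suggested above: in a timeline-style document like the one in the question, the `" ; "` separator already marks event boundaries, so splitting on it keeps each event intact. This is a minimal pure-Python sketch (the shortened document and the choice of delimiter are assumptions based on the example in the question, not part of the LLMLingua API):

```python
# Timeline-style context where " ; " separates individual events/dates,
# shortened from the document in the question for illustration.
document = (
    "Apollo program: Eugene Cernan is the last person to walk on the Moon. ; "
    "December 15 ; "
    "The Commonwealth of Australia ordains equal pay for women. ; "
    "December 16 ; "
    "The Constitution of Bangladesh comes into effect."
)

# Split on the event delimiter instead of sentence boundaries, so each
# chunk is a semantically complete event; drop any empty fragments.
chunks = [c.strip() for c in document.split(" ; ") if c.strip()]
print(len(chunks))  # → 5
```

The resulting `chunks` list can then be passed to `compress_prompt` as the `List[str]` context, in place of `sent_tokenize(document)`.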
Setting the compression target to 10 will be quite challenging. Additionally, our current method cannot guarantee complete adherence to the compression target, but aims to approach it as closely as possible. You can adjust the coefficients to get closer to your target.
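Since the method only approaches the target rather than guaranteeing it, one practical workaround is to re-run compression with a proportionally smaller requested target until the measured size fits the budget. This is a hypothetical sketch: `compress_fn` is a stand-in for a call like `llm_lingua.compress_prompt`, assuming only that it returns a dict containing a `"compressed_tokens"` count, and `fake_compress` is a toy stub mimicking the 19-tokens-for-target-10 overshoot from the question:

```python
def compress_to_budget(compress_fn, docs, budget, max_tries=5):
    """Retry compression, shrinking the requested target until the
    measured token count fits within `budget` (or max_tries is hit)."""
    target = budget
    result = compress_fn(docs, target_token=target)
    for _ in range(max_tries):
        if result["compressed_tokens"] <= budget:
            break
        # Overshot: scale the requested target down proportionally.
        target = max(1, int(target * budget / result["compressed_tokens"]))
        result = compress_fn(docs, target_token=target)
    return result

def fake_compress(docs, target_token):
    # Toy stub: always returns ~1.9x the requested target, mimicking
    # the observed 19 tokens when target_token=10 was requested.
    return {"compressed_tokens": max(1, int(target_token * 1.9))}

result = compress_to_budget(fake_compress, [], budget=10)
print(result["compressed_tokens"])  # fits within the 10-token budget
```

With a real `PromptCompressor`, each retry costs another forward pass, so a small `max_tries` keeps the overhead bounded.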
For more principles, you can refer to this document.