microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License
4.48k stars 251 forks

[Bug]: structured_compress_prompt not working correctly with LLMLingua2 #114

Open soumyaamazon opened 6 months ago

soumyaamazon commented 6 months ago

Describe the bug

Model:

from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True, # Whether to use llmlingua-2
)

Prompt:

prompt = """<llmlingua, compress=False> Read the context and answer the question that follows. Don\'t give information outside the document or repeat your findings. Context: </llmlingua><llmlingua, rate=0.5> Context: October 2004\nAs E. B. White said, "good writing is rewriting."  I didn\'t\nrealize this when I was in school.  In writing, as in math and \nscience, they only show you the finished product.\nYou don\'t see all the false starts.  This gives students a\nmisleading view of how things get made.Part of the reason it happens is that writers
don\'t want   \npeople to see their mistakes.  But I\'m willing to let people\nsee an early draft if it will show how much you have\nto rewrite to beat an essay into shape.Below is the oldest version I can find of\nThe Age of the Essay   \n(probably the second or third day), with\ntext that ultimately survived in \nred and text that later\ngot deleted in gray.\nThere seem to be several categories of cuts: things I got wrong,\nthings that seem like bragging, flames,\ndigressions, stretches of awkward prose, and unnecessary words.I discarded more from the beginning.  That\'s\nnot surprising; it takes a while to hit your stride.
</llmlingua><llmlingua, compress=False> Question: What is the best thing to do in San Francisco? Answer: </llmlingua>"""

Code:

compressed_prompt = llm_lingua.structured_compress_prompt(prompt)

Output:

{'compressed_prompt': 'Read context answer question follows. Don\'t give information outside document or repeat findings. Context: October 2004 E. B. White said "good writing is rewriting." didn\'t realize in school. In writing as in math and science only show finished product don\'t see false starts. gives students misleading view of how things made.Part writers don\'t want people to see mistakes willing to let people see early draft if show much to rewrite to beat essay into shape.Below oldest version of The Age of the Essay (probably second or third day), with text survived in red text later deleted in gray. several categories of cuts: things wrong, things bragging, flames digressions awkward prose unnecessary words.I discarded more from beginning. not surprising; takes a while to hit stride. Question: best thing to do in San Francisco? Answer:', 'compressed_prompt_list': ['Read context answer question follows. Don\'t give information outside document or repeat findings. Context: October 2004 E. B. White said "good writing is rewriting." didn\'t realize in school. In writing as in math and science only show finished product don\'t see false starts. gives students misleading view of how things made.Part writers don\'t want people to see mistakes willing to let people see early draft if show much to rewrite to beat essay into shape.Below oldest version of The Age of the Essay (probably second or third day), with text survived in red text later deleted in gray. several categories of cuts: things wrong, things bragging, flames digressions awkward prose unnecessary words.I discarded more from beginning. not surprising; takes a while to hit stride. Question: best thing to do in San Francisco? Answer:'], 'origin_tokens': 250, 'compressed_tokens': 167, 'ratio': '1.5x', 'rate': '66.8%', 'saving': ', Saving $0.0 in GPT-4.'}

Steps to reproduce

Mentioned above.

Expected Behavior

The part wrapped in the compress=False tag should not have been compressed.

Logs

No response

Additional Information

No response

SiyunZhao commented 6 months ago

Hi @soumyaamazon, thank you for your support and the detailed issue information.

This issue arises because structured_compress_prompt does not yet support the LLMLingua-2 series of models. If you need structured_compress_prompt, we recommend using the llmlingua or longllmlingua series for now. We will add LLMLingua-2 support in a future update.
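Until that support lands, a possible workaround is to split the structured prompt yourself and route only the compressible segments through the plain compress_prompt API. A rough sketch below (the helper names and the regex are my own, not part of the library; compress_fn stands in for a call like llm_lingua.compress_prompt(text, rate=rate)['compressed_prompt']):

```python
import re

# Matches one <llmlingua, ...>...</llmlingua> segment; the options string
# (e.g. "compress=False" or "rate=0.5") is the first group, the text the second.
SEGMENT_RE = re.compile(r"<llmlingua,\s*([^>]*)>(.*?)</llmlingua>", re.DOTALL)

def split_structured_prompt(prompt):
    """Return a list of (text, options) tuples, one per <llmlingua> segment."""
    segments = []
    for match in SEGMENT_RE.finditer(prompt):
        opts_str, text = match.groups()
        opts = {}
        for part in opts_str.split(","):
            if "=" in part:
                key, value = part.split("=", 1)
                opts[key.strip()] = value.strip()
        segments.append((text, opts))
    return segments

def compress_structured(prompt, compress_fn):
    """Reassemble the prompt, compressing only segments without compress=False.

    compress_fn(text, rate) is a placeholder for the real compressor, e.g.
    lambda t, r: llm_lingua.compress_prompt(t, rate=r)['compressed_prompt'].
    """
    out = []
    for text, opts in split_structured_prompt(prompt):
        if opts.get("compress") == "False":
            out.append(text)  # pass through untouched
        else:
            rate = float(opts.get("rate", 0.5))
            out.append(compress_fn(text, rate))
    return "".join(out)
```

This keeps the compress=False spans byte-for-byte intact regardless of which compressor model is loaded, at the cost of compressing each rate segment independently rather than jointly.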

soumyaamazon commented 6 months ago

Thank you.

soumyaamazon commented 6 months ago

Hi @SiyunZhao, I wanted to confirm whether the following usage is a valid alternative to the above, i.e. does LLMLingua-2 drop tokens using information from the instruction and question?

    llm_lingua = PromptCompressor(
        model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
        use_llmlingua2=True
    )
    compressed_prompt = llm_lingua.compress_prompt(context, instruction=prefix, question=suffix, rate=0.5, force_tokens=['\n', '?'])
    input_text = prefix + compressed_prompt['compressed_prompt'] + suffix

msclar commented 5 months ago

> Hi @soumyaamazon, thank you for your support and the detailed issue information.
>
> This issue arises because structured_compress_prompt does not yet support the LLMLingua-2 series of models. If you need structured_compress_prompt, we recommend using the llmlingua or longllmlingua series for now. We will add LLMLingua-2 support in a future update.

What models would you recommend? I'm seeing this issue with the default model in PromptCompressor() ('NousResearch/Llama-2-7b-hf'), as well as phi-2.

Specifically, I want to compress only the last field and keep the few-shot examples intact, but the model compresses parts of the prompt that I asked not to be compressed.

structured_prompt = """<llmlingua, compress=False>QUESTION: The Peach belongs to a group of seedless fruit. True or false?
REASONING: Peaches have one large seed surrounded by the flesh of the fruit.
ANSWER: False.

QUESTION: The band Lynyrd Skynyrd formed up in Beijing, China. True or false?
REASONING: They formed in Jacksonville, Florida.
ANSWER: False.

QUESTION: Only people named Floyd wearing pink are allowed to attend Pink Floyd concerts. True or false?
REASONING:</llmlingua><llmlingua, rate=0.2> The rock group would not be as popular if they had such requirements for their concerts.</llmlingua>"""
compressed_prompt = llm_lingua.structured_compress_prompt(structured_prompt, instruction="", question="", rate=0.2)

The result is:

{'compressed_prompt': 'QUESTION: The Peach belongs to a group of seedless fruit. True or false?\nREASONING: Peaches have one large seed surrounded by the flesh of the fruit.\nANSWER: False.\n\nQUESTION: The band Lynyrd Skynyrd formed up in Beijing, China. True or false?\nREASONINGQUESTION: Only people named Floyd wearing pink are allowed to attend Pink Floyd concerts. True or false?\nREASONING: The rock group would not be as popular if they had such requirements for their concerts.', 'origin_tokens': 118, 'compressed_tokens': 105, 'ratio': '1.1x', 'rate': '89.0%', 'saving': ', Saving $0.0 in GPT-4.'}

which has dropped the second few-shot example.

Thanks in advance!