microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

The specific parameter settings in the compressor for reproducing NQ #22

Open ignorejjj opened 9 months ago

ignorejjj commented 9 months ago

Very nice work! I am trying to replicate the LongLLMLingua results on the Natural Questions dataset, but my results differ from those in the paper, probably because it is unclear what value should be set for each parameter of the compressor. Could you share the specific parameter settings used in the compressor?

iofu728 commented 9 months ago

Hi @ignorejjj, thank you for your support for our work.

No problem, I will post the parameters we used in the experiment below for reference. We will also release the relevant code after the review.

The 2x compression ratio uses:

    import json
    from copy import deepcopy

    from tqdm import tqdm
    from xopen import xopen  # transparently opens plain or compressed files

    # Document and get_qa_prompt come from the "lost in the middle" codebase
    # (module path may differ slightly depending on the version you installed).
    from lost_in_the_middle.prompting import Document, get_qa_prompt

    res = []
    with xopen(path) as f:  # path: the NQ data file in "lost in the middle" format
        for ii, jj in tqdm(enumerate(f), total=2655):
            if ii < len(res):  # resume support: skip already-processed examples
                continue
            input_example = json.loads(jj)
            question = input_example["question"]
            documents = []
            for ctx in deepcopy(input_example["ctxs"]):
                documents.append(Document.from_dict(ctx))

            prompt = get_qa_prompt(
                question,
                documents,
                mention_random_ordering=False,
                query_aware_contextualization=False,
            )

            # Split the full prompt into instruction, context documents, and question.
            c = prompt.split("\n\n")
            instruction, question = c[0], c[-1]
            demonstration = "\n".join(c[1:-1])

            # 2x-compression setting; llm_lingua is an initialized LLMLingua
            # compressor (a setup sketch follows this block).
            compressed_prompt = llm_lingua.compress_prompt(demonstration.split("\n"), instruction, question, 0.55, use_sentence_level_filter=False, condition_in_question="after_condition", reorder_context="sort", dynamic_context_compression_ratio=0.3, condition_compare=True, context_budget="+100", token_budget_ratio=1.05, rank_method="longllmlingua")
            res.append({"id": ii, "prompt": compressed_prompt, "answer": input_example["answers"]})

    # doc_num and idx are experiment identifiers defined elsewhere in the script.
    json.dump(res, open(f"prompt/loss_in_middle/ours_{doc_num}_{idx}_2x_dem_after_add_prompt1_dy03dem_sort.json", "w"))
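The snippet above assumes `llm_lingua` is an already-initialized compressor. A minimal setup sketch using LLMLingua's PromptCompressor is shown below; the model name here is the library's default small language model, since the exact model used for the paper's NQ experiments is not stated in this thread.

    from llmlingua import PromptCompressor

    # Assumption: LLMLingua's default compression model; swap in whichever
    # small LM you use for perplexity-based compression.
    llm_lingua = PromptCompressor(
        model_name="NousResearch/Llama-2-7b-hf",
        device_map="cuda",
    )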

The 4x compression ratio uses:

    compressed_prompt = llm_lingua.compress_prompt(demonstration.split("\n"), instruction, question, 0.75, use_sentence_level_filter=False, condition_in_question="after_condition", reorder_context="sort", dynamic_context_compression_ratio=0.4, condition_compare=True, context_budget="*1.2", token_budget_ratio=1.05, rank_method="longllmlingua")
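The only differences from the 2x setting are the compression rate (0.75 vs. 0.55), the dynamic context compression ratio (0.4 vs. 0.3), and the context budget ("*1.2" vs. "+100"); all other arguments are unchanged.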

If you have more questions, feel free to reply and discuss.

ignorejjj commented 9 months ago

Thanks for your quick reply!

zhyunlong commented 8 months ago

I really appreciate your awesome work. Could you please provide the evaluation code, including batched inference?

iofu728 commented 8 months ago


Hi @zhyunlong, thank you for your support with LLMLingua.

We utilize the same script as 'lost in the middle'. You can access the script at this link.
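For readers without that repository at hand, the "lost in the middle" evaluation essentially checks whether any gold answer appears as a normalized substring of the model output (its best-subspan exact-match metric). Below is a self-contained sketch of that check, not the repo's actual API: the function names and file paths are illustrative, and batched inference itself would be handled by whatever serving stack you run the compressed "prompt" fields through.

    import json
    import re
    import string


    def normalize(text: str) -> str:
        """Lowercase, drop punctuation and articles, and collapse whitespace."""
        text = text.lower()
        text = "".join(ch for ch in text if ch not in set(string.punctuation))
        text = re.sub(r"\b(a|an|the)\b", " ", text)
        return " ".join(text.split())


    def best_subspan_em(prediction: str, gold_answers: list[str]) -> float:
        """1.0 if any normalized gold answer is a substring of the prediction."""
        pred = normalize(prediction)
        return float(any(normalize(ans) in pred for ans in gold_answers))


    # Illustrative paths: compressed prompts written by the script above, and a
    # predictions file mapping example id -> generated answer.
    examples = json.load(open("compressed_prompts.json"))
    predictions = json.load(open("predictions.json"))
    scores = [best_subspan_em(predictions[str(ex["id"])], ex["answer"]) for ex in examples]
    print(f"best_subspan_em accuracy: {sum(scores) / len(scores):.4f}")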