microsoft / LLMLingua

LLMLingua speeds up LLM inference and sharpens the model's perception of key information by compressing the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

How to reproduce multi-document QA results at the 9th position? #86

Open Twilightaaa opened 5 months ago

Twilightaaa commented 5 months ago

My reproduction of the results at position 9 of the NQ dataset from the LongLLMLingua paper, using the prompt compressor, shows a large discrepancy from the reported results. My hyperparameters are set as follows:

(screenshot of hyperparameter settings)

In two experiments, args.t was set to True and False respectively, to verify the effectiveness of the contrastive ITC. When args.t is True, accuracy is 63; when args.t is False, accuracy is 69.

Questions:

1. What hyperparameters accurately reproduce the paper's result of approximately 70.8% accuracy (NQ, 2x compression, 9th position)?
2. Why does the contrastive ITC degrade so severely under my current settings?

Twilightaaa commented 5 months ago

The args.ratio is set to 0.5
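For reference, the settings described above map onto a LongLLMLingua call roughly like the sketch below. The mapping of `args.ratio` to `ratio` and of `args.t` to `condition_compare` (the contrastive ITC toggle) is an assumption based on this thread, and the remaining kwargs are the LongLLMLingua defaults from the repo's examples; newer llmlingua releases rename `ratio` to `rate`, so check your installed version.

```python
# Hypothetical mapping of the thread's args onto compress_prompt kwargs.
LONGLLMLINGUA_KWARGS = {
    "ratio": 0.5,                       # args.ratio: 2x compression
    "condition_compare": True,          # args.t: contrastive ITC on/off
    "condition_in_question": "after",
    "reorder_context": "sort",
    "dynamic_context_compression_ratio": 0.3,
    "rank_method": "longllmlingua",
}


def compress(context, instruction, question):
    """Compress a list of retrieved NQ passages with LongLLMLingua."""
    # Local import: instantiating PromptCompressor downloads the default
    # Llama-2-7B compressor model on first use.
    from llmlingua import PromptCompressor

    compressor = PromptCompressor()
    return compressor.compress_prompt(
        context=context,
        instruction=instruction,
        question=question,
        **LONGLLMLINGUA_KWARGS,
    )
```

Toggling `condition_compare` between True and False here would reproduce the two experiments compared above.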

iofu728 commented 5 months ago

Just checked the script with @Twilightaaa and found that the main issue lies in the call mode of the LLMs; the parameters are largely consistent with those mentioned earlier.

Experiments in LLMLingua and most experiments in LongLLMLingua were conducted in completion mode, whereas chat mode tends to be more sensitive to token-level compression. However, OpenAI has currently disabled completion mode for GPT-3.5-turbo; you can use gpt-3.5-turbo-instruct or the Azure OpenAI service instead.
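A minimal sketch of querying the compressed prompt in completion mode via the OpenAI v1 Python SDK, using gpt-3.5-turbo-instruct as suggested above; `max_tokens` and `temperature` here are illustrative choices, not values from the paper.

```python
# Request parameters for completion mode (not chat mode).
COMPLETION_PARAMS = {
    "model": "gpt-3.5-turbo-instruct",  # completion-mode replacement for gpt-3.5-turbo
    "max_tokens": 100,
    "temperature": 0.0,                 # greedy decoding for reproducible accuracy
}


def answer(compressed_prompt: str) -> str:
    """Send a compressed prompt to the completions endpoint."""
    # Local import so the sketch can be read without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.completions.create(prompt=compressed_prompt, **COMPLETION_PARAMS)
    return resp.choices[0].text.strip()
```

The key point is `client.completions.create` (not `client.chat.completions.create`): the compressed prompt is sent as a single text string rather than as chat messages.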

yfpeng1234 commented 4 months ago

Hi, @Twilightaaa ! Could you please share your reproduction script of experiments on location 9 of the NQ dataset? Thanks!

iofu728 commented 4 months ago

Hi @yfpeng1234, you can follow https://github.com/microsoft/LLMLingua/blob/main/examples/RAG.ipynb and use the "gpt-3.5-turbo-instruct" model.

Twilightaaa commented 4 months ago

Hi, @yfpeng1234! I followed the instructions in https://github.com/microsoft/LLMLingua/blob/main/examples/RAG.ipynb and used the "gpt-3.5-turbo-instruct" model without any additional modifications.