microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

Some problem about code #25

Closed · hhy150 closed this issue 9 months ago

hhy150 commented 9 months ago

Very nice work! I am reading the code. May I ask if the distribution alignment step is missing from the code? Did you directly use the NousResearch/Llama-2-7b-chat-hf model as the compressor? Can this model be considered aligned with GPT-3.5 and LongChat?

iofu728 commented 9 months ago

Hi @hhy150, in LLMLingua we use 'tatsu-lab/alpaca', while for LongLLMLingua we utilize 'NousResearch/Llama-2-7b-chat-hf'. The alignment code follows https://github.com/tatsu-lab/stanford_alpaca.
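
For reference, here is a minimal sketch (assuming the public `PromptCompressor` API of this repo) of how the small compressor model is selected; the Alpaca identifier below is taken from the reply above and may need to point to a locally trained Alpaca-style checkpoint rather than a Hugging Face model id:

```python
from llmlingua import PromptCompressor

# LLMLingua setup: an Alpaca-style aligned small model as the compressor.
# 'tatsu-lab/alpaca' may need to be replaced with the path to your own
# checkpoint trained following the stanford_alpaca recipe.
llmlingua = PromptCompressor(model_name="tatsu-lab/alpaca")

# LongLLMLingua setup: Llama-2-7b-chat as the compressor.
long_llmlingua = PromptCompressor(model_name="NousResearch/Llama-2-7b-chat-hf")
```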

To ensure a fair comparison, all experiments within LongLLMLingua use the same small model. In fact, the importance distributions indicated by the perplexity of well-trained language models are quite similar across models, which means that even with a single universally aligned small model we can achieve good performance across different black-box LLMs. Nevertheless, further alignment could still enhance performance, although the degree of improvement might be limited.
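
As a hypothetical end-to-end illustration of that point, the same compressed prompt can be forwarded to any black-box LLM without re-aligning the small model; the inputs and token budget below are placeholders:

```python
from llmlingua import PromptCompressor

compressor = PromptCompressor(model_name="NousResearch/Llama-2-7b-chat-hf")

# Placeholder inputs for illustration only.
documents = ["...long retrieved passage 1...", "...long retrieved passage 2..."]
question = "What does the report conclude about method X?"

result = compressor.compress_prompt(
    context=documents,
    question=question,
    target_token=500,  # rough token budget for the compressed prompt
)
compressed_prompt = result["compressed_prompt"]

# The same compressed_prompt can now be sent to GPT-3.5, LongChat, or any
# other black-box LLM; no per-target re-alignment of the small model is needed.
```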

hhy150 commented 9 months ago

Got it! Thank you for your reply.