microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Question]: How to utilize compression to a finetuned LLM? #172

Open LiuZhihhxx opened 1 month ago

LiuZhihhxx commented 1 month ago

Describe the issue

I have an LLM finetuned for a downstream task on input–output pair data (X_train, Y_train). Now I plan to use LLMLingua-2 to compress X_test --> X_test_compress and evaluate the performance. Is it necessary to re-finetune the LLM on X_train_compress? If so, it seems like finetuning the LLM to fit the compressor, maybe? 😂
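For context, the evaluation setup described above can be sketched as follows. In the real experiment the compression step would be LLMLingua-2's `PromptCompressor.compress_prompt`; here `toy_compress` is a hypothetical stopword-dropping stand-in that keeps the sketch self-contained, and `run_llm`/`score` are placeholders for the finetuned model and the task metric.

```python
# Sketch of the pipeline: compress each X_test prompt, feed the compressed
# prompt to the (unchanged) finetuned LLM, and compare the task metric
# against the uncompressed baseline. `toy_compress` is a hypothetical
# stand-in for LLMLingua-2's PromptCompressor, used only to stay runnable.

STOPWORDS = {"the", "a", "an", "of", "to", "is", "and", "in", "for"}

def toy_compress(prompt: str, rate: float = 0.5) -> str:
    """Placeholder compressor: drop stopwords, then truncate to a token budget."""
    tokens = prompt.split()
    kept = [t for t in tokens if t.lower() not in STOPWORDS]
    budget = max(1, int(len(tokens) * rate))
    return " ".join(kept[:budget])

def evaluate(x_test, run_llm, score):
    """Return (baseline_score, compressed_score) averaged over the test set."""
    base = [score(run_llm(x)) for x in x_test]
    comp = [score(run_llm(toy_compress(x))) for x in x_test]
    return sum(base) / len(base), sum(comp) / len(comp)
```

With the actual library the call would look like `PromptCompressor(...).compress_prompt(x, rate=0.33)["compressed_prompt"]`; the key point of this first experiment is that the finetuned LLM itself is left untouched and only its inputs change.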

Any advice is appreciated!

iofu728 commented 1 month ago

Hi @LiuZhihhxx, thanks for your interest in LLMLingua.

Based on your description, you have fine-tuned an LLM for a specific task. I recommend following the instructions at this link to construct a compressed dataset from your data, and then fine-tuning the LLMLingua-2 compressor on it.
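As a rough illustration of that data-construction step: the idea is to pair each original training prompt with a compressed version and save the pairs for compressor fine-tuning. The exact record format is defined by the linked instructions; `ask_strong_llm_to_compress` below is a hypothetical placeholder for the strong-LLM compression step used in the LLMLingua-2 pipeline.

```python
import json

def ask_strong_llm_to_compress(text: str) -> str:
    # Hypothetical placeholder: in the real pipeline a strong LLM (e.g. GPT-4)
    # is prompted to drop non-essential tokens while preserving the meaning.
    # Here we simply keep the first half of the tokens to stay runnable.
    tokens = text.split()
    return " ".join(tokens[: max(1, len(tokens) // 2)])

def build_compression_dataset(x_train, path="compression_pairs.jsonl"):
    """Write (original, compressed) pairs, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for x in x_train:
            pair = {"origin": x, "compressed": ask_strong_llm_to_compress(x)}
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
    return path
```

The resulting pairs would then be used to fine-tune the token-classification compressor on the task's own distribution, rather than re-finetuning the downstream LLM.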