microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Feature Request]: How to improve the accuracy of compressor for large SFT models through training #159

Open dingjingzhen opened 1 month ago

dingjingzhen commented 1 month ago

Is your feature request related to a problem? Please describe.

No response

Describe the solution you'd like

Is llmlingua-2-xlm-roberta-large-meetingbank currently the best-performing model? If I want better results, can I combine it with SFT of a large model, e.g., by placing the compressor in front of the SFT pipeline so that the compressed text is what the large model is trained on? That way the data distribution the large model sees during training would match the compressor's output distribution, which could yield higher accuracy at inference time (see the sketch below). Alternatively, if I continue training the compressor directly on the large model's SFT data, would the compressor itself improve?
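
For concreteness, a minimal sketch of the first idea, assuming the LLMLingua-2 API shown in the repo README; `sft_examples`, the record layout, and the `rate`/`force_tokens` values are hypothetical and illustrative only:

```python
# Sketch: compress SFT prompts with LLMLingua-2 before fine-tuning, so the
# large model is trained on the same compressed distribution it will see at
# inference time. `sft_examples` and the record fields are hypothetical.
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

def compress_sft_dataset(sft_examples, rate=0.33):
    """Return SFT records whose prompts have been compressed with LLMLingua-2."""
    compressed_records = []
    for example in sft_examples:  # each example: {"prompt": ..., "response": ...}
        result = compressor.compress_prompt(
            example["prompt"],
            rate=rate,                 # keep roughly 1/3 of the tokens
            force_tokens=["\n", "?"],  # tokens to always preserve
        )
        compressed_records.append(
            {"prompt": result["compressed_prompt"], "response": example["response"]}
        )
    return compressed_records
```

The resulting records would then serve as the SFT training set, and the same compressor would be applied to incoming prompts at inference so that the train and test distributions stay consistent.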

Additional context

No response

iofu728 commented 1 month ago

Hi @dingjingzhen, thanks for your support of LLMLingua and for your suggestion. The best-performing compressor currently is "llmlingua-2-xlm-roberta-large-meetingbank." I believe your idea is promising, but implementing it will require some design work and experiments. We may attempt this in the future, and if you are interested in conducting some experiments, we would be happy to collaborate.