microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Feature Request]: How to improve the accuracy of compressor for large SFT models through training #159

Open dingjingzhen opened 1 month ago

dingjingzhen commented 1 month ago

Is your feature request related to a problem? Please describe.

No response

Describe the solution you'd like

Is llmlingua-2-xlm-roberta-large-meetingbank currently the best-performing model? If I want better results, can I combine it with SFT of a large model, e.g., by placing the compressor in front of the SFT pipeline so that the compressed text is what the large model is trained on? That way the data distribution the large model sees during training would match the compressor's output distribution, which could yield higher accuracy at inference time (see the sketch below). Alternatively, if I continue training the compressor directly on the large model's SFT data, would the compressor itself improve?
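
For concreteness, a minimal sketch of the first idea, assuming the LLMLingua-2 API shown in the repo README; `sft_examples`, the record layout, and the `rate`/`force_tokens` values are hypothetical and illustrative only:

```python
# Sketch: compress SFT prompts with LLMLingua-2 before fine-tuning, so the
# large model is trained on the same compressed distribution it will see at
# inference time. `sft_examples` and the record fields are hypothetical.
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

def compress_sft_dataset(sft_examples, rate=0.33):
    """Return SFT records whose prompts have been compressed with LLMLingua-2."""
    compressed_records = []
    for example in sft_examples:  # each example: {"prompt": ..., "response": ...}
        result = compressor.compress_prompt(
            example["prompt"],
            rate=rate,                 # keep roughly 1/3 of the tokens
            force_tokens=["\n", "?"],  # tokens to always preserve
        )
        compressed_records.append(
            {"prompt": result["compressed_prompt"], "response": example["response"]}
        )
    return compressed_records
```

The resulting records would then serve as the SFT training set, and the same compressor would be applied to incoming prompts at inference so that the train and test distributions stay consistent.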

Additional context

No response

iofu728 commented 1 month ago

Hi @dingjingzhen, thanks for your support of LLMLingua and for your suggestion. The best-performing compressor currently is "llmlingua-2-xlm-roberta-large-meetingbank." I believe your idea is promising, but implementing it will require some design work and experiments. We may attempt this in the future, and if you are interested in conducting some experiments, we would be happy to collaborate.