microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

Compatible models #83

Open oz03-hub opened 5 months ago

oz03-hub commented 5 months ago

Hello, I want to know which models can be used as the compressor in LLMLingua. I know the default is NousResearch/Llama-2-7b-hf, and there is also a GPT-2 alternative, etc. On Hugging Face, how do you filter which models can be used? Or is there a list, so that I can experiment?

Thanks

iofu728 commented 5 months ago

Hi @oz03-hub, thank you for your interest in LLMLingua. We believe you can use any language model as a compressor in LLMLingua. The principle is that the stronger a small language model is in a specific domain, the better its compression performance usually is in that domain. For more details, you can refer to: https://github.com/microsoft/LLMLingua/discussions/57
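
For anyone landing here, a minimal sketch of swapping in a different compressor, assuming the `PromptCompressor(model_name=...)` pattern shown in the repo README; the `microsoft/phi-2` choice below is just an illustrative example of a Hugging Face causal LM, not a recommendation:

```python
from llmlingua import PromptCompressor

# Default compressor: NousResearch/Llama-2-7b-hf.
llm_lingua = PromptCompressor()

# Any Hugging Face causal LM that Transformers can load should work as
# the compressor; microsoft/phi-2 here is an illustrative choice.
llm_lingua = PromptCompressor(model_name="microsoft/phi-2")

# A placeholder long prompt to compress (use your real prompt here).
prompt = "Some long context that you want to compress ..." * 50

# Compress the prompt down to roughly 200 tokens.
result = llm_lingua.compress_prompt(
    prompt,
    instruction="",
    question="",
    target_token=200,
)
print(result["compressed_prompt"])
```

The stronger the small model is on text like yours, the better its token-importance estimates tend to be, which is why a domain-matched compressor usually compresses with less quality loss.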