microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/

autogen compressible agent integration #28

Open · yenif opened 8 months ago

yenif commented 8 months ago

Hello!

I put LLMLingua into AutoGen as part of a compressible agent: https://github.com/microsoft/autogen/pull/1005

It's basically functional, but too slow on my MacBook with Llama 2 to really test.

I figured I'd try phi-2, but it didn't return past_key_values. I have no clue whether that's a dead end or fixable :-)

I'd appreciate any input on using LLMLingua effectively as the compressor for GPT agents.
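
For context, the compression call itself is straightforward; this is roughly the shape of it (a minimal sketch following the `PromptCompressor` API from this repo's README; the prompt text and `target_token` value are just placeholders):

```python
from llmlingua import PromptCompressor

# Defaults to a Llama-2-7B checkpoint as the small compressor model;
# this is what's slow on my MacBook, hence the interest in phi-2.
llm_lingua = PromptCompressor()

long_prompt = "...full agent context / chat history to compress..."

result = llm_lingua.compress_prompt(
    long_prompt,
    instruction="",    # optional task instruction to keep intact
    question="",       # optional question to keep intact
    target_token=200,  # rough token budget for the compressed prompt
)
print(result["compressed_prompt"])
```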

Thanks!

iofu728 commented 8 months ago

Hello @yenif,

Thank you for your help and support. We agree that agent scenarios like AutoGen are well-suited for approaches like LLMLingua to reduce token redundancy. However, there might be new issues that need to be addressed.

The reason phi-x models cannot be invoked directly is that their modeling code does not build on HuggingFace's standard causal-LM classes (i.e., what `AutoModelForCausalLM` loads; see https://huggingface.co/microsoft/phi-2/blob/main/modeling_phi.py#L960), which leaves LLMLingua unable to access the kv-cache. One solution could be to rewrite the phi-x code within that standard framework.
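
As a quick way to see the problem, you can check whether a model's forward pass exposes the kv-cache at all (a minimal sketch, assuming the standard transformers forward signature; with phi-2's current custom code this check is expected to fail, which is exactly the issue above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# phi-2's modeling code is custom, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer("Hello, world.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, use_cache=True)

# LLMLingua relies on past_key_values to reuse computation across its
# iterative compression steps; a model whose forward pass does not
# populate it cannot serve as the compressor as-is.
print(getattr(outputs, "past_key_values", None) is not None)
```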