microsoft / LLMLingua

To speed up LLM inference and enhance the model's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

llama_index and LLMLingua PromptCompressor inconsistency #34

Closed · argenisleon closed this issue 8 months ago

argenisleon commented 8 months ago

Hi,

When calling the LongLLMLinguaPostprocessor function in the example https://github.com/microsoft/LLMLingua/blob/main/examples/RAGLlamaIndex.ipynb, I get the following error:

TypeError: PromptCompressor.__init__() got an unexpected keyword argument 'use_auth_token'
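For context, the failing setup is roughly the postprocessor construction from that notebook (paraphrased; the notebook's exact parameter values may differ):

from llama_index.postprocessor import LongLLMLinguaPostprocessor

# Roughly the construction from the linked notebook (parameters
# paraphrased; the notebook's exact values may differ).
node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
)
# Raises:
# TypeError: PromptCompressor.__init__() got an unexpected keyword
#     argument 'use_auth_token'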

The call reaches this line https://github.com/run-llama/llama_index/blob/4c922a1ff7e2d204c13bb926e1596669a937916a/llama_index/postprocessor/longllmlingua.py#L56, which runs

self._llm_lingua = PromptCompressor(
    model_name=model_name,
    device_map=device_map,
    use_auth_token=use_auth_token,
    open_api_config=open_api_config,
)

This call passes use_auth_token, but PromptCompressor, defined at https://github.com/microsoft/LLMLingua/blob/bf6723c3eca3569d23c4ec367c588660dc2e65e7/llmlingua/prompt_compressor.py#L24, does not accept a use_auth_token argument.
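For reference, the PromptCompressor constructor at that commit looks roughly like this (paraphrased from the linked line; treat the names and defaults as approximate):

class PromptCompressor:
    def __init__(
        self,
        model_name: str = "NousResearch/Llama-2-7b-hf",
        device_map: str = "cuda",
        model_config: dict = {},
        open_api_config: dict = {},
    ):
        # ... loads the model and tokenizer; note there is no
        # use_auth_token parameter anywhere in this signature.
        ...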

There seems to be an inconsistency between llama_index and LLMLingua. Is there anything I could help with?
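In the meantime, a possible stopgap (an untested sketch, not an official fix) is to shim PromptCompressor.__init__ so it tolerates and drops the extra keyword before llama_index constructs it:

from llmlingua import PromptCompressor

_orig_init = PromptCompressor.__init__

def _patched_init(self, *args, use_auth_token=None, **kwargs):
    # Swallow the use_auth_token kwarg that llama_index passes but this
    # version of PromptCompressor does not accept. The token is simply
    # discarded, so gated models still need authentication by another
    # route (e.g. huggingface-cli login).
    _orig_init(self, *args, **kwargs)

PromptCompressor.__init__ = _patched_init

Since both packages reference the same class object, applying the patch any time before LongLLMLinguaPostprocessor is constructed should be enough.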

iofu728 commented 8 months ago

Hi @argenisleon, I deeply appreciate your help and support.

The issue has been resolved in this pull request. You can now upgrade to llama_index version 0.9.21 to use (Long)LLMLingua in LlamaIndex.

Wishing you a pleasant holiday season!