microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

How can I use it in LangChain? #31

Closed whm233 closed 6 months ago

whm233 commented 8 months ago

My code looks like this:

```python
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
kc = RetrievalQA.from_llm(llm=qwllm, retriever=compression_retriever, prompt=prompt)
```

iofu728 commented 8 months ago

Hi @whm233, thank you for your support and interest in LLMLingua. Although I'm not an expert in LangChain, based on my experience, I believe its usage in LangChain should be similar to that in LlamaIndex, i.e., operating at the Postprocessor-level or reranker-level.

I briefly reviewed the LangChain pipeline and think you'll need to extend the BaseDocumentCompressor to implement a class similar to CohereRerank, as seen here: https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/retrievers/document_compressors/cohere_rerank.py.

Afterward, wrap that compressor in a ContextualCompressionRetriever() and use it as your compression_retriever, along the lines of the sketch below.
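
As a rough illustration (not an official integration), something like the following should work: a custom document compressor that calls LLMLingua's PromptCompressor.compress_prompt() on the retrieved documents, plugged into ContextualCompressionRetriever. The class name LLMLinguaDocumentCompressor and the target_token field are my own placeholders, and base_retriever stands in for whatever retriever you already use.

```python
from typing import Optional, Sequence

from langchain.callbacks.manager import Callbacks
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors.base import BaseDocumentCompressor
from langchain.schema import Document

from llmlingua import PromptCompressor


class LLMLinguaDocumentCompressor(BaseDocumentCompressor):
    """Compress retrieved documents with LLMLingua before they reach the LLM."""

    compressor: PromptCompressor = None  # an LLMLingua PromptCompressor instance
    target_token: int = 300              # token budget for the compressed context

    class Config:
        arbitrary_types_allowed = True

    def compress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Optional[Callbacks] = None,
    ) -> Sequence[Document]:
        # Hand the raw document texts to LLMLingua, conditioned on the query.
        contexts = [doc.page_content for doc in documents]
        result = self.compressor.compress_prompt(
            contexts,
            question=query,
            target_token=self.target_token,
        )
        # Return the compressed context as a single Document.
        return [Document(page_content=result["compressed_prompt"])]


# Usage sketch: replace compression_retriever in the RetrievalQA chain above.
llmlingua_compressor = LLMLinguaDocumentCompressor(compressor=PromptCompressor())
compression_retriever = ContextualCompressionRetriever(
    base_compressor=llmlingua_compressor,
    base_retriever=base_retriever,
)
```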

Thank you again for your interest.

iofu728 commented 6 months ago

Thanks to @thehapyone's contribution, LLMLingua is now available in LangChain. You can follow this notebook for guidance.