microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License
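For context, the library's basic prompt-compression API looks roughly like this (a minimal sketch following the project README; the context strings and token budget below are illustrative placeholders):

```python
from llmlingua import PromptCompressor

# Loads a small language model used to score token importance
# (downloads the default model on first use).
compressor = PromptCompressor()

# Compress a list of context strings down to a token budget.
result = compressor.compress_prompt(
    ["<long context chunk 1>", "<long context chunk 2>"],
    instruction="Answer the question using the context.",
    question="What did the author do growing up?",
    target_token=200,
)
print(result["compressed_prompt"])
```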

Troubleshooting Issues in LlamaIndex RAG Demo after Updating to Version 0.10 #153

Closed · 190679163 closed this 4 months ago

190679163 commented 4 months ago

Describe the bug

The LlamaIndex RAG demo no longer functions properly because of significant changes to library calls after updating llama-index to version 0.10. Could you help me troubleshoot where the problem might be? Thank you.
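For reference, the relevant 0.10 change appears to be that the monolithic llama_index package was split into llama-index-core plus per-integration packages, with the old code kept under llama_index.legacy. The exact package and import paths below are an assumption based on the migration notes, not verified against this demo:

```python
# llama-index < 0.10 (monolithic package) -- assumed old path
from llama_index.postprocessor import LongLLMLinguaPostprocessor

# llama-index >= 0.10 (core + integration packages) -- assumed new path,
# installed via: pip install llama-index-postprocessor-longllmlingua
from llama_index.postprocessor.longllmlingua import LongLLMLinguaPostprocessor
```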

Steps to reproduce

Install

```bash
!pip install llmlingua llama-index llama-index-embeddings-huggingface llama-index-embeddings-instructor llama-index-llms-openai llama-index-llms-openai-like llama-index-readers-file pymupdf llama-index-retrievers-bm25 transformers llama_hub

!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O paul_graham_essay.txt
```

Imports

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike
```

Setup LLMLingua

```python
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.legacy.postprocessor.longllmlingua import *  # note: legacy module, mixed with llama_index.core above
from llama_index.core import QueryBundle
from llama_index.llms.openai import OpenAI
import os
import openai
```

Embedding

```python
# load documents
documents = SimpleDirectoryReader(input_files=["paul_graham_essay.txt"]).load_data()

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

index = VectorStoreIndex.from_documents(documents)
```

question = "What did the author do growing up?"

question = "What did the author do during his time in YC?"

question = "Where did the author go for art school?"

retriever = index.as_retriever(similarity_top_k=10)

retriever = index.as_retriever(similarity_top_k=10)

Ground-truth Answer

```python
answer = "RISD"

contexts = retriever.retrieve(question)

context_list = [n.get_content() for n in contexts]
len(context_list)
```

```python
llm = OpenAILike(
    model="gpt-3.5-turbo",
    api_base="https://api.??????.com.cn/v1",
    api_key="sk-***",
    is_chat_model=True,
)
llm2 = OpenAILike(
    model="gpt-3.5-turbo-0125",
    api_base="https://api.?????.com.cn/v1",
    api_key="sk-**",
    is_chat_model=True,
)

prompt = "\n\n".join(context_list + [question])
response = llm.complete(prompt)
print(str(response))
```

LongLLMLingua postprocessor

```python
node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=400,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reordering
        "dynamic_context_compression_ratio": 0.3,
    },
)
Settings.llm = llm2
retrieved_nodes = retriever.retrieve(question)
synthesizer = CompactAndRefine()
```
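For clarity, the postprocessor wraps llmlingua's PromptCompressor, and additional_compress_kwargs appear to be forwarded to compress_prompt; a rough standalone equivalent, as a sketch under that assumption, would be:

```python
from llmlingua import PromptCompressor

compressor = PromptCompressor()
result = compressor.compress_prompt(
    context_list,
    instruction="Given the context, please answer the final question",
    question=question,
    target_token=400,
    rank_method="longllmlingua",
    condition_compare=True,
    condition_in_question="after",
    context_budget="+100",
    reorder_context="sort",  # reorder documents by relevance before compressing
    dynamic_context_compression_ratio=0.3,
)
```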

Outline of the steps inside RetrieverQueryEngine, for clarity: postprocess (compress), then synthesize.

```python
new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=question)
)

original_contexts = "\n\n".join([n.get_content() for n in retrieved_nodes])
compressed_contexts = "\n\n".join([n.get_content() for n in new_retrieved_nodes])

original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)
compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)

print(compressed_contexts)
print()
print("Original Tokens:", original_tokens)
print("Compressed Tokens:", compressed_tokens)
print("Compressed Ratio:", f"{original_tokens/(compressed_tokens + 1e-5):.2f}x")
```
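For example, if the retriever returned about 3,000 tokens of context and compression hit the 400-token target, this would print roughly 3000/400 ≈ 7.50x (the 1e-5 term only guards against division by zero).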

It goes wrong here (the error may be caused by LlamaIndex):

```python
response = synthesizer.synthesize(question, new_retrieved_nodes)

retriever_query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)

response = retriever_query_engine.query(question)
```

Expected Behavior

The comparison results of the two methods should be printed correctly.

Logs


```
ValidationError                           Traceback (most recent call last)
in ()
----> 1 response = synthesizer.synthesize(question, new_retrieved_nodes)

/usr/local/lib/python3.10/dist-packages/llama_index/core/instrumentation/dispatcher.py in wrapper(func, instance, args, kwargs)
    272                 )
    273             try:
--> 274                 result = func(*args, **kwargs)
    275             except BaseException as e:
    276                 self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

/usr/local/lib/python3.10/dist-packages/llama_index/core/response_synthesizers/base.py in synthesize(self, query, nodes, additional_source_nodes, **response_kwargs)
    256
    257         dispatch_event(
--> 258             SynthesizeEndEvent(
    259                 query=query,
    260                 response=response,

/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py in __init__(__pydantic_self__, **data)
    339         values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340         if validation_error:
--> 341             raise validation_error
    342         try:
    343             object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 6 validation errors for SynthesizeEndEvent
response -> source_nodes -> 0 -> node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
response -> source_nodes -> 1 -> node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
response -> source_nodes -> 2 -> node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
response
  instance of StreamingResponse, tuple or dict expected (type=type_error.dataclass; class_name=StreamingResponse)
response
  instance of AsyncStreamingResponse, tuple or dict expected (type=type_error.dataclass; class_name=AsyncStreamingResponse)
response
  instance of PydanticResponse, tuple or dict expected (type=type_error.dataclass; class_name=PydanticResponse)
```

Additional Information

Run on Colab L4
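A possible cause worth checking (an assumption, not confirmed): the postprocessor is imported from llama_index.legacy, so it returns legacy NodeWithScore objects, while CompactAndRefine and the instrumentation events come from llama_index.core and validate nodes against core's BaseNode, which would produce exactly this kind of pydantic ValidationError. A sketch of the change, with the integration package name assumed as above:

```python
# pip install llama-index-postprocessor-longllmlingua
# Replace the llama_index.legacy import with the 0.10 integration package
# so the postprocessor and synthesizer share the same core node types.
from llama_index.postprocessor.longllmlingua import LongLLMLinguaPostprocessor
```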