run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Troubleshooting Issues in LongLLMLingua RAG Demo after Updating to Version 0.10 #13617

Closed 190679163 closed 5 months ago

190679163 commented 5 months ago

Bug Description

The LlamaIndex RAG demo is no longer functioning properly due to significant changes in library calls after updating llama-index to version 0.10. Could you help me troubleshoot where the problem might be? Thank you.


# Install dependency.
!pip install llmlingua llama-index llama-index-embeddings-huggingface llama-index-embeddings-instructor llama-index-llms-openai llama-index-llms-openai-like llama-index-readers-file pymupdf llama-index-retrievers-bm25 transformers

!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O paul_graham_essay.txt

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike
# Setup LLMLingua
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.legacy.postprocessor.longllmlingua import *
from llama_index.core import QueryBundle
from llama_index.llms.openai import OpenAI
import os
import openai

# load documents
documents = SimpleDirectoryReader(input_files=["paul_graham_essay.txt"]).load_data()

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

index = VectorStoreIndex.from_documents(documents)

# question = "What did the author do growing up?"
# question = "What did the author do during his time in YC?"
question = "Where did the author go for art school?"

retriever = index.as_retriever(similarity_top_k=10)

# Ground-truth Answer
answer = "RISD"

contexts = retriever.retrieve(question)

context_list = [n.get_content() for n in contexts]
len(context_list)
OPENAI_API_BASE = "https://openai like api/v1"
OPENAI_API_KEY = "sk-your key"

openai.api_base = OPENAI_API_BASE
openai.api_key = OPENAI_API_KEY

llm = OpenAILike(model="gpt-3.5-turbo-0125", api_base=OPENAI_API_BASE, api_key=OPENAI_API_KEY, is_chat_model=True)
llm2 = OpenAILike(model="gpt-3.5-turbo-instruct", api_base=OPENAI_API_BASE, api_key=OPENAI_API_KEY)

prompt = "\n\n".join(context_list + [question])
response = llm.complete(prompt)
print(str(response))

node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=400,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reorder,
        "dynamic_context_compression_ratio": 0.3,
    },
)
Settings.llm = llm2
retrieved_nodes = retriever.retrieve(question)
synthesizer = CompactAndRefine()

# outline steps in RetrieverQueryEngine for clarity:
# postprocess (compress), synthesize
new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=question)
)

original_contexts = "\n\n".join([n.get_content() for n in retrieved_nodes])
compressed_contexts = "\n\n".join([n.get_content() for n in new_retrieved_nodes])

original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)
compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)

print(compressed_contexts)
print()
print("Original Tokens:", original_tokens)
print("Compressed Tokens:", compressed_tokens)
print("Compressed Ratio:", f"{original_tokens/(compressed_tokens + 1e-5):.2f}x")

The code above runs fine; the errors start below.


response = synthesizer.synthesize(question, new_retrieved_nodes)
print(str(response))

retriever_query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)

response = retriever_query_engine.query(question)

Version

stable

Steps to Reproduce

It runs fine until the last part, where it throws an error. I'm running it on Colab.

Relevant Logs/Tracebacks

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-9-beee427193a1> in <cell line: 5>()
      3 )
      4 
----> 5 response = retriever_query_engine.query(question)

6 frames
/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py in __init__(__pydantic_self__, **data)
    339         values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340         if validation_error:
--> 341             raise validation_error
    342         try:
    343             object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 6 validation errors for SynthesizeEndEvent
response -> source_nodes -> 0 -> node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
response -> source_nodes -> 1 -> node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
response -> source_nodes -> 2 -> node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
response
  instance of StreamingResponse, tuple or dict expected (type=type_error.dataclass; class_name=StreamingResponse)
response
  instance of AsyncStreamingResponse, tuple or dict expected (type=type_error.dataclass; class_name=AsyncStreamingResponse)
response
  instance of PydanticResponse, tuple or dict expected (type=type_error.dataclass; class_name=PydanticResponse)
dosubot[bot] commented 5 months ago

The ValidationError you're encountering suggests that the BaseNode class is being instantiated directly, which is not allowed because it contains abstract methods that need to be implemented by a subclass. Additionally, the synthesize function and the query method in the RetrieverQueryEngine class now expect specific response types.

Here are the steps to resolve the issue:

  1. Ensure Concrete Implementation of BaseNode: Make sure you are using a concrete subclass of BaseNode that implements all the required methods.

  2. Update the synthesize Function: Ensure that the synthesize function returns the correct response types, such as NodeWithScore.

  3. Update the RetrieverQueryEngine Usage: Ensure that the RetrieverQueryEngine is correctly instantiated and used.

Here is the updated code snippet with the necessary changes:

# (Setup unchanged from the original script above: dependency installs,
# imports, document loading, retriever, OpenAI-like LLM configuration,
# LongLLMLingua postprocessor, and the token-count comparison.)

# Ensure the synthesizer returns the correct type
from llama_index.core.schema import NodeWithScore, TextNode

# Example of creating a NodeWithScore from a concrete node subclass
# (BaseNode itself is abstract and cannot be instantiated)
node = TextNode(text="example text")
node_with_score = NodeWithScore(node=node, score=1.0)

response = synthesizer.synthesize(question, [node_with_score])
print(str(response))

retriever_query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)

query_bundle = QueryBundle(query_str=question)
response = retriever_query_engine.query(query_bundle)
print(str(response))

This should resolve the ValidationError by ensuring that the correct types are used and that the BaseNode class is not instantiated directly.

#### Details

**Sources**

- [llama-index-core/llama_index/core/schema.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/schema.py)
- [CHANGELOG.md](https://github.com/run-llama/llama_index/blob/main/CHANGELOG.md)
- [docs/docs/module_guides/querying/response_synthesizers/index.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/module_guides/querying/response_synthesizers/index.md)
- [llama-index-core/llama_index/core/query_engine/retriever_query_engine.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/query_engine/retriever_query_engine.py)
- [llama-index-legacy/llama_index/legacy/query_engine/retriever_query_engine.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/query_engine/retriever_query_engine.py)
- [llama-index-core/llama_index/core/query_engine/router_query_engine.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/query_engine/router_query_engine.py)

logan-markewich commented 5 months ago

You can't mix legacy and non-legacy imports

Either they are all legacy, or none are legacy.

To remove the one legacy import:

pip install llama-index-postprocessor-longllmlingua

from llama_index.postprocessor.longllmlingua import ...
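
For reference, here is a minimal sketch of the non-legacy wiring, assuming the dedicated package exposes the same LongLLMLinguaPostprocessor class and constructor arguments used in the demo above:

# Install the dedicated postprocessor package (replaces the legacy import).
!pip install llama-index-postprocessor-longllmlingua

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.postprocessor.longllmlingua import LongLLMLinguaPostprocessor

# Same constructor arguments as in the original report (assumed unchanged
# in the non-legacy package).
node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=400,
    rank_method="longllmlingua",
)

# Wire the compressor into the query engine; `retriever` is the vector
# index retriever built earlier in the script.
retriever_query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)
response = retriever_query_engine.query(question)
print(str(response))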