run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: 'FlagEmbeddingReranker' object has no attribute '_model' #14567

Closed Zaaccckkkk closed 4 months ago

Zaaccckkkk commented 4 months ago

Bug Description

I tried to reproduce the example at https://github.com/run-llama/llama_parse/blob/main/examples/demo_advanced.ipynb. It succeeded once, but every run after that failed. The error occurs regardless of the file I use. The error message:

"""
python llamaparse.py
Started parsing the file under job_id cac11eca-1058-4312-a42e-3203694e3962
1it [00:00, 5236.33it/s]
100% ████████████████████████████████████████████████████████████████████████████████████████████████████████ 1/1 [00:07<00:00, 7.19s/it]
| Disability Category | Participants | Ballots Completed | Ballots Incomplete/Terminated | Accuracy | Time to complete |
|---|---|---|---|---|---|
| Blind | 5 | 1 | 4 | 34.5%, n=1 | 1199 sec, n=1 |
| Low Vision | 5 | 2 | 3 | 98.3% n=2 (97.7%, n=3) | 1716 sec, n=3 (1934 sec, n=2) |
| Dexterity | 5 | 4 | 1 | 98.3%, n=4 | 1672.1 sec, n=4 |
| Mobility | 3 | 3 | 0 | 95.4%, n=3 | 1416 sec, n=3 |

2
Traceback (most recent call last):
  File "/Users/wuzhentian/Desktop/LlamaParse_play/llamaparse.py", line 104, in <module>
    response_1 = raw_query_engine.query(query)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/base/base_query_engine.py", line 52, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 189, in _query
    nodes = self.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 145, in retrieve
    return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 138, in _apply_node_postprocessors
    nodes = node_postprocessor.postprocess_nodes(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/postprocessor/types.py", line 56, in postprocess_nodes
    return self._postprocess_nodes(nodes, query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/postprocessor/flag_embedding_reranker/base.py", line 84, in _postprocess_nodes
    scores = self._model.compute_score(query_and_nodes)
             ^^^^^^^^^^^
AttributeError: 'FlagEmbeddingReranker' object has no attribute '_model'. Did you mean: 'model'?
"""

Version

0.10.40

Steps to Reproduce

requirements.txt:

    llama-index-llms-huggingface
    llama-index-embeddings-huggingface
    transformers
    accelerate
    bitsandbytes
    llama-index
    llama-index-core==0.10.50.post1
    llama-index-postprocessor-flag-embedding-reranker
    git+https://github.com/FlagOpen/FlagEmbedding.git
    llama-parse
    python-dotenv
    llama-index-embeddings-openai

llamaparse.py:

    import os
    from dotenv import load_dotenv, find_dotenv
    import nest_asyncio
    from llama_index.llms.openai import OpenAI
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.core import VectorStoreIndex, Settings
    from llama_parse import LlamaParse
    from copy import deepcopy
    from llama_index.core.schema import TextNode
    from llama_index.core.node_parser import MarkdownElementNodeParser
    from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
    from llama_index.core import SimpleDirectoryReader
    import pickle

    # Load environment variables from .env file
    load_dotenv(find_dotenv())
    secret_value_0 = os.getenv('LLAMACLOUD_API_KEY')
    secret_value_1 = os.getenv('OPENAI_API_KEY')

    # Apply nest_asyncio for running async code in sync environment
    nest_asyncio.apply()

    # Initialize embedding and language models
    embed_model = OpenAIEmbedding(model="text-embedding-3-small")
    llm = OpenAI(model="gpt-3.5-turbo-0125")
    Settings.llm = llm
    Settings.embed_model = embed_model

    # Load documents using LlamaParse
    documents = LlamaParse(result_type="markdown", api_key=secret_value_0).load_data("/Users/wuzhentian/Desktop/example_table.pdf")

    def get_page_nodes(docs, separator="\n---\n"):
        """Split each document into page nodes by separator."""
        nodes = []
        for doc in docs:
            doc_chunks = doc.text.split(separator)
            for doc_chunk in doc_chunks:
                node = TextNode(
                    text=doc_chunk,
                    metadata=deepcopy(doc.metadata),
                )
                nodes.append(node)
        return nodes

    # Parse documents into nodes
    page_nodes = get_page_nodes(documents)
    node_parser = MarkdownElementNodeParser(
        llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8
    )
    nodes = node_parser.get_nodes_from_documents(documents)
    base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

    # Print the content of the first node
    print(page_nodes[0].get_content())

    # Create vector index with nodes
    recursive_index = VectorStoreIndex(nodes=base_nodes + objects + page_nodes)

    # Cache the reranker model
    reranker_filename = 'reranker_model.pkl'

    def save_model(model, filename):
        with open(filename, 'wb') as f:
            pickle.dump(model, f)

    def load_model(filename):
        with open(filename, 'rb') as f:
            return pickle.load(f)

    # Check if cached reranker model exists
    if os.path.exists(reranker_filename):
        reranker = load_model(reranker_filename)
    else:
        reranker = FlagEmbeddingReranker(
            top_n=5,
            model="BAAI/bge-reranker-large",
        )
        save_model(reranker, reranker_filename)

    recursive_query_engine = recursive_index.as_query_engine(
        similarity_top_k=5, node_postprocessors=[reranker], verbose=True
    )

    print(len(nodes))

    # Setup the raw query engine
    file_path = "/Users/wuzhentian/Desktop/example_table.pdf"
    if not os.path.exists(file_path):
        raise ValueError(f"File {file_path} does not exist.")

    reader = SimpleDirectoryReader(input_files=[file_path])
    base_docs = reader.load_data()
    raw_index = VectorStoreIndex.from_documents(base_docs)
    raw_query_engine = raw_index.as_query_engine(
        similarity_top_k=5, node_postprocessors=[reranker]
    )

    # Now run your query
    query = "How many blind participants?"
    response_1 = raw_query_engine.query(query)
    print("\nBasic Query Engine")
    print(response_1)

    response_2 = recursive_query_engine.query(query)
    print("\nNew LlamaParse+ Recursive Retriever Query Engine")
    print(response_2)

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 4 months ago

Hello, @Zaaccckkkk! I'm here to help you with any bugs or questions you have. If you need assistance, feel free to ask.

To fix the AttributeError: 'FlagEmbeddingReranker' object has no attribute '_model', ensure that the _model attribute is properly initialized in the __init__ method of the FlagEmbeddingReranker class. Here are the steps to resolve the issue:

  1. Ensure Proper Initialization: Make sure the _model attribute is initialized correctly in the FlagEmbeddingReranker class. The initialization should look like this:

    def __init__(
        self,
        top_n: int = 2,
        model: str = "BAAI/bge-reranker-large",
        use_fp16: bool = False,
    ) -> None:
        try:
            from FlagEmbedding import FlagReranker
        except ImportError:
            raise ImportError(
                # Adjacent string literals are joined into a single message.
                "Cannot import FlagReranker package, please install it: "
                "pip install git+https://github.com/FlagOpen/FlagEmbedding.git"
            )
        self._model = FlagReranker(
            model,
            use_fp16=use_fp16,
        )
        super().__init__(top_n=top_n, model=model, use_fp16=use_fp16)
  2. Check the Import and Installation: Ensure that the FlagEmbedding package is correctly installed and the FlagReranker class is correctly imported. You can install the package using:

    pip install git+https://github.com/FlagOpen/FlagEmbedding.git
  3. Correct Instantiation: Verify that the FlagEmbeddingReranker object is instantiated correctly in your script. For example:

    reranker = FlagEmbeddingReranker(top_n=5, model="BAAI/bge-reranker-large")
  4. Define _model as a Private Attribute: Ensure that the _model attribute is defined as a PrivateAttr in the FlagEmbeddingReranker class:

    from typing import Any

    from llama_index.legacy.bridge.pydantic import Field, PrivateAttr
    
    class FlagEmbeddingReranker(BaseNodePostprocessor):
        """Flag Embedding Reranker."""
    
        model: str = Field(description="BAAI Reranker model name.")
        top_n: int = Field(description="Number of nodes to return sorted by score.")
        use_fp16: bool = Field(description="Whether to use fp16 for inference.")
        _model: Any = PrivateAttr()

By following these steps, you should be able to resolve the AttributeError and ensure that the _model attribute is properly initialized and accessible within the FlagEmbeddingReranker class [1][2].


logan-markewich commented 4 months ago

@Zaaccckkkk you can't pickle an embedding model 😅
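
One plausible reading of the traceback, consistent with the script succeeding only on the first run (before reranker_model.pkl existed), is that the pickled object keeps its plain settings but loses the private `_model` attribute that `__init__` creates. The sketch below reproduces that effect with a hypothetical stand-in class, `FakeReranker`. It is not the library's class, and the `__getstate__` shown is an assumption about the mechanism rather than LlamaIndex's actual code.

```python
import pickle


class FakeReranker:
    """Hypothetical stand-in for a reranker; not the library's class."""

    def __init__(self, model: str, top_n: int = 5) -> None:
        self.model = model      # plain configuration, survives pickling
        self.top_n = top_n
        self._model = object()  # stands in for the loaded model weights

    def __getstate__(self) -> dict:
        # Assumption for illustration: private (underscore-prefixed)
        # state is dropped when the object is pickled.
        return {k: v for k, v in self.__dict__.items() if not k.startswith("_")}


fresh = FakeReranker("BAAI/bge-reranker-large")
restored = pickle.loads(pickle.dumps(fresh))

fresh_has_model = hasattr(fresh, "_model")
restored_has_model = hasattr(restored, "_model")
print(fresh_has_model, restored_has_model)
```

Unpickling restores the saved state without calling `__init__`, so nothing recreates `_model`. Constructing the reranker on every run avoids the problem.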

Zaaccckkkk commented 4 months ago

Right, after deleting the pickle code, it works once. However, it fails on every further run. Error message:
"
Error while parsing the file '/Users/wuzhentian/Desktop/example_table.pdf':
Traceback (most recent call last):
  File "/Users/wuzhentian/Desktop/LlamaParse_play/llamaparse.py", line 56, in <module>
    print("the page node is:", page_nodes[0].get_content())
IndexError: list index out of range
"

Zaaccckkkk commented 4 months ago

It works again after switching to a different Internet connection. Can this error be caused by a poor Internet connection?

logan-markewich commented 4 months ago

I think so, seems like there were zero page nodes
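
If parsing returns no documents, the page-node list is empty and indexing it with `[0]` raises the `IndexError` above. A minimal guard, sketched here with a hypothetical helper `require_page_nodes` and plain strings standing in for `TextNode` objects, turns that into a clearer message:

```python
from typing import List, Sequence


def require_page_nodes(page_nodes: Sequence[str], source: str) -> List[str]:
    """Return the nodes as a list, or raise a descriptive error when empty."""
    if not page_nodes:
        raise RuntimeError(
            f"No page nodes were produced for {source!r}; "
            "the parse step may have returned no documents."
        )
    return list(page_nodes)


# Strings stand in for TextNode objects so the sketch runs on its own.
nodes = require_page_nodes(["page one text", "page two text"], "example_table.pdf")
print(nodes[0])

empty_error = None
try:
    require_page_nodes([], "example_table.pdf")
except RuntimeError as err:
    empty_error = str(err)
    print(empty_error)
```

In the script, the same check would sit between `get_page_nodes(documents)` and the first `page_nodes[0]` access.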