Closed: Zaaccckkkk closed this issue 4 months ago
Hello, @Zaaccckkkk! I'm here to help you with any bugs or questions you have. If you need assistance, feel free to ask.
To fix the `AttributeError: 'FlagEmbeddingReranker' object has no attribute '_model'`, ensure that the `_model` attribute is properly initialized in the `__init__` method of the `FlagEmbeddingReranker` class. Here are the steps to resolve the issue:

1. **Ensure Proper Initialization:**
Make sure the `_model` attribute is initialized correctly in the `FlagEmbeddingReranker` class. The initialization should look like this:
```python
def __init__(
    self,
    top_n: int = 2,
    model: str = "BAAI/bge-reranker-large",
    use_fp16: bool = False,
) -> None:
    try:
        from FlagEmbedding import FlagReranker
    except ImportError:
        raise ImportError(
            "Cannot import FlagReranker package, please install it: "
            "pip install git+https://github.com/FlagOpen/FlagEmbedding.git"
        )
    self._model = FlagReranker(
        model,
        use_fp16=use_fp16,
    )
    super().__init__(top_n=top_n, model=model, use_fp16=use_fp16)
```
2. **Check the Import and Installation:**
Ensure that the `FlagEmbedding` package is correctly installed and the `FlagReranker` class is correctly imported. You can install the package using:

```shell
pip install git+https://github.com/FlagOpen/FlagEmbedding.git
```
3. **Correct Instantiation:**
Verify that the `FlagEmbeddingReranker` object is instantiated correctly in your script. For example:

```python
reranker = FlagEmbeddingReranker(top_n=5, model="BAAI/bge-reranker-large")
```
4. **Define `_model` as a Private Attribute:**
Ensure that the `_model` attribute is defined as a `PrivateAttr` in the `FlagEmbeddingReranker` class:

```python
from typing import Any

from llama_index.legacy.bridge.pydantic import Field, PrivateAttr


class FlagEmbeddingReranker(BaseNodePostprocessor):
    """Flag Embedding Reranker."""

    model: str = Field(description="BAAI Reranker model name.")
    top_n: int = Field(description="Number of nodes to return sorted by score.")
    use_fp16: bool = Field(description="Whether to use fp16 for inference.")
    _model: Any = PrivateAttr()
```
By following these steps, you should be able to resolve the `AttributeError` and ensure that the `_model` attribute is properly initialized and accessible within the `FlagEmbeddingReranker` class [1][2].
To continue talking to Dosu, mention @dosu.
@Zaaccckkkk you can't pickle an embedding model 😅
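For context on why the pickle round-trip loses `_model`: the attribute holds a live, loaded model object that does not survive serialization, so the unpickled reranker comes back without it. The simplest fix is to skip pickling and just reconstruct the reranker on each run (HuggingFace caches the downloaded weights on disk anyway). For objects that *must* be pickled, the usual pattern is to drop the heavy resource from the pickled state and rebuild it on unpickle. A minimal, library-free sketch of that pattern — `RerankerWrapper` and `_load` are hypothetical stand-ins, not LlamaIndex API:

```python
import pickle


class RerankerWrapper:
    """Illustrative stand-in (NOT the real FlagEmbeddingReranker): holds a
    heavy, unpicklable resource in `_model` and rebuilds it after unpickling
    instead of trying to serialize it."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self._model = self._load(model_name)

    def _load(self, model_name: str):
        # Stand-in for FlagReranker(model_name); imagine this loads weights.
        return f"loaded:{model_name}"

    def __getstate__(self):
        # Exclude the unpicklable resource from the pickled state.
        state = self.__dict__.copy()
        state.pop("_model", None)
        return state

    def __setstate__(self, state):
        # Restore plain attributes, then rebuild the resource from scratch.
        self.__dict__.update(state)
        self._model = self._load(self.model_name)


wrapper = RerankerWrapper("BAAI/bge-reranker-large")
restored = pickle.loads(pickle.dumps(wrapper))
print(restored._model)  # loaded:BAAI/bge-reranker-large
```

Without the `__getstate__`/`__setstate__` pair, the restored object would either fail to pickle at all or come back missing `_model` — which is exactly the `AttributeError` seen in this issue.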
Right, after deleting the pickling code it works once. However, it fails on every subsequent run...
Error message:

```
Error while parsing the file '/Users/wuzhentian/Desktop/example_table.pdf':
Traceback (most recent call last):
  File "/Users/wuzhentian/Desktop/LlamaParse_play/llamaparse.py", line 56, in
IndexError: list index out of range
```
It works again after switching to a different Internet connection. Could this error be caused by a poor Internet connection?
I think so; it looks like parsing produced zero page nodes.
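Since a failed parse silently yields an empty result and only blows up later as `IndexError: list index out of range` on `page_nodes[0]`, a small guard right after parsing gives a much clearer failure. A hypothetical sketch (`require_page_nodes` is an illustrative helper, not LlamaParse API):

```python
def require_page_nodes(page_nodes, source_path):
    """Fail fast with a clear message when parsing yields no page nodes,
    instead of crashing later with IndexError on page_nodes[0]."""
    if not page_nodes:
        raise RuntimeError(
            f"Parsing {source_path!r} produced zero page nodes; this can "
            "happen when the parse request fails (e.g. a flaky network "
            "connection). Retry the parse or check connectivity."
        )
    return page_nodes


# Usage: wrap the result of get_page_nodes(documents) before indexing into it.
nodes = require_page_nodes(["page-1 text"], "example_table.pdf")
print(nodes[0])  # page-1 text
```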
Bug Description
```
Traceback (most recent call last):
  File "/Users/wuzhentian/Desktop/LlamaParse_play/llamaparse.py", line 104, in <module>
    response_1 = raw_query_engine.query(query)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/base/base_query_engine.py", line 52, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 189, in _query
    nodes = self.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 145, in retrieve
    return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 138, in _apply_node_postprocessors
    nodes = node_postprocessor.postprocess_nodes(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/postprocessor/types.py", line 56, in postprocess_nodes
    return self._postprocess_nodes(nodes, query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wuzhentian/Desktop/LlamaParse_play/Llama_venv/lib/python3.11/site-packages/llama_index/postprocessor/flag_embedding_reranker/base.py", line 84, in _postprocess_nodes
    scores = self._model.compute_score(query_and_nodes)
             ^^^^^^^^^^^
AttributeError: 'FlagEmbeddingReranker' object has no attribute '_model'. Did you mean: 'model'?
```
Version
0.10.40
Steps to Reproduce
requirements.txt:

```
llama-index-llms-huggingface
llama-index-embeddings-huggingface
transformers
accelerate
bitsandbytes
llama-index
llama-index-core==0.10.50.post1
llama-index-postprocessor-flag-embedding-reranker
git+https://github.com/FlagOpen/FlagEmbedding.git
llama-parse
python-dotenv
llama-index-embeddings-openai
```
llamaparse.py:

```python
import os
import pickle
from copy import deepcopy

import nest_asyncio
from dotenv import load_dotenv, find_dotenv
from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.core.schema import TextNode
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
from llama_parse import LlamaParse

# Load environment variables from .env file
load_dotenv(find_dotenv())
secret_value_0 = os.getenv('LLAMACLOUD_API_KEY')
secret_value_1 = os.getenv('OPENAI_API_KEY')

# Apply nest_asyncio for running async code in sync environment
nest_asyncio.apply()

# Initialize embedding and language models
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
llm = OpenAI(model="gpt-3.5-turbo-0125")
Settings.llm = llm
Settings.embed_model = embed_model

# Load documents using LlamaParse
documents = LlamaParse(result_type="markdown", api_key=secret_value_0).load_data(
    "/Users/wuzhentian/Desktop/example_table.pdf"
)


def get_page_nodes(docs, separator="\n---\n"):
    """Split each document into page nodes by separator."""
    nodes = []
    for doc in docs:
        doc_chunks = doc.text.split(separator)
        for doc_chunk in doc_chunks:
            node = TextNode(
                text=doc_chunk,
                metadata=deepcopy(doc.metadata),
            )
            nodes.append(node)
    return nodes


# Parse documents into nodes
page_nodes = get_page_nodes(documents)
node_parser = MarkdownElementNodeParser(
    llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8
)
nodes = node_parser.get_nodes_from_documents(documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

# Print the content of the first node
print(page_nodes[0].get_content())

# Create vector index with nodes
recursive_index = VectorStoreIndex(nodes=base_nodes + objects + page_nodes)

# Cache the reranker model
reranker_filename = 'reranker_model.pkl'


def save_model(model, filename):
    with open(filename, 'wb') as f:
        pickle.dump(model, f)


def load_model(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)


# Check if cached reranker model exists
if os.path.exists(reranker_filename):
    reranker = load_model(reranker_filename)
else:
    reranker = FlagEmbeddingReranker(
        top_n=5,
        model="BAAI/bge-reranker-large",
    )
    save_model(reranker, reranker_filename)

recursive_query_engine = recursive_index.as_query_engine(
    similarity_top_k=5, node_postprocessors=[reranker], verbose=True
)

print(len(nodes))

# Setup the raw query engine
file_path = "/Users/wuzhentian/Desktop/example_table.pdf"
if not os.path.exists(file_path):
    raise ValueError(f"File {file_path} does not exist.")

reader = SimpleDirectoryReader(input_files=[file_path])
base_docs = reader.load_data()
raw_index = VectorStoreIndex.from_documents(base_docs)
raw_query_engine = raw_index.as_query_engine(
    similarity_top_k=5, node_postprocessors=[reranker]
)

# Now run your query
query = "How many blind participants?"
response_1 = raw_query_engine.query(query)
print("\nBasic Query Engine")
print(response_1)

response_2 = recursive_query_engine.query(query)
print("\nNew LlamaParse+ Recursive Retriever Query Engine")
print(response_2)
```
Relevant Logs/Tracebacks
No response