run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: AssertionError assert isinstance(node, BaseNode) or node is None #15824

Closed. YashiroNanakse closed this issue 1 month ago.

YashiroNanakse commented 1 month ago

Bug Description

Traceback (most recent call last):
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\async_utils.py", line 30, in asyncio_run
    loop = asyncio.get_event_loop()
  File "E:\Python 3.11.9\Lib\asyncio\events.py", line 681, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'MainThread'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\python\graph-index\main.py", line 26, in <module>
    nodes = retriever.retrieve("丽维嘉")
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 264, in wrapper
    result = func(*args, **kwargs)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\base\base_retriever.py", line 245, in retrieve
    nodes = self._retrieve(query_bundle)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 264, in wrapper
    result = func(*args, **kwargs)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\indices\property_graph\retriever.py", line 52, in _retrieve
    return asyncio_run(self._aretrieve(query_bundle))
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\async_utils.py", line 38, in asyncio_run
    return asyncio.run(coro)
  File "E:\Python 3.11.9\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
  File "E:\Python 3.11.9\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "E:\Python 3.11.9\Lib\asyncio\base_events.py", line 654, in run_until_complete
    return future.result()
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\indices\property_graph\retriever.py", line 64, in _aretrieve
    async_results = await run_jobs(
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 296, in async_wrapper
    result = await func(*args, **kwargs)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\async_utils.py", line 148, in run_jobs
    results = await asyncio.gather(*pool_jobs)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 296, in async_wrapper
    result = await func(*args, **kwargs)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\async_utils.py", line 139, in worker
    return await job
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 296, in async_wrapper
    result = await func(*args, **kwargs)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\base\base_retriever.py", line 276, in aretrieve
    nodes = await self._aretrieve(query_bundle=query_bundle)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\indices\property_graph\sub_retrievers\base.py", line 145, in _aretrieve
    nodes = await self.async_add_source_text(nodes)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\indices\property_graph\sub_retrievers\base.py", line 134, in async_add_source_text
    return self._add_source_text(nodes, og_node_map)
  File "D:\python\graph-index.venv\Lib\site-packages\llama_index\core\indices\property_graph\sub_retrievers\base.py", line 89, in _add_source_text
    assert isinstance(node, BaseNode) or node is None
AssertionError

Version

0.11.4

Steps to Reproduce

documents = SimpleDirectoryReader("./data").load_data()

Settings.embed_model = dashscope_embed
Settings.llm = dashscope_llm

index = PropertyGraphIndex.from_documents(documents)
print('------index-----')
print(index)

retriever = index.as_retriever(
    include_text=True,
    similarity_top_k=2,
)
print('------retriever-----')
nodes = retriever.retrieve("丽维嘉")
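
For reference, the imports and model setup omitted from the snippet above would look roughly like the following. This is a hedged sketch: the DashScope class names are assumptions based on the llama-index-embeddings-dashscope and llama-index-llms-dashscope packages listed later in the thread, and any other embedding model and LLM could be substituted.

from llama_index.core import PropertyGraphIndex, Settings, SimpleDirectoryReader

# Assumption: dashscope_embed and dashscope_llm in the steps above are instances
# from the DashScope integration packages; check the exact class names and
# default constructor arguments against the installed versions.
from llama_index.embeddings.dashscope import DashScopeEmbedding
from llama_index.llms.dashscope import DashScope

dashscope_embed = DashScopeEmbedding()
dashscope_llm = DashScope()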

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

Hey @YashiroNanakse! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to solve any issues you're facing.

To resolve the AssertionError "assert isinstance(node, BaseNode) or node is None" in the PropertyGraphIndex retriever, ensure that all nodes being processed are instances of BaseNode or None. This error typically occurs when a node that does not meet these criteria is passed to a function or method expecting a BaseNode.

Here is a snippet from the provided code that shows where nodes are being processed:

def _build_index_from_nodes(
    self, nodes: Optional[Sequence[BaseNode]], **build_kwargs: Any
) -> IndexLPG:
    """Build index from nodes."""
    nodes = self._insert_nodes(nodes or [])

    # this isn't really used or needed
    return IndexLPG()

In the _build_index_from_nodes method, the nodes parameter is expected to be a sequence of BaseNode instances. Ensure that when you call this method, the nodes you pass are either BaseNode instances or None.

Here is an example of how you might ensure this:

# Example of ensuring nodes are BaseNode instances or None
nodes = [node for node in nodes if isinstance(node, BaseNode) or node is None]

# Now you can safely call the method
index = property_graph_index._build_index_from_nodes(nodes)

Make sure to validate the nodes before passing them to any method that expects BaseNode instances. This will prevent the AssertionError from occurring [1].

To continue talking to Dosu, mention @dosu.
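
One hedged way to narrow this down (a diagnostic sketch, not a confirmed fix): the traceback shows the assertion firing inside _add_source_text, which is only reached when the retriever tries to attach source text to the graph results. Retrieving with include_text=False skips that step, so it shows whether the underlying graph retrieval itself returns well-formed nodes.

# Diagnostic sketch: include_text=False avoids the source-text lookup where the
# assertion is raised, so the raw graph retrieval can be inspected in isolation.
retriever_no_text = index.as_retriever(include_text=False, similarity_top_k=2)
nodes = retriever_no_text.retrieve("丽维嘉")
for node_with_score in nodes:
    print(type(node_with_score.node), node_with_score.score)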

YashiroNanakse commented 1 month ago

How do you generate nodes? Why would I need to generate nodes when I'm following the documentation example, which never creates nodes explicitly? @dosu

dosubot[bot] commented 1 month ago

If you need to generate nodes yourself while following the documentation example, ensure that each node has a valid doc_id. The error ValueError: Reference doc id is None. indicates that some nodes might be missing the doc_id. Here is a corrected version of the code snippet provided:

nodes = []
# transcript_array refers to an array of phrases that Whisper outputs.
for index, phrase in enumerate(transcript_array):
    # current obj index
    node = Node(text=phrase['content'] + " " + str(phrase['start']), doc_id=index)
    if index > 0 and index < len(transcript_array) - 1:
        node.relationships[DocumentRelationship.PREVIOUS] = index - 1
        node.relationships[DocumentRelationship.NEXT] = index + 1
    elif index == 0:
        node.relationships[DocumentRelationship.NEXT] = index + 1
    elif index == len(transcript_array) - 1:
        node.relationships[DocumentRelationship.PREVIOUS] = index - 1
    nodes.append(node)

index = GPTSimpleVectorIndex(nodes)

Ensure that each node is assigned a doc_id and that the doc_id is not None. This should resolve the issue and allow you to generate nodes correctly [1].

To continue talking to Dosu, mention @dosu.

YashiroNanakse commented 1 month ago

I don't think that's the reason for the error. Whenever I execute nodes = retriever.retrieve("Riviera") I get the error above. Can you help me analyze what's wrong?

dosubot[bot] commented 1 month ago

The error you're encountering is due to the retrieve method expecting a QueryBundle object as its argument, not a simple string. To fix this, you should create a QueryBundle object with the query string "Riviera" and pass it to the retrieve method.

Here's how you can modify your code:

from llama_index.core.indices.property_graph import QueryBundle

# Create a QueryBundle object with the query string "Riviera"
query_bundle = QueryBundle(query_str="Riviera")

# Pass the QueryBundle object to the retrieve method
nodes = retriever.retrieve(query_bundle)

This should resolve the AssertionError you are encountering [1].

To continue talking to Dosu, mention @dosu.

YashiroNanakse commented 1 month ago

from llama_index.core.indices.property_graph import QueryBundle

ImportError: cannot import name 'QueryBundle' from 'llama_index.core.indices.property_graph'

dosubot[bot] commented 1 month ago

To resolve the ImportError: cannot import name 'QueryBundle' from 'llama_index.core.indices.property_graph', you should import QueryBundle from llama_index.legacy.schema instead.

Here is the correct import statement:

from llama_index.legacy.schema import QueryBundle

This change is necessary because QueryBundle is defined in llama_index.legacy.schema [1][2].

To continue talking to Dosu, mention @dosu.

YashiroNanakse commented 1 month ago

from llama_index.legacy.schema import QueryBundle

ModuleNotFoundError: No module named 'llama_index.legacy'

YashiroNanakse commented 1 month ago

llama-cloud==0.0.15
llama-index==0.11.4
llama-index-agent-openai==0.3.0
llama-index-cli==0.3.0
llama-index-core==0.11.4
llama-index-embeddings-dashscope==0.2.1
llama-index-embeddings-openai==0.2.4
llama-index-graph-stores-neo4j==0.3.1
llama-index-indices-managed-llama-cloud==0.3.0
llama-index-legacy==0.9.48.post3
llama-index-llms-dashscope==0.2.0
llama-index-llms-openai==0.2.2
llama-index-multi-modal-llms-openai==0.2.0
llama-index-program-openai==0.2.0
llama-index-question-gen-openai==0.2.0
llama-index-readers-file==0.2.0
llama-index-readers-llama-parse==0.3.0
llama-parse==0.5.1

from llama_index.legacy.schema import QueryBundle

ModuleNotFoundError: No module named 'llama_index.legacy'

@dosu
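
For reference, a hedged sketch of an import that should work on llama-index-core 0.11.x: QueryBundle is defined in llama_index.core.schema (an assumption worth verifying against the installed version), so the retriever call from the earlier comments could be written as:

# Hedged sketch: on llama-index-core 0.11.x, QueryBundle lives in
# llama_index.core.schema (verify locally before relying on this).
from llama_index.core.schema import QueryBundle

query_bundle = QueryBundle(query_str="丽维嘉")
nodes = retriever.retrieve(query_bundle)  # retriever from the reproduction steps above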