aganiezgoda closed this issue 2 months ago.
To resolve the "TypeError: documents must be a non-empty list" error when using `PropertyGraphIndex.from_existing`, ensure that the `nodes` parameter is provided as a non-empty list. The `from_existing` method initializes the `nodes` parameter as an empty list, which might be causing the error.
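The error text matches the `TypeError` that pymongo's `Collection.insert_many` raises when it receives an empty list. Below is a database-free sketch of how an empty `nodes=[]` could surface as this error. It assumes the vector store forwards the batch to `insert_many` without checking its length. `FakeCollection` and `add_to_vector_store` are hypothetical stand-ins, not pymongo or llama_index code:

```python
from typing import Any, Dict, List


class FakeCollection:
    """Hypothetical stand-in for a MongoDB collection (no server needed)."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def insert_many(self, documents: List[Dict[str, Any]]) -> None:
        # Mirrors the empty-batch check behind the error in this issue.
        if not documents:
            raise TypeError("documents must be a non-empty list")
        self.docs.extend(documents)


def add_to_vector_store(collection: FakeCollection, nodes: List[Dict[str, Any]]) -> None:
    # Hypothetical unguarded insert: forwards whatever batch it is given.
    collection.insert_many(nodes)


collection = FakeCollection()
error_message = ""
try:
    add_to_vector_store(collection, [])  # an empty batch, like nodes=[]
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # documents must be a non-empty list
```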
Here's the relevant part of the `from_existing` method:
```python
@classmethod
def from_existing(
    cls: "PropertyGraphIndex",
    property_graph_store: PropertyGraphStore,
    vector_store: Optional[BasePydanticVectorStore] = None,
    # general params
    llm: Optional[BaseLLM] = None,
    kg_extractors: Optional[List[TransformComponent]] = None,
    # vector related params
    use_async: bool = True,
    embed_model: Optional[EmbedType] = None,
    embed_kg_nodes: bool = True,
    # parent class params
    callback_manager: Optional[CallbackManager] = None,
    transformations: Optional[List[TransformComponent]] = None,
    storage_context: Optional[StorageContext] = None,
    show_progress: bool = False,
    **kwargs: Any,
) -> "PropertyGraphIndex":
    """Create an index from an existing property graph store (and optional vector store)."""
    return cls(
        nodes=[],  # no nodes to insert
        property_graph_store=property_graph_store,
        vector_store=vector_store,
        llm=llm,
        kg_extractors=kg_extractors,
        use_async=use_async,
        embed_model=embed_model,
        embed_kg_nodes=embed_kg_nodes,
        callback_manager=callback_manager,
        transformations=transformations,
        storage_context=storage_context,
        show_progress=show_progress,
        **kwargs,
    )
```
To fix this, you need to ensure that the `nodes` parameter is a non-empty list when calling the `from_existing` method. If you don't have any nodes to insert initially, you might need to handle this case separately in your implementation to avoid passing an empty list.
For example:
```python
# Ensure nodes is a non-empty list
nodes = [your_nodes] if your_nodes else [default_node]

# Create the PropertyGraphIndex
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    vector_store=vector_store,
    nodes=nodes,
    llm=llm,
    kg_extractors=[
        ImplicitPathExtractor(llm=llm),
        SimpleLLMPathExtractor(
            llm=llm,
            num_workers=4,
            max_paths_per_chunk=10,
        ),
    ],
    embed_kg_nodes=True,
    embed_model=embeddings,
    show_progress=True,
)
```
Additionally, ensure that your MongoDB client and vector store are correctly set up. Here is an example of how to initialize the `AzureCosmosDBMongoDBVectorSearch`:
```python
import pymongo
from llama_index.vector_stores.azurecosmosmongo import AzureCosmosDBMongoDBVectorSearch

# Set up the connection string with your Azure CosmosDB MongoDB URI
connection_string = "YOUR_AZURE_COSMOSDB_MONGODB_URI"
mongodb_client = pymongo.MongoClient(connection_string)

# Create an instance of AzureCosmosDBMongoDBVectorSearch
vector_store = AzureCosmosDBMongoDBVectorSearch(
    mongodb_client=mongodb_client,
    db_name="demo_vectordb",
    collection_name="aaa",
)
```
Ensure that the `nodes` you are passing to the `from_existing` method are correctly formatted and contain the necessary embeddings and metadata [1][2][3].
@dosu `AzureCosmosDBMongoDBVectorSearch` seems to be initialized correctly; documents are stored there without issues.
When I add an additional node definition (`nodes=nodes`, referencing the MongoDB collection), so that it looks like this:
```python
# Create the PropertyGraphIndex
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    vector_store=vector_store,
    nodes=nodes,
    llm=llm,
    kg_extractors=[
        ImplicitPathExtractor(llm=llm),
        SimpleLLMPathExtractor(
            llm=llm,
            num_workers=4,
            max_paths_per_chunk=10,
        ),
    ],
    embed_kg_nodes=True,
    embed_model=embeddings,
    show_progress=True,
)
```
...I get:
```
llama_index.core.indices.property_graph.base.PropertyGraphIndex() got multiple values for keyword argument 'nodes'
```
The error "PropertyGraphIndex() got multiple values for keyword argument 'nodes'" occurs because `nodes` ends up being passed twice. `from_existing` already calls `cls(nodes=[], ...)`, and your `nodes` argument is forwarded to the same call a second time through `**kwargs`. The `from_existing` method does not accept a `nodes` parameter directly; it initializes `nodes` internally as an empty list.
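The collision can be reproduced in plain Python with the same pattern: a classmethod that fixes `nodes=[]` and forwards `**kwargs`. `GraphIndex` below is a hypothetical stand-in, not the real `PropertyGraphIndex`:

```python
from typing import Any, List, Optional


class GraphIndex:
    """Hypothetical stand-in for PropertyGraphIndex (not the real class)."""

    def __init__(self, nodes: Optional[List[Any]] = None, **kwargs: Any) -> None:
        self.nodes = nodes or []
        self.options = kwargs

    @classmethod
    def from_existing(cls, **kwargs: Any) -> "GraphIndex":
        # Same shape as the excerpt: nodes is fixed to [] and every other
        # keyword argument from the caller is forwarded unchanged.
        return cls(nodes=[], **kwargs)


# Works: kwargs holds no "nodes", so nodes is supplied only once.
index = GraphIndex.from_existing(show_progress=True)

# Fails: the caller's nodes lands in **kwargs and collides with nodes=[].
error_message = ""
try:
    GraphIndex.from_existing(nodes=["n1"], show_progress=True)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # ...got multiple values for keyword argument 'nodes'
```

Python raises the error at the call site, before `__init__` runs. The only way around it in this pattern is to leave `nodes` out of the `from_existing` call.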
Here's how you should call the `from_existing` method correctly:
```python
# Create the PropertyGraphIndex
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    vector_store=vector_store,
    llm=llm,
    kg_extractors=[
        ImplicitPathExtractor(llm=llm),
        SimpleLLMPathExtractor(
            llm=llm,
            num_workers=4,
            max_paths_per_chunk=10,
        ),
    ],
    embed_kg_nodes=True,
    embed_model=embeddings,
    show_progress=True,
)
```
Ensure that you do not pass the `nodes` parameter to the `from_existing` method, as it is not expected and causes the conflict [1].
Hmm, I probably need to update the `_insert_nodes` function to skip if there are no new nodes/documents.
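Here is a minimal sketch of the kind of guard described above. It assumes the storage write can be modelled as a single callable. `insert_nodes` and `write` are hypothetical names, and this is not the actual llama_index `_insert_nodes` signature or the released fix:

```python
from typing import Any, Callable, List


def insert_nodes(nodes: List[Any], write: Callable[[List[Any]], None]) -> int:
    """Hypothetical guard in the spirit of the proposed _insert_nodes change.

    Returns the number of nodes written. An empty batch is skipped, so the
    storage layer never receives an empty insert.
    """
    if not nodes:
        return 0  # nothing new to insert (e.g. from_existing passes nodes=[])
    write(nodes)
    return len(nodes)


batches: List[List[Any]] = []
skipped = insert_nodes([], batches.append)
written = insert_nodes(["node-a", "node-b"], batches.append)
print(skipped, written, batches)  # 0 2 [['node-a', 'node-b']]
```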
I'll make a release later today with this fix (quite a few other fixes in the pipeline as well). Sorry for the issues, appreciate the patience 🙏🏻
Bug Description
I'm trying to do RAG based on property graphs, using MongoDB (the Cosmos DB version) for vector persistence:
The last part fails. Error message, shortened:
Version
0.10.40
Steps to Reproduce
see above
Relevant Logs/Tracebacks
No response