Closed justinzyw closed 2 hours ago
To upsert nodes in a Neo4j property graph and refresh a node's name, properties, and related embeddings, you can use the upsert_nodes method. This method merges nodes based on their IDs and updates their properties and embeddings. Here's a brief overview of how it works:
Separate Nodes by Type: Nodes are split into entity_dicts and chunk_dicts based on their type (EntityNode or ChunkNode).
Upsert Chunk Nodes: For each chunk in chunk_dicts, a Cypher query is executed to merge the node by its ID, set its text, and add its properties.
Upsert Entity Nodes: For each entity in entity_dicts, a Cypher query is executed to merge the node by its ID, set its name, labels, and properties, and create a MENTIONS relationship from the source chunk if triplet_source_id is provided.
This approach ensures that the node's name, properties, and embeddings are refreshed in the Neo4j graph. Here's a code snippet illustrating the method:
def upsert_nodes(self, nodes: List[LabelledNode]) -> None:
    # Lists to hold separated types
    entity_dicts: List[dict] = []
    chunk_dicts: List[dict] = []

    # Sort by type
    for item in nodes:
        if isinstance(item, EntityNode):
            entity_dicts.append({**item.dict(), "id": item.id})
        elif isinstance(item, ChunkNode):
            chunk_dicts.append({**item.dict(), "id": item.id})
        else:
            pass

    if chunk_dicts:
        for index in range(0, len(chunk_dicts), CHUNK_SIZE):
            chunked_params = chunk_dicts[index : index + CHUNK_SIZE]
            for param in chunked_params:
                formatted_properties = ", ".join(
                    [
                        f"{key}: {value!r}"
                        for key, value in param["properties"].items()
                    ]
                )
                self.structured_query(
                    f"""
                    MERGE (c:{BASE_NODE_LABEL} {{id: '{param["id"]}'}})
                    SET c.`text` = '{param["text"]}', c:Chunk
                    WITH c
                    SET c += {{{formatted_properties}}}
                    RETURN count(*)
                    """
                )

    if entity_dicts:
        for index in range(0, len(entity_dicts), CHUNK_SIZE):
            chunked_params = entity_dicts[index : index + CHUNK_SIZE]
            for param in chunked_params:
                formatted_properties = ", ".join(
                    [
                        f"{key}: {value!r}"
                        for key, value in param["properties"].items()
                    ]
                )
                self.structured_query(
                    f"""
                    MERGE (e:{BASE_NODE_LABEL} {{id: '{param["id"]}'}})
                    SET e += {{{formatted_properties}}}
                    SET e.name = '{param["name"]}', e:`{BASE_ENTITY_LABEL}`
                    WITH e
                    SET e :{param["label"]}
                    """
                )
                triplet_source_id = param["properties"].get("triplet_source_id")
                if triplet_source_id:
                    self.structured_query(
                        f"""
                        MERGE (e:{BASE_NODE_LABEL} {{id: '{param["id"]}'}})
                        MERGE (c:{BASE_NODE_LABEL} {{id: '{triplet_source_id}'}})
                        MERGE (e)<-[:MENTIONS]-(c)
                        """
                    )
This method ensures that the nodes are updated or created as needed, maintaining the integrity of the graph [1].
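One caveat with the snippet above: it interpolates ids, names, and text directly into the Cypher string, so a value containing a single quote (for example a name like O'Neil) would break the query. Below is a minimal sketch of the same entity upsert using query parameters instead. It assumes that structured_query accepts a parameter map as a second argument, which the snippet above does not show. build_entity_upsert is a hypothetical helper, and the two label constants are placeholder values for illustration.

```python
import re
from typing import Any, Dict, Tuple

# Placeholder label values for illustration; the real constants come from the library.
BASE_NODE_LABEL = "__Node__"
BASE_ENTITY_LABEL = "__Entity__"

_LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_entity_upsert(param: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (query, params) for upserting one entity dict (hypothetical helper)."""
    label = param["label"]
    # Labels cannot be passed as Cypher parameters, so validate before interpolating.
    if not _LABEL_RE.match(label):
        raise ValueError(f"unsafe label: {label!r}")
    query = (
        f"MERGE (e:`{BASE_NODE_LABEL}` {{id: $id}}) "
        f"SET e += $properties, e.name = $name, e:`{BASE_ENTITY_LABEL}`:`{label}`"
    )
    params = {
        "id": param["id"],
        "name": param["name"],
        "properties": param["properties"],
    }
    return query, params


query, params = build_entity_upsert(
    {"id": "alice", "name": "Alice O'Neil", "label": "PERSON", "properties": {"age": 30}}
)
```

Under that assumption, the pair could be passed on as self.structured_query(query, params). The values travel separately from the query text, so quoting inside names or text no longer matters.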
To continue talking to Dosu, mention @dosu.
@dosu, the method above manipulates the database directly, but it cannot refresh the embeddings to match the new name and properties. It has to invoke the LLM to generate the embeddings accordingly.
To refresh embeddings based on updated node names and properties, you need to invoke an embedding model to generate new embeddings. You can use the embed_nodes function for text nodes or the embed_image_nodes function for image nodes. These functions check whether a node's embedding is None and, if so, generate a new embedding using the provided embedding model. Here's how you can do it:
For text nodes:
from llama_index.legacy.embeddings.base import BaseEmbedding
from llama_index.legacy.indices.utils import embed_nodes
from llama_index.legacy.schema import BaseNode
# nodes is a sequence of BaseNode; embed_model is an instance of a concrete BaseEmbedding subclass
nodes = [...]  # Your list of nodes
embed_model = BaseEmbedding(...)  # Placeholder: replace with your embedding model
# Generate new embeddings
id_to_embed_map = embed_nodes(nodes, embed_model, show_progress=True)
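Because the helper only embeds nodes whose embedding is None, a node that still carries a vector computed from its old name keeps that stale vector. The following is a minimal stand-in for that check, not the library code: Node, fake_embed, and embed_missing are hypothetical names, and the toy embedding exists only to make the effect visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical stand-in for a node with an id, text, and optional embedding."""

    node_id: str
    text: str
    embedding: Optional[List[float]] = None


def fake_embed(text: str) -> List[float]:
    """Toy embedding for illustration only: text length and vowel count."""
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]


def embed_missing(
    nodes: Sequence[Node], embed: Callable[[str], List[float]]
) -> Dict[str, List[float]]:
    """Embed only nodes whose embedding is None; reuse existing vectors otherwise."""
    id_to_embed: Dict[str, List[float]] = {}
    for node in nodes:
        if node.embedding is None:
            id_to_embed[node.node_id] = embed(node.text)
        else:
            id_to_embed[node.node_id] = node.embedding
    return id_to_embed


node = Node("n1", "Alice", embedding=[9.0, 9.0])  # vector computed from an old name
stale = embed_missing([node], fake_embed)  # existing vector is reused unchanged
node.embedding = None  # clear it to force a refresh
fresh = embed_missing([node], fake_embed)  # vector recomputed from the current text
```

So when a node's name or properties change, setting its embedding to None before calling the embedding helper is what triggers a recomputation.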
For image nodes:
from llama_index.legacy.embeddings.multi_modal_base import MultiModalEmbedding
from llama_index.legacy.indices.utils import embed_image_nodes
from llama_index.legacy.schema import ImageNode
# nodes is a sequence of ImageNode; embed_model is an instance of a concrete MultiModalEmbedding subclass
nodes = [...]  # Your list of image nodes
embed_model = MultiModalEmbedding(...)  # Placeholder: replace with your embedding model
# Generate new image embeddings
id_to_embed_map = embed_image_nodes(nodes, embed_model, show_progress=True)
These functions return a dictionary mapping node IDs to their corresponding embeddings. You can then update the Neo4j graph with these new embeddings [1][2].
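Writing the new vectors back could look like the sketch below. It only builds batched Cypher statements and their parameters from an id-to-embedding dictionary. The assumptions are that nodes are keyed by an id property, that the vector lives in a property named embedding, and that the store's structured_query accepts a parameter map. build_embedding_update is a hypothetical helper, not part of the library.

```python
from typing import Any, Dict, List, Tuple


def build_embedding_update(
    id_to_embed_map: Dict[str, List[float]], batch_size: int = 1000
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return one (query, params) pair per batch of node ids (hypothetical helper)."""
    # Assumed schema: nodes matched on an `id` property, vector stored in `embedding`.
    query = (
        "UNWIND $rows AS row "
        "MATCH (n {id: row.id}) "
        "SET n.embedding = row.embedding"
    )
    rows = [
        {"id": node_id, "embedding": vector}
        for node_id, vector in id_to_embed_map.items()
    ]
    return [
        (query, {"rows": rows[start : start + batch_size]})
        for start in range(0, len(rows), batch_size)
    ]


batches = build_embedding_update({"a": [0.1], "b": [0.2], "c": [0.3]}, batch_size=2)
```

Each pair could then be executed with something like graph_store.structured_query(query, params), where graph_store is your property graph store instance. MATCH is used rather than MERGE so that an unknown id does not create an empty node.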
Question
I have a Neo4j property graph, and I would like to refresh a node's name, properties, and related embeddings. How can I achieve this by upserting the nodes?