Open harshsavasil opened 1 week ago
Hello @harshsavasil! I'm a bot here to help you with any questions or issues you have. Let's see how we can solve this together.
To persist node mappings and retrieve them later for use in RecursiveRetriever, you can use the persist and from_persist_dir methods provided by the SimpleObjectNodeMapping class.
To persist the node mappings, call the persist method on your SimpleObjectNodeMapping instance:
node_mapping.persist(persist_dir="your_persist_directory")
To load the node mappings back, use the from_persist_dir class method:
retrieved_node_mapping = SimpleObjectNodeMapping.from_persist_dir(persist_dir="your_persist_directory")
Using in RecursiveRetriever
Once you have retrieved the SimpleObjectNodeMapping, you can use it to create an ObjectRetriever, which can then be used in RecursiveRetriever:
object_retriever = retrieved_node_mapping.as_retriever()
Putting the steps together:
# Persisting the node mappings
node_mapping.persist(persist_dir="your_persist_directory")
# Retrieving the node mappings
retrieved_node_mapping = SimpleObjectNodeMapping.from_persist_dir(persist_dir="your_persist_directory")
# Using in RecursiveRetriever
object_retriever = retrieved_node_mapping.as_retriever()
This will allow you to persist and retrieve node mappings and use them in RecursiveRetriever. The SimpleObjectNodeMapping class handles the persistence of both the index and the object node mappings, so you can reconstruct the index with the same mappings when needed [1][2][3][4].
To continue talking to Dosu, mention @dosu.
@dosu can't we store these node mappings in a database for better storage and efficient retrieval?
Yes, you can store node mappings in a database for better storage and efficient retrieval. You can achieve this by serializing the nodes and their relationships into a format suitable for database storage, such as JSON, and then using a database client to insert these records into your database.
Here is an example of how you might serialize the nodes and store them in a SQLite database:
import sqlite3
import json
from typing import List
from llama_index.core.schema import BaseNode, TextNode
from llama_index.core.node_parser.relational.unstructured_element import UnstructuredElementNodeParser
# Initialize the parser
parser = UnstructuredElementNodeParser()
# Example function to store nodes in a SQLite database
def store_nodes_in_db(nodes: List[BaseNode], db_path: str) -> None:
    # Connect to the SQLite database (or create it if it doesn't exist)
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    # Create a table for storing nodes if it doesn't exist
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS nodes (
            id TEXT PRIMARY KEY,
            type TEXT,
            content TEXT,
            relationships TEXT,
            metadata TEXT
        )
    ''')
    # Insert nodes into the database
    for node in nodes:
        # node.relationships holds enum keys and RelatedNodeInfo objects, which
        # json.dumps cannot encode directly, so go through the node's own JSON form
        node_json = json.loads(node.to_json())
        node_data = {
            'id': node.node_id,
            'type': node.get_type(),
            'content': node.get_content(),
            'relationships': json.dumps(node_json.get('relationships', {})),
            'metadata': json.dumps(node.metadata)
        }
        cursor.execute('''
            INSERT OR REPLACE INTO nodes (id, type, content, relationships, metadata)
            VALUES (:id, :type, :content, :relationships, :metadata)
        ''', node_data)
    # Commit the transaction and close the connection
    conn.commit()
    conn.close()
# Example usage
text_node = TextNode(id_="example_id", text="Example content")
nodes = parser.get_nodes_from_node(text_node)
store_nodes_in_db(nodes, 'nodes.db')
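In this example:
The parser is an instance of UnstructuredElementNodeParser.
The function store_nodes_in_db takes a list of nodes and a database path, creates the nodes table if needed, and inserts one row per node.
The usage example parses a single TextNode and stores the resulting nodes.
This approach can be adapted to other types of databases (e.g., PostgreSQL, MongoDB) by using the appropriate database client and query syntax [1].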
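Since a node mapping is essentially a lookup from node ID to node, reading rows back by ID is the natural counterpart to the storage function. The following is a minimal sketch using only the standard library, assuming the nodes table layout shown above. The name load_node_records is hypothetical, and the function returns plain dictionaries rather than LlamaIndex node objects.

```python
import json
import sqlite3
from typing import Dict, Iterable


def load_node_records(db_path: str, node_ids: Iterable[str]) -> Dict[str, dict]:
    """Fetch rows from the `nodes` table by id and decode their JSON columns."""
    ids = list(node_ids)
    if not ids:
        return {}
    # One "?" placeholder per id keeps the query parameterized
    placeholders = ",".join("?" for _ in ids)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, type, content, relationships, metadata "
            f"FROM nodes WHERE id IN ({placeholders})",
            ids,
        ).fetchall()
    finally:
        conn.close()
    # Map each id to its decoded record; ids with no row are simply absent
    return {
        row[0]: {
            "type": row[1],
            "content": row[2],
            "relationships": json.loads(row[3]),
            "metadata": json.loads(row[4]),
        }
        for row in rows
    }
```

Calling load_node_records('nodes.db', ['example_id']) would return a dictionary keyed by ID, from which node objects could be rebuilt as needed.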
@dosu How will I retrieve these nodes from the database for RAG?
To retrieve nodes from the database for Retrieval-Augmented Generation (RAG), you can create a custom retriever that fetches the nodes from the database and converts them into NodeWithScore objects. Here is an example of how you can achieve this using SQLite:
Use sqlite3 to connect to your SQLite database and fetch the required data.
Subclass the BaseRetriever class to create a custom retriever that fetches data from the SQLite database.
Convert the fetched rows into NodeWithScore objects, which can then be used by the RecursiveRetriever.
import sqlite3
from typing import List
from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode
class SQLiteRetriever(BaseRetriever):
    """Retriever that fetches data from an SQLite database."""

    def __init__(self, db_path: str, top_k: int = 10) -> None:
        """Initialize with database path and top_k results."""
        self.db_path = db_path
        self.top_k = top_k
        # BaseRetriever.__init__ sets up state that the public retrieve() relies on
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes from the SQLite database."""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        # Example query: a plain substring match. The nodes table has no score
        # column, so every match gets the same placeholder score below.
        cursor.execute(
            "SELECT id, content FROM nodes WHERE content LIKE ? LIMIT ?",
            ('%' + query_bundle.query_str + '%', self.top_k),
        )
        results = cursor.fetchall()
        conn.close()
        # Convert results to NodeWithScore objects
        result_nodes = [
            NodeWithScore(node=TextNode(id_=row[0], text=row[1]), score=1.0)
            for row in results
        ]
        return result_nodes
# Example usage
db_path = 'nodes.db'
retriever = SQLiteRetriever(db_path=db_path, top_k=10)
query_bundle = QueryBundle(query_str="example query")
nodes = retriever.retrieve(query_bundle)
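One caveat with the LIKE-based lookup: a query string containing % or _ is interpreted as a wildcard pattern. A minimal sketch of escaping those characters with SQLite's ESCAPE clause follows. The helper names escape_like and search_content are hypothetical, and the query assumes a nodes table with id and content columns as above.

```python
import sqlite3
from typing import List, Tuple


def escape_like(term: str, escape_char: str = "\\") -> str:
    """Escape LIKE wildcards so the term is matched literally."""
    return (
        term.replace(escape_char, escape_char * 2)
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )


def search_content(
    conn: sqlite3.Connection, term: str, top_k: int = 10
) -> List[Tuple[str, str]]:
    """Return up to top_k (id, content) rows whose content contains term."""
    pattern = "%" + escape_like(term) + "%"
    # ESCAPE '\' tells SQLite that a backslash marks a literal % or _
    return conn.execute(
        "SELECT id, content FROM nodes WHERE content LIKE ? ESCAPE '\\' LIMIT ?",
        (pattern, top_k),
    ).fetchall()
```

Substring matching of this kind is only a stand-in for real relevance ranking; for RAG you would more likely rank by embedding similarity or full-text search and store that value as the score.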
Integrating with RecursiveRetriever
To integrate this with the RecursiveRetriever, you would create a dictionary of retrievers and pass it to the RecursiveRetriever:
from llama_index.core.retrievers.recursive_retriever import RecursiveRetriever
retriever_dict = {
    'root': retriever  # Add your SQLiteRetriever instance here
}
recursive_retriever = RecursiveRetriever(root_id='root', retriever_dict=retriever_dict)
nodes = recursive_retriever.retrieve(query_bundle)
This setup allows the RecursiveRetriever to use the SQLiteRetriever to fetch nodes from the SQLite database and recursively explore links from nodes to other retrievers or query engines [1][2][3].
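To make "recursively explore links" concrete, here is a toy sketch of the idea in plain Python. It is not LlamaIndex's implementation: the Hit and ToyRetriever types and the retrieve_recursively function are hypothetical, and each "retriever" is just a function returning (text, link) pairs.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A hit is (text, link); link names another retriever to follow, or is None.
Hit = Tuple[str, Optional[str]]
ToyRetriever = Callable[[str], List[Hit]]


def retrieve_recursively(
    query: str,
    retriever_id: str,
    retrievers: Dict[str, ToyRetriever],
    visited: Optional[Set[str]] = None,
) -> List[str]:
    """Collect texts, following links from hits to other retrievers."""
    visited = set() if visited is None else visited
    if retriever_id in visited:  # guard against link cycles
        return []
    visited.add(retriever_id)
    texts: List[str] = []
    for text, link in retrievers[retriever_id](query):
        if link is not None and link in retrievers:
            # The hit points at another retriever: descend instead of returning it
            texts.extend(retrieve_recursively(query, link, retrievers, visited))
        else:
            texts.append(text)
    return texts


retrievers: Dict[str, ToyRetriever] = {
    "root": lambda q: [("summary of table 1", "table-1"), ("plain paragraph", None)],
    "table-1": lambda q: [("row data for table 1", None)],
}
print(retrieve_recursively("example query", "root", retrievers))
```

In this sketch the summary hit is replaced by what its linked retriever returns, which mirrors how a summary node can stand in for a table or sub-index during retrieval.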
Question
Hey, I'm trying to create RAG applications on top of web pages. I'm using UnstructuredElementNodeParser to parse my documents. As a result of calling base_nodes, node_mappings = node_parser.get_base_nodes_and_mappings(raw_nodes), I get base nodes and node mappings. I'm using a vector index to store the base nodes, but what about the node mappings? How can I store my node mappings and retrieve them later from the index to pass them into a RecursiveRetriever for retrieval?