run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
33.62k stars 4.72k forks

[Question]: How can I persist node mappings and retrieve them back for RecursiveRetrieval? #14583

Open harshsavasil opened 1 week ago

harshsavasil commented 1 week ago

Question

Hey, I'm trying to build RAG applications on top of web pages. I'm using UnstructuredElementNodeParser to parse my documents. Calling base_nodes, node_mappings = node_parser.get_base_nodes_and_mappings(raw_nodes) gives me base nodes and node mappings. I'm storing the base nodes in a vector index, but what about the node mappings? How can I store them and retrieve them later to pass into a RecursiveRetriever?

dosubot[bot] commented 1 week ago

Hello @harshsavasil! I'm a bot here to help you with any questions or issues you have. Let's see how we can solve this together.

To persist node mappings and retrieve them back from the index to use in RecursiveRetriever, you can use the persist and from_persist_dir methods provided by the SimpleObjectNodeMapping class.

Persisting Node Mappings

To persist the node mappings, call the persist method on your SimpleObjectNodeMapping instance:

node_mapping.persist(persist_dir="your_persist_directory")

Retrieving Node Mappings

To load the node mappings back from the persist directory, use the from_persist_dir class method:

retrieved_node_mapping = SimpleObjectNodeMapping.from_persist_dir(persist_dir="your_persist_directory")

Using in RecursiveRetriever

Once you have retrieved the SimpleObjectNodeMapping, you can use it to create an ObjectRetriever which can then be used in RecursiveRetriever:

object_retriever = retrieved_node_mapping.as_retriever()

Example Code

# Persisting the node mappings
node_mapping.persist(persist_dir="your_persist_directory")

# Retrieving the node mappings
retrieved_node_mapping = SimpleObjectNodeMapping.from_persist_dir(persist_dir="your_persist_directory")

# Using in RecursiveRetriever
object_retriever = retrieved_node_mapping.as_retriever()

This will allow you to persist and retrieve node mappings and use them in RecursiveRetriever. The SimpleObjectNodeMapping class handles the persistence of both the index and the object node mappings, ensuring that you can reconstruct the index with the same mappings when needed [1][2][3][4].

To continue talking to Dosu, mention @dosu.

harshsavasil commented 1 week ago

@dosu can't we store these node mappings in a database for better storage and efficient retrieval?

dosubot[bot] commented 1 week ago

Yes, you can store node mappings in a database for better storage and efficient retrieval. You can achieve this by serializing the nodes and their relationships into a format suitable for database storage, such as JSON, and then using a database client to insert these records into your database.

Here is an example of how you might serialize the nodes and store them in a SQLite database:

import json
import sqlite3
from typing import List

from llama_index.core.node_parser.relational.unstructured_element import UnstructuredElementNodeParser
from llama_index.core.schema import BaseNode, TextNode

# Initialize the parser
parser = UnstructuredElementNodeParser()

# Example function to store nodes in a SQLite database
def store_nodes_in_db(nodes: List[BaseNode], db_path: str) -> None:
    # Connect to the SQLite database (or create it if it doesn't exist)
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Create a table for storing nodes if it doesn't exist
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS nodes (
            id TEXT PRIMARY KEY,
            type TEXT,
            content TEXT,
            relationships TEXT,
            metadata TEXT
        )
    ''')

    # Insert nodes into the database
    for node in nodes:
        # Relationships hold enum keys and RelatedNodeInfo objects, which
        # json.dumps cannot handle directly, so go through the node's own
        # JSON serialization first
        node_dict = json.loads(node.to_json())
        node_data = {
            'id': node.node_id,
            'type': node.class_name(),  # e.g. "TextNode"
            'content': node.get_content(),
            'relationships': json.dumps(node_dict.get('relationships', {})),
            'metadata': json.dumps(node.metadata)
        }
        cursor.execute('''
            INSERT OR REPLACE INTO nodes (id, type, content, relationships, metadata)
            VALUES (:id, :type, :content, :relationships, :metadata)
        ''', node_data)

    # Commit the transaction and close the connection
    conn.commit()
    conn.close()

# Example usage (TextNode takes id_ and text, not id and content)
text_node = TextNode(id_="example_id", text="Example content")
nodes = parser.get_nodes_from_node(text_node)
store_nodes_in_db(nodes, 'nodes.db')

In this example:

  1. We initialize the UnstructuredElementNodeParser.
  2. We define a function store_nodes_in_db that takes a list of nodes and a database path.
  3. The function connects to a SQLite database and creates a table for storing nodes if it doesn't already exist.
  4. It then serializes each node's data into JSON format and inserts it into the database.
  5. Finally, we demonstrate how to use this function with an example TextNode.

This approach can be adapted to other types of databases (e.g., PostgreSQL, MongoDB) by using the appropriate database client and query syntax [1].
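The function above only covers writing. As a rough sketch of the full round trip, the snippet below saves an id-to-payload dictionary to SQLite and rebuilds it afterwards. It uses only the standard library: plain dictionaries stand in for serialized nodes, and the table name node_mappings, the helper names, and the sample data are hypothetical, not part of LlamaIndex.

```python
import json
import sqlite3
from typing import Any, Dict


def save_mapping(conn: sqlite3.Connection, mapping: Dict[str, Dict[str, Any]]) -> None:
    """Store each mapping entry as one row: the key plus a JSON payload."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node_mappings (id TEXT PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO node_mappings (id, payload) VALUES (?, ?)",
        [(key, json.dumps(value)) for key, value in mapping.items()],
    )
    conn.commit()


def load_mapping(conn: sqlite3.Connection) -> Dict[str, Dict[str, Any]]:
    """Rebuild the id -> payload dictionary from the stored rows."""
    rows = conn.execute("SELECT id, payload FROM node_mappings").fetchall()
    return {key: json.loads(payload) for key, payload in rows}


# Plain dicts stand in for serialized nodes (hypothetical sample data).
mapping = {
    "table-1": {"text": "Quarterly revenue table", "metadata": {"page": 3}},
    "table-2": {"text": "Headcount table", "metadata": {"page": 7}},
}

conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
save_mapping(conn, mapping)
restored = load_mapping(conn)
conn.close()
```

With real nodes, the payload would presumably be each node's own JSON serialization, and loading would need to turn it back into node objects before the dictionary is handed to a retriever.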


harshsavasil commented 1 week ago

@dosu How will I retrieve these nodes from the database for RAG?

dosubot[bot] commented 1 week ago

To retrieve nodes from the database for Retrieval-Augmented Generation (RAG), you can create a custom retriever that fetches the nodes from the database and converts them into NodeWithScore objects. Here is an example of how you can achieve this using SQLite:

Step-by-Step Guide

  1. Connect to the SQLite Database: Use a library like sqlite3 to connect to your SQLite database and fetch the required data.
  2. Create a Custom Retriever: Extend the BaseRetriever class to create a custom retriever that fetches data from the SQLite database.
  3. Convert Data to Nodes: Convert the fetched data into NodeWithScore objects, which can then be used by the RecursiveRetriever.

Example Implementation

import sqlite3
from typing import List

from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode

class SQLiteRetriever(BaseRetriever):
    """Retriever that fetches data from an SQLite database."""

    def __init__(self, db_path: str, top_k: int = 10) -> None:
        """Initialize with database path and top_k results."""
        self.db_path = db_path
        self.top_k = top_k
        # BaseRetriever sets up its own state (e.g. callbacks) in __init__
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes from the SQLite database."""
        conn = sqlite3.connect(self.db_path)
        try:
            # Example query: a plain substring match on the stored content
            cursor = conn.execute(
                "SELECT id, content FROM nodes WHERE content LIKE ? LIMIT ?",
                ('%' + query_bundle.query_str + '%', self.top_k),
            )
            results = cursor.fetchall()
        finally:
            conn.close()

        # Convert results to NodeWithScore objects. The nodes table has no
        # score column and LIKE does not rank matches, so 1.0 is a placeholder.
        return [
            NodeWithScore(node=TextNode(id_=row[0], text=row[1]), score=1.0)
            for row in results
        ]

# Example usage
db_path = 'nodes.db'
retriever = SQLiteRetriever(db_path=db_path, top_k=10)
query_bundle = QueryBundle(query_str="example query")
nodes = retriever.retrieve(query_bundle)  # public entry point; calls _retrieve
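A LIKE query only tells you whether the whole query string appears in a row. It does not rank rows by relevance. As a hedged illustration of one way to produce a score, the sketch below ranks rows by the fraction of query terms they contain. The function name keyword_search and the sample rows are hypothetical. It is a simple keyword heuristic, not semantic retrieval and not something LlamaIndex provides.

```python
import sqlite3
from typing import List, Tuple


def keyword_search(
    conn: sqlite3.Connection, query: str, top_k: int = 10
) -> List[Tuple[str, float]]:
    """Return (content, score) pairs ranked by the fraction of query terms found."""
    terms = query.lower().split()
    if not terms:
        return []
    scored: List[Tuple[str, float]] = []
    # Full table scan, ordered by id so that ties come back deterministically.
    for (content,) in conn.execute("SELECT content FROM nodes ORDER BY id"):
        text = content.lower()
        hits = sum(1 for term in terms if term in text)
        if hits:
            scored.append((content, hits / len(terms)))
    # sorted() is stable, so rows with equal scores keep their id order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO nodes (id, content) VALUES (?, ?)",
    [
        ("a", "Revenue grew in the third quarter"),
        ("b", "Headcount stayed flat"),
        ("c", "Third quarter revenue by region"),
    ],
)
results = keyword_search(conn, "quarter revenue headcount", top_k=2)
conn.close()
```

Scanning every row is fine for a small table. For anything larger, a full-text index or the vector index already used for the base nodes would be the more usual choice.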

Integrating with RecursiveRetriever

To integrate this with the RecursiveRetriever, you would create a dictionary of retrievers and pass it to the RecursiveRetriever:

from llama_index.core.retrievers.recursive_retriever import RecursiveRetriever

retriever_dict = {
    'root': retriever  # Add your SQLiteRetriever instance here
}

recursive_retriever = RecursiveRetriever(root_id='root', retriever_dict=retriever_dict)
nodes = recursive_retriever.retrieve(query_bundle)

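This setup lets the RecursiveRetriever use the SQLiteRetriever to fetch nodes from the SQLite database and recursively follow links from nodes to other retrievers or query engines. Node mappings loaded back from storage can be supplied through the RecursiveRetriever's node_dict argument [1][2][3].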
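To make the role of the two dictionaries concrete, here is a toy model of recursive resolution: ids returned by one retriever are looked up either as another retriever to descend into or as a stored node. This is a simplified illustration of the idea, not LlamaIndex's implementation. The function recursive_retrieve, the Retriever alias, and the sample ids are all hypothetical.

```python
from typing import Callable, Dict, List

# A "retriever" here is just a function from a query string to a list of ids.
Retriever = Callable[[str], List[str]]


def recursive_retrieve(
    query: str,
    retriever_id: str,
    retriever_dict: Dict[str, Retriever],
    node_dict: Dict[str, str],
) -> List[str]:
    """Resolve ids returned by a retriever until only stored nodes remain."""
    results: List[str] = []
    for ref in retriever_dict[retriever_id](query):
        if ref in retriever_dict:
            # The id names another retriever: descend into it.
            results.extend(recursive_retrieve(query, ref, retriever_dict, node_dict))
        elif ref in node_dict:
            # The id names a stored node: return its content.
            results.append(node_dict[ref])
    return results


retriever_dict: Dict[str, Retriever] = {
    "root": lambda query: ["summary-1", "tables"],
    "tables": lambda query: ["table-1"],
}
node_dict = {
    "summary-1": "Summary of the page",
    "table-1": "Full table contents",
}

found = recursive_retrieve("revenue", "root", retriever_dict, node_dict)
```

The sketch has no protection against retrievers that reference each other in a cycle. It also shows why the node mappings have to survive between sessions: without them, the ids a retriever returns cannot be resolved to content.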
