jjoaqu7 opened this issue 2 months ago
Yes. With LlamaIndex you can build an organized index over a nested dictionary in which articles are the keys and each article maps to a nested dictionary of parameters/techniques, each with a list of detail strings, so that relevant parameters/techniques can be retrieved quickly and efficiently from user input. The LocalIndexState class and the get_retrieval_tool_for_docset function from the Docugami KG-RAG pack can be adapted to manage and retrieve information from such an index.
Here is an example of how you might structure and use these components:
from typing import Dict, List, Optional
from dataclasses import dataclass

from llama_index.core.readers import Document
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.tools import BaseTool, ToolMetadata, QueryEngineTool
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from llama_index.packs.docugami_kg_rag.helpers.vector_store import get_vector_store
from llama_index.packs.docugami_kg_rag.helpers.fused_summary_retriever import FusedSummaryRetriever
# ReportDetails is referenced below; it is assumed to live in the pack's reports helper
from llama_index.packs.docugami_kg_rag.helpers.reports import ReportDetails


@dataclass
class LocalIndexState:
    # Full-document summaries and chunks, both keyed by document ID
    full_doc_summaries_by_id: Dict[str, Document]
    chunks_by_id: Dict[str, Document]
    # Name and description exposed when the retriever is wrapped as a tool
    retrieval_tool_function_name: str
    retrieval_tool_description: str
    reports: List[ReportDetails]


def get_retrieval_tool_for_docset(
    docset_id: str, docset_state: LocalIndexState
) -> Optional[BaseTool]:
    # Look up the vector store for this docset; bail out if it doesn't exist yet
    chunk_vectorstore = get_vector_store(docset_id)
    if not chunk_vectorstore:
        return None

    # Fuse chunk-level retrieval with full-document summaries, using MMR search
    retriever = FusedSummaryRetriever(
        vectorstore=chunk_vectorstore,
        parent_doc_store=docset_state.chunks_by_id,
        full_doc_summary_store=docset_state.full_doc_summaries_by_id,
        search_type=VectorStoreQueryMode.MMR,
    )

    query_engine = RetrieverQueryEngine(retriever=retriever)
    return QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name=docset_state.retrieval_tool_function_name,
            description=docset_state.retrieval_tool_description,
        ),
    )
# Example usage
nested_dict = {
    "Article1": {
        "Parameter1": ["Detail1", "Detail2"],
        "Parameter2": ["Detail3", "Detail4"],
    },
    "Article2": {
        "Technique1": ["Detail5", "Detail6"],
        "Technique2": ["Detail7", "Detail8"],
    },
}
# Convert the nested dictionary to Documents and create a LocalIndexState
documents = {key: Document(text=str(value)) for key, value in nested_dict.items()}

index_state = LocalIndexState(
    full_doc_summaries_by_id=documents,
    chunks_by_id=documents,
    retrieval_tool_function_name="search_articles",
    retrieval_tool_description="Searches for and returns relevant parameters/techniques from articles.",
    reports=[],
)
# Retrieve the tool for a specific docset ("docset_id" is a placeholder; a
# vector store must already exist for that ID, otherwise this returns None)
retrieval_tool = get_retrieval_tool_for_docset("docset_id", index_state)
This code sets up a LocalIndexState with Documents created from the nested dictionary and returns a tool for querying the index [1].
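Note that Document(text=str(value)) collapses each article's parameters into a single Document. If you want retrieval at the level of individual parameters/techniques, here is a minimal sketch of a finer-grained conversion (the fine_grained_docs name, key format, and metadata fields are illustrative, not part of the pack):

# One Document per parameter/technique, keeping the parent article in
# metadata so retrieved details can be traced back to their article.
fine_grained_docs = {
    f"{article}::{param}": Document(
        text="\n".join(details),
        metadata={"article": article, "parameter": param},
    )
    for article, params in nested_dict.items()
    for param, details in params.items()
}

These can then be passed to LocalIndexState in place of the coarser documents mapping above.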
To continue talking to Dosu, mention @dosu.
Do you recommend using a vector store here or a knowledge graph store?
@dosu
I recommend using a vector store for indexing a nested dictionary of articles and their parameters/techniques. Vector stores are efficient at storing and querying embeddings, which suits this kind of similarity-based lookup. Here is an example of how to use a vector store with LlamaIndex and Chroma:
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load some documents
documents = SimpleDirectoryReader("./data").load_data()

# Initialize the client, setting a path where data is persisted
db = chromadb.PersistentClient(path="./chroma_db")

# Create the collection
chroma_collection = db.get_or_create_collection("quickstart")

# Assign Chroma as the vector_store in the storage context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create your index
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Create a query engine and query
query_engine = index.as_query_engine()
response = query_engine.query("What is the meaning of life?")
print(response)
This approach allows you to efficiently store and query embeddings, making it suitable for your use case [1][2].
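Because the Chroma collection is persisted on disk, you can reopen the same index in a later session without re-ingesting the documents. A minimal sketch, assuming the ./chroma_db path and "quickstart" collection from above:

import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Reattach to the persisted collection and rebuild the index wrapper
# directly from the existing vector store (no re-indexing needed).
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(vector_store)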
However, if you need to represent and query complex relationships between entities, a knowledge graph store might be more suitable. You can read more in the LlamaIndex documentation under the Knowledge Graph Index section [3][4].
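If you go that route, the nested dictionary maps naturally onto subject-relation-object triplets. A minimal sketch using KnowledgeGraphIndex (the has_parameter/has_detail relation names are made up for illustration, and default LLM/embedding settings are assumed):

from llama_index.core import KnowledgeGraphIndex

# Start from an empty graph index and insert one triplet per relationship
# in the nested dictionary: article -> parameter/technique -> detail.
kg_index = KnowledgeGraphIndex([])
for article, params in nested_dict.items():
    for param, details in params.items():
        kg_index.upsert_triplet((article, "has_parameter", param))
        for detail in details:
            kg_index.upsert_triplet((param, "has_detail", detail))

kg_query_engine = kg_index.as_query_engine()
response = kg_query_engine.query("Which techniques does Article2 describe?")
print(response)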
To continue talking to Dosu, mention @dosu.
Question Validation
Question
Hi, is it possible using llama index to do the following:
@dosu