run-llama / llama_index

[Question]: how to use vector store and graph store together? #15511

Open justinzyw opened 2 months ago

justinzyw commented 2 months ago

Question

I have created a vector store and a graph store separately. How can I use both of them in one query?

dosubot[bot] commented 2 months ago

To use a vector store and a graph store together in one query with LlamaIndex, you can use the VectorContextRetriever class. It runs a vector similarity search for the query and then pulls the matching nodes and their relationships from the graph store. Here is an example of how you can structure such a query:

from llama_index.core.indices.property_graph.sub_retrievers.vector import VectorContextRetriever
from llama_index.core.schema import QueryBundle

# Initialize the graph store and vector store
graph_store = ...  # Your PropertyGraphStore instance
vector_store = ...  # Your VectorStore instance

# Create a VectorContextRetriever instance
retriever = VectorContextRetriever(
    graph_store=graph_store,
    vector_store=vector_store,
    similarity_top_k=4,
    path_depth=1,
    similarity_score=0.5
)

# Create a query bundle (query_str is the required field; embedding_strs is derived from it)
query_bundle = QueryBundle(
    query_str="your query text here"
)

# Retrieve nodes from the graph store using the vector store query
nodes_with_scores = retriever.retrieve_from_graph(query_bundle)

# Process the retrieved nodes as needed
for node_with_score in nodes_with_scores:
    print(node_with_score.node, node_with_score.score)

The VectorContextRetriever class handles the whole retrieval process: it embeds the query text, runs a similarity search (against the graph store if it supports vector queries, otherwise against the vector store you pass in), and then fetches the matched nodes and their relationships from the graph store, up to path_depth hops away [1].
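As a rough illustration of that flow, the sketch below replaces both stores with plain Python structures. It is not LlamaIndex code: `TOY_VECTORS`, `TOY_TRIPLETS`, `vector_search`, `expand_in_graph`, and `retrieve` are hypothetical names, the embeddings are hand-written 2-D vectors, and the graph expansion is limited to one hop.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical stand-in for a vector store: entity id -> embedding.
TOY_VECTORS = {
    "alice": [1.0, 0.0],
    "bob": [0.8, 0.6],
    "carol": [0.0, 1.0],
}

# Hypothetical stand-in for a graph store: (subject, relation, object) triplets.
TOY_TRIPLETS = [
    ("alice", "WORKS_WITH", "bob"),
    ("bob", "MANAGES", "carol"),
    ("carol", "KNOWS", "dave"),
]


def vector_search(query_embedding, top_k):
    """Return the top_k (entity id, similarity) pairs, best first."""
    scored = [
        (node_id, cosine(query_embedding, vec))
        for node_id, vec in TOY_VECTORS.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def expand_in_graph(node_ids):
    """Return every triplet that touches one of the given entity ids (one hop)."""
    ids = set(node_ids)
    return [t for t in TOY_TRIPLETS if t[0] in ids or t[2] in ids]


def retrieve(query_embedding, top_k=1):
    """Vector search first, then graph expansion around the hits."""
    hits = vector_search(query_embedding, top_k)
    return expand_in_graph([node_id for node_id, _ in hits])
```

Calling `retrieve([1.0, 0.0], top_k=1)` matches only `alice` in the vector step and then returns the triplets around her from the graph step. The real retriever follows the same two-stage shape, but works with embeddings from the configured embed model and with the stores' own query methods.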

For example, you can use the FalkorDBPropertyGraphStore class from the LlamaIndex framework to set up and query your graph store:

from llama_index.core.indices.property_graph import PropertyGraphIndex
from llama_index.graph_stores.falkordb import FalkorDBPropertyGraphStore

# Create a FalkorDBPropertyGraphStore instance
graph_store = FalkorDBPropertyGraphStore(
    url="falkordb://localhost:6379",
    database="falkor"
)

# Create the index (`documents` is assumed to be a list of Document objects you loaded earlier)
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
)

This setup allows you to use both vector and structured queries on your graph store [2].

To continue talking to Dosu, mention @dosu.

justinzyw commented 2 months ago

@dosu, my vector store is a SimpleVectorStore, does it work for the method above?

dosubot[bot] commented 2 months ago

Yes, VectorContextRetriever works with a SimpleVectorStore. The class retrieves nodes for a query using a graph store and, optionally, a separate vector store. If the graph store does not support vector queries itself, the retriever falls back to the vector store you pass in through the vector_store argument.

Here is the relevant part of the code that shows how it handles both the graph store and the vector store:

class VectorContextRetriever(BasePGRetriever):
    def __init__(
        self,
        graph_store: PropertyGraphStore,
        include_text: bool = True,
        embed_model: Optional[BaseEmbedding] = None,
        vector_store: Optional[VectorStore] = None,
        similarity_top_k: int = 4,
        path_depth: int = 1,
        similarity_score: Optional[float] = None,
        filters: Optional[MetadataFilters] = None,
        **kwargs: Any,
    ) -> None:
        self._retriever_kwargs = kwargs or {}
        self._embed_model = embed_model or Settings.embed_model
        self._similarity_top_k = similarity_top_k
        self._vector_store = vector_store
        self._path_depth = path_depth
        self._similarity_score = similarity_score
        self._filters = filters

        super().__init__(graph_store=graph_store, include_text=include_text, **kwargs)

    def retrieve_from_graph(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        vector_store_query = self._get_vector_store_query(query_bundle)

        triplets = []
        kg_ids = []
        new_scores = []
        if self._graph_store.supports_vector_queries:
            result = self._graph_store.vector_query(vector_store_query)
            if len(result) != 2:
                raise ValueError("No nodes returned by vector_query")
            kg_nodes, scores = result

            kg_ids = [node.id for node in kg_nodes]
            triplets = self._graph_store.get_rel_map(
                kg_nodes, depth=self._path_depth, ignore_rels=[KG_SOURCE_REL]
            )
        elif self._vector_store is not None:
            query_result = self._vector_store.query(vector_store_query)
            if query_result.nodes is not None and query_result.similarities is not None:
                kg_ids = self._get_kg_ids(query_result.nodes)
                scores = query_result.similarities
                kg_nodes = self._graph_store.get(ids=kg_ids)
                triplets = self._graph_store.get_rel_map(
                    kg_nodes, depth=self._path_depth, ignore_rels=[KG_SOURCE_REL]
                )

            elif query_result.ids is not None and query_result.similarities is not None:
                kg_ids = query_result.ids
                scores = query_result.similarities
                kg_nodes = self._graph_store.get(ids=kg_ids)
                triplets = self._graph_store.get_rel_map(
                    kg_nodes, depth=self._path_depth, ignore_rels=[KG_SOURCE_REL]
                )

        for triplet in triplets:
            score1 = (
                scores[kg_ids.index(triplet[0].id)] if triplet[0].id in kg_ids else 0.0
            )
            score2 = (
                scores[kg_ids.index(triplet[2].id)] if triplet[2].id in kg_ids else 0.0
            )
            new_scores.append(max(score1, score2))

        assert len(triplets) == len(new_scores)

        # filter by similarity score
        if self._similarity_score:
            filtered_data = [
                (triplet, score)
                for triplet, score in zip(triplets, new_scores)
                if score >= self._similarity_score
            ]
            # sort by score
            top_k = sorted(filtered_data, key=lambda x: x[1], reverse=True)
        else:
            # sort by score
            top_k = sorted(zip(triplets, new_scores), key=lambda x: x[1], reverse=True)

        return self._get_nodes_with_score([x[0] for x in top_k], [x[1] for x in top_k])

This code shows that the VectorContextRetriever can use a vector store (self._vector_store) if the graph store does not support vector queries, allowing it to retrieve and combine data from both stores in one query [1].
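The scoring step at the end of retrieve_from_graph can be easier to follow in isolation. The sketch below is a simplified re-statement of that logic, not the library's code: `score_triplets` is a hypothetical helper, and triplets are plain `(subject_id, relation, object_id)` tuples rather than graph node objects. Each triplet gets the higher of its two endpoint scores (0.0 for an endpoint the vector search did not return), an optional threshold is applied, and the result is sorted best first.

```python
def score_triplets(triplets, kg_ids, scores, similarity_score=None):
    """Score, filter, and sort triplets by the similarity of their endpoints.

    triplets: list of (subject_id, relation, object_id) tuples
    kg_ids / scores: parallel lists from the vector search
    similarity_score: optional minimum score; falsy values disable filtering
    """
    # First occurrence wins, mirroring list.index() in the quoted source.
    score_by_id = {}
    for node_id, score in zip(kg_ids, scores):
        score_by_id.setdefault(node_id, score)

    scored = []
    for subj, rel, obj in triplets:
        best = max(score_by_id.get(subj, 0.0), score_by_id.get(obj, 0.0))
        scored.append(((subj, rel, obj), best))

    # Truthiness check, as in the quoted source: None and 0.0 both skip filtering.
    if similarity_score:
        scored = [pair for pair in scored if pair[1] >= similarity_score]

    # sorted() is stable, so ties keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

One consequence worth noting: a triplet whose endpoints were both missed by the vector search still comes back with a score of 0.0 unless similarity_score is set to a positive value.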
