Closed younes-io closed 6 months ago
🤖
Hello again, @younes-io! It's good to see you back in our LlamaIndex repository. Let's dive into your issue.
Based on the information provided, the current version of LlamaIndex (v0.9.15.post2) does support hybrid search. This is evident from the `get_data_model` function in the `postgres.py` file, where a check is performed on the `hybrid_search` parameter. If `hybrid_search` is set to `True`, a `HybridAbstractData` class is created with additional attributes to support hybrid search. You can find the relevant code in the `postgres.py` file.
As for the `keyword_embedding` option, it is supported by the `KnowledgeGraphRAGRetriever` module in LlamaIndex. This is evident from the `retriever_mode` argument in the `__init__` method of the `KnowledgeGraphRAGRetriever` class. The `retriever_mode` argument can take the value `"keyword_embedding"`, among other options. You can find the relevant code in the `retrievers.py` file.
However, I couldn't find any evidence in the provided context to suggest that LlamaIndex supports `keyword_embedding` in any of its other modules. If you could provide more specific details about the module or feature where you want to use `keyword_embedding`, I might be able to provide a more accurate answer.
I hope this helps! If you have any other questions, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot I'm using NebulaGraph, and I found this btw in `KnowledgeGraphRAGRetriever`:
```python
def _retrieve_embedding(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
    """Retrieve in embedding mode."""
    if self._retriever_mode not in ["embedding", "keyword_embedding"]:
        return []
    # TBD: will implement this later with vector store.
    raise NotImplementedError

async def _aretrieve_embedding(
    self, query_bundle: QueryBundle
) -> List[NodeWithScore]:
    """Retrieve in embedding mode."""
    if self._retriever_mode not in ["embedding", "keyword_embedding"]:
        return []
    # TBD: will implement this later with vector store.
    raise NotImplementedError
```
I need hybridSearch for NebulaGraph. Or, another workaround, but I still need to use NebulaGraph for queries related to relationships between entities, etc
🤖
Based on the current state of the LlamaIndex codebase, the `_retrieve_embedding` and `_aretrieve_embedding` methods in the `KnowledgeGraphRAGRetriever` class are indeed not implemented yet. As you've noticed, these methods are designed to support the "embedding" and "keyword_embedding" modes, but currently raise a `NotImplementedError`.
Unfortunately, I don't have any information on when these features will be implemented as it's not specified in the repository. However, I can suggest a potential workaround for your use case.
You could consider implementing these methods yourself. The comments in the code suggest that the implementation will involve a vector store. This could mean that the methods are intended to retrieve embeddings from a store of precomputed vector representations of the nodes in the knowledge graph. These embeddings could then be used to perform similarity searches to find nodes that are relevant to a given query.
Here's a rough sketch of how you might implement the `_retrieve_embedding` method:
```python
def _retrieve_embedding(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
    """Retrieve in embedding mode."""
    if self._retriever_mode not in ["embedding", "keyword_embedding"]:
        return []
    # Retrieve the query text from the query bundle
    query = query_bundle.query_str
    # Compute the query's embedding using your chosen method
    # (compute_query_embedding is a hypothetical helper you would supply)
    query_embedding = compute_query_embedding(query)
    # Retrieve the top-k most similar nodes from the vector store
    # (self.vector_store and most_similar are likewise hypothetical)
    top_k_nodes = self.vector_store.most_similar(query_embedding, k=self.top_k)
    # Convert the results into NodeWithScore objects
    nodes_with_scores = [NodeWithScore(node=node, score=score) for node, score in top_k_nodes]
    return nodes_with_scores
```
This is a very simplified example, and the actual implementation would depend on the specifics of your use case and the capabilities of your vector store. You would also need to implement `_aretrieve_embedding`, the asynchronous version of `_retrieve_embedding`.
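The helpers assumed in the sketch above (`compute_query_embedding`, `self.vector_store.most_similar`) are placeholders, not LlamaIndex APIs. To make the assumed `most_similar` interface concrete, here is a minimal in-memory vector store in plain Python; every name in it is hypothetical and exists only for illustration:

```python
import math
from typing import Dict, List, Tuple


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for the hypothetical vector store used in the sketch."""

    def __init__(self) -> None:
        self._embeddings: Dict[str, List[float]] = {}

    def add(self, node_id: str, embedding: List[float]) -> None:
        self._embeddings[node_id] = embedding

    def most_similar(
        self, query_embedding: List[float], k: int
    ) -> List[Tuple[str, float]]:
        """Return the k (node_id, score) pairs closest to query_embedding."""
        scored = [
            (node_id, _cosine(query_embedding, emb))
            for node_id, emb in self._embeddings.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("node-a", [1.0, 0.0])
store.add("node-b", [0.0, 1.0])
store.add("node-c", [0.7, 0.7])
print(store.most_similar([1.0, 0.1], k=2))
```

In a real setup you would replace the cosine loop with your vector database's nearest-neighbor query and use your embed model to produce the query embedding.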
I hope this helps! If you have any further questions or need more clarification, feel free to ask.
@dosu-bot Then, provide a proposal for the NebulaGraphStore
🤖
To implement the `_retrieve_embedding` and `_aretrieve_embedding` methods in the `KnowledgeGraphRAGRetriever` class for a `NebulaGraphStore`, so that the "embedding" and "keyword_embedding" modes work with NebulaGraph in LlamaIndex, you can follow the steps below:
`_retrieve_embedding` method: This method will be used to retrieve the embeddings for a given query. You can use the `get_top_k_embeddings` function from `llama_index.indices.query.embedding_utils` to get the top K embeddings for the query. The method can be implemented as follows:

```python
def _retrieve_embedding(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
    """Get nodes for response using embeddings."""
    embeddings = get_top_k_embeddings(query_bundle.query_str, self.similarity_top_k)
    node_ids = self._index_struct.search_node_by_embedding(embeddings)
    nodes = self._docstore.get_nodes(node_ids)
    nodes_with_scores = [NodeWithScore(node=node, score=DEFAULT_NODE_SCORE) for node in nodes]
    return nodes_with_scores
```
`_aretrieve_embedding` method: This method will be used to asynchronously retrieve the embeddings for a given query. It mirrors `_retrieve_embedding`, assuming async variants of the lookup helpers are available:

```python
async def _aretrieve_embedding(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
    """Get nodes for response using embeddings asynchronously."""
    embeddings = get_top_k_embeddings(query_bundle.query_str, self.similarity_top_k)
    # Assumes async variants of these lookup helpers exist.
    node_ids = await self._index_struct.search_node_by_embedding(embeddings)
    nodes = await self._docstore.get_nodes(node_ids)
    nodes_with_scores = [NodeWithScore(node=node, score=DEFAULT_NODE_SCORE) for node in nodes]
    return nodes_with_scores
```
Please note that the `get_top_k_embeddings` function and the `search_node_by_embedding` method as used here are hypothetical and need to be implemented according to your specific requirements and the structure of your data. `get_top_k_embeddings` should return the top K embeddings for a given query, and `search_node_by_embedding` should return the IDs of the nodes whose embeddings are closest to the given embeddings.
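As one concrete illustration of what such a helper could do, here is a plain-Python ranking function that scores candidate embeddings against a query embedding and returns `(similarities, ids)` pairs, best first. This is a sketch under stated assumptions, not the actual LlamaIndex `get_top_k_embeddings`, whose signature may differ:

```python
import math
from typing import List, Sequence, Tuple


def top_k_embeddings(
    query_embedding: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    embedding_ids: Sequence[str],
    similarity_top_k: int,
) -> Tuple[List[float], List[str]]:
    """Rank candidate embeddings by cosine similarity to the query.

    Returns (similarities, ids) for the top-k candidates, best first.
    """
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    ranked = sorted(
        ((cosine(query_embedding, emb), eid)
         for emb, eid in zip(embeddings, embedding_ids)),
        reverse=True,
    )[:similarity_top_k]
    return [sim for sim, _ in ranked], [eid for _, eid in ranked]


sims, ids = top_k_embeddings(
    query_embedding=[1.0, 0.0],
    embeddings=[[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]],
    embedding_ids=["n1", "n2", "n3"],
    similarity_top_k=2,
)
print(ids)  # -> ['n3', 'n2']
```

A `search_node_by_embedding`-style method could then map the returned IDs to node IDs in your NebulaGraph-backed index.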
Regarding the `QueryBundle` object, it is a data structure that encapsulates the information needed to perform a query in the LlamaIndex system. Although the exact structure of `QueryBundle` is not provided in the given context, it is used in several methods, and we can infer some of its properties:

- `query_bundle.embedding`: stores the embedding of the query. It is used in the `_build_vector_store_query` method to create a `VectorStoreQuery` object, and in the `_text_retrieve`, `_text_to_image_retrieve`, `_image_to_image_retrieve`, `_atext_retrieve`, `_atext_to_image_retrieve`, and `_aimage_to_image_retrieve` methods to store the result of the embedding retrieval.
- `query_bundle.embedding_strs`: stores the string representation of the query used for embedding. It is used in the `_text_retrieve`, `_text_to_image_retrieve`, `_atext_retrieve`, and `_atext_to_image_retrieve` methods to retrieve the embedding if it is not already present.
- `query_bundle.embedding_image`: stores the image representation of the query. It is used in the `_image_to_image_retrieve` and `_aimage_to_image_retrieve` methods to retrieve the image embedding.
- `query_bundle.query_str`: stores the string representation of the query. It is used in the `_build_vector_store_query` method to create a `VectorStoreQuery` object.

The `_retrieve_embedding` and `_aretrieve_embedding` methods are not present in the given context. However, the `_retrieve` and `_aretrieve` methods are, and they use the `QueryBundle` object to perform synchronous and asynchronous retrieval of nodes respectively. The `QueryBundle` object is passed as an argument to these methods, and its properties are used to retrieve the appropriate embeddings and perform the query.
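To make the inferred shape concrete, here is a simplified plain-Python analogue of such a bundle. Field names beyond `query_str` and `embedding` are assumptions drawn from the discussion above, not the library's exact definition:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleQueryBundle:
    """Simplified, illustrative analogue of a query bundle."""

    query_str: str
    # Optional precomputed embedding of the query; retrievers that need an
    # embedding can compute one from embedding_strs when this is None.
    embedding: Optional[List[float]] = None
    # Optional strings to embed in place of query_str.
    custom_embedding_strs: Optional[List[str]] = None

    @property
    def embedding_strs(self) -> List[str]:
        """Strings to feed the embed model (defaults to the query itself)."""
        if self.custom_embedding_strs is not None:
            return self.custom_embedding_strs
        return [self.query_str]


bundle = SimpleQueryBundle(query_str="Who founded NebulaGraph?")
print(bundle.embedding_strs)  # -> ['Who founded NebulaGraph?']
```

A retriever could check `bundle.embedding` first and fall back to embedding `bundle.embedding_strs` when it is `None`.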
You can find more details in the llama_index/indices/knowledge_graph/retrievers.py and llama_index/indices/multi_modal/retriever.py files.
Please let me know if you need further clarification or assistance.
@dosu-bot okay, what about keywords? Are they case-sensitive? If the user says "hello", is "Hello" part of the retrieved results or not? If not, what can I do to configure NebulaGraph to be case-insensitive?
🤖
In the LlamaIndex framework, the case sensitivity of keywords largely depends on the configuration of the underlying database, in this case, NebulaGraph. However, I couldn't find specific information about this in the LlamaIndex repository.
To better assist you, could you please provide more details about your current setup? Specifically, it would be helpful to know the version of NebulaGraph you're using and any relevant configuration settings you've applied. Additionally, if you could share a sample of the query you're using, that would be very beneficial.
Once I have this information, I'll be able to provide a more accurate and helpful response.
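In the meantime, one database-agnostic workaround is to normalize case on both sides: lower-case keywords before writing them to NebulaGraph, and lower-case the user's query terms before matching (nGQL also has string functions such as `toLower()` that you could apply inside queries, if your NebulaGraph version provides them). A minimal sketch; the helper names here are illustrations, not a NebulaGraph or LlamaIndex API:

```python
from typing import Iterable, List, Set


def normalize_keywords(keywords: Iterable[str]) -> List[str]:
    """Lower-case and deduplicate keywords, preserving first-seen order."""
    seen: Set[str] = set()
    result: List[str] = []
    for kw in keywords:
        folded = kw.casefold()  # casefold() handles more cases than lower()
        if folded not in seen:
            seen.add(folded)
            result.append(folded)
    return result


def keyword_match(stored: Iterable[str], query_terms: Iterable[str]) -> List[str]:
    """Return stored keywords that match any query term, ignoring case."""
    wanted = set(normalize_keywords(query_terms))
    return [kw for kw in stored if kw.casefold() in wanted]


print(keyword_match(["Hello", "WORLD", "graph"], ["hello", "Graph"]))
# -> ['Hello', 'graph']
```

If you normalize with `normalize_keywords` at ingestion time, the lookup side only needs to fold the query terms the same way.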
Question
As I've been looking into the docs, I realized that the `KnowledgeGraphRAGRetriever` doesn't yet support the `keyword_embedding` option, which is a blocker for my project. I'd like to know how to leverage my NebulaGraph DB and embeddings to get the best possible results in my RAG.