Closed shalinanvai closed 3 weeks ago
Hey @shalinanvai! I'm here to help you with any bugs, questions, or contributions. Let's work together to solve this issue!
The issue you're encountering is due to the _retrieve_embedding method not being implemented in the KnowledgeGraphIndex class. This method is essential for embedding-based retrieval and currently raises a NotImplementedError, indicating that the functionality is not yet available.
Here is the relevant part of the code showing the method is not implemented:
def _retrieve_embedding(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
    """Retrieve in embedding mode."""
    if self._retriever_mode not in ["embedding", "keyword_embedding"]:
        return []
    # TBD: will implement this later with vector store.
    raise NotImplementedError
To resolve this issue, you need to implement the _retrieve_embedding method to handle embedding-based retrieval. Until this method is implemented, retriever_mode="embedding" will not function correctly when loading an already created knowledge graph [1][2].
To continue talking to Dosu, mention @dosu.
@dosu It appears that the embedding based retrieval is correctly implemented in the retriever_mode="hybrid" mode. Is that accurate?
Yes, embedding-based retrieval is correctly implemented in the retriever_mode="hybrid" mode for the KnowledgeGraphIndex class.
Supporting evidence from the code:
The KGTableRetriever class has a _retrieve method that handles different retrieval modes, including HYBRID.
When retriever_mode is set to HYBRID, the method combines both keyword-based and embedding-based retrieval.
Embedding-based retrieval uses the get_top_k_embeddings function, which retrieves the top-k similar embeddings.
Here is the relevant code snippet:
if (
    self._retriever_mode != KGRetrieverMode.KEYWORD
    and len(self._index_struct.embedding_dict) > 0
):
    query_embedding = self._service_context.embed_model.get_text_embedding(
        query_bundle.query_str
    )
    all_rel_texts = list(self._index_struct.embedding_dict.keys())
    rel_text_embeddings = [
        self._index_struct.embedding_dict[_id] for _id in all_rel_texts
    ]
    similarities, top_rel_texts = get_top_k_embeddings(
        query_embedding,
        rel_text_embeddings,
        similarity_top_k=self.similarity_top_k,
        embedding_ids=all_rel_texts,
    )
    logger.debug(
        f"Found the following rel_texts+query similarites: {similarities!s}"
    )
    logger.debug(f"Found the following top_k rel_texts: {rel_texts!s}")
    rel_texts.extend(top_rel_texts)
elif len(self._index_struct.embedding_dict) == 0:
    logger.warning(
        "Index was not constructed with embeddings, skipping embedding usage..."
    )
# remove any duplicates from keyword + embedding queries
if self._retriever_mode == KGRetrieverMode.HYBRID:
    rel_texts = list(set(rel_texts))
    # remove shorter rel_texts that are substrings of longer rel_texts
    rel_texts.sort(key=len, reverse=True)
    for i in range(len(rel_texts)):
        for j in range(i + 1, len(rel_texts)):
            if rel_texts[j] in rel_texts[i]:
                rel_texts[j] = ""
    rel_texts = [rel_text for rel_text in rel_texts if rel_text != ""]
    # truncate rel_texts
    rel_texts = rel_texts[: self.max_knowledge_sequence]
Additionally, the KGTableRetriever class is designed to support different retrieval modes, including HYBRID, as specified in its initialization parameters and the KGRetrieverMode enum. This ensures that the hybrid retrieval mode is correctly implemented and functional [1][2][3].
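To make the hybrid flow described above easier to follow, here is a minimal, library-free sketch of the same idea: rank stored relation texts by cosine similarity to a query embedding, merge them with keyword hits, then drop duplicates and any text that is a substring of a longer one. The helper names (cosine, top_k_by_embedding, merge_hybrid) and the toy vectors are hypothetical illustrations, not part of LlamaIndex, and the real get_top_k_embeddings may differ in its details.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_embedding(
    query: List[float], embedding_dict: Dict[str, List[float]], k: int
) -> List[str]:
    """Return the k relation texts whose embeddings are closest to the query."""
    scored = sorted(
        ((cosine(query, emb), text) for text, emb in embedding_dict.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [text for _, text in scored[:k]]


def merge_hybrid(
    keyword_hits: List[str], embedding_hits: List[str], max_items: int
) -> List[str]:
    """Union both result lists, drop substrings of longer texts, then truncate."""
    rel_texts = list(set(keyword_hits) | set(embedding_hits))
    # Longest first; ties broken alphabetically so the output is deterministic.
    rel_texts.sort(key=lambda t: (-len(t), t))
    kept: List[str] = []
    for text in rel_texts:
        if not any(text in longer for longer in kept):
            kept.append(text)
    return kept[:max_items]


# Hypothetical toy data: relation text -> embedding vector.
toy_embeddings = {
    "Alice works at Acme": [1.0, 0.0],
    "Bob lives in Paris": [0.0, 1.0],
    "Alice knows Bob": [0.7, 0.7],
}
embedding_hits = top_k_by_embedding([1.0, 0.1], toy_embeddings, k=2)
merged = merge_hybrid(["Alice works at Acme", "Acme"], ["Bob lives in Paris"], 10)
print(embedding_hits)
print(merged)
```

Note that if the embedding dictionary is empty, top_k_by_embedding returns an empty list, which mirrors the warning branch in the snippet above where embedding usage is skipped.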
Question
I am trying to use retriever_mode="embedding" in the query sent to the query engine. When I create the knowledge graph for the first time, this works. But when I load the graph for the second time (when the knowledge graph has already been created), the response to the queries is empty. There are no triples retrieved.
Here is how I am creating the database and then querying it:
I am inserting the knowledge graph triples individually using the following code:
index.upsert_triplet_and_node(tuple1, node1, include_embeddings=True)
I tried Kuzu, Neo4j, and Nebula, and all of them have this problem. Do I have to pass in a vector store index somewhere? Or are there other parameters that I am missing?
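Going by the guard in the retriever snippet quoted earlier in this thread (len(self._index_struct.embedding_dict) > 0), one thing that may be worth checking is whether the embedding dictionary is still populated after the index is loaded the second time. This is a guess rather than a confirmed diagnosis. The sketch below uses a hypothetical helper, embedding_retrieval_possible, and stand-in objects instead of a real index; the assumption is that on a real index you would pass in its index struct.

```python
from types import SimpleNamespace


def embedding_retrieval_possible(index_struct) -> bool:
    """Return True if the index struct carries at least one stored embedding.

    Mirrors the length check on embedding_dict in the quoted retriever code.
    """
    embedding_dict = getattr(index_struct, "embedding_dict", None) or {}
    return len(embedding_dict) > 0


# Stand-ins for an index struct right after building vs. after reloading.
# These are illustrative objects, not real LlamaIndex classes.
fresh = SimpleNamespace(embedding_dict={"(a, rel, b)": [0.1, 0.2]})
reloaded = SimpleNamespace(embedding_dict={})

print(embedding_retrieval_possible(fresh))
print(embedding_retrieval_possible(reloaded))
```

If the check comes back False after reloading, the embeddings were presumably not restored along with the graph, which would be consistent with the empty responses described above.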