Open terilias opened 1 month ago
The issue arises because LlamaIndex performs a subtraction operation (1 - score) on the scores returned by Weaviate, which now returns similarity scores instead of distances. This results in the scores being reversed.
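As a minimal illustration of the inversion described above (the score values are made up and no Weaviate or LlamaIndex call is involved), applying 1 - score to values that are already similarities flips which result looks best:

```python
# Hypothetical similarity scores, best match first (values are invented).
similarities = [0.92, 0.75, 0.40]

# Treating them as distances and converting with 1 - score inverts the ranking.
converted = [round(1 - s, 2) for s in similarities]

# Index of the highest score before and after the conversion.
best_before = max(range(len(similarities)), key=similarities.__getitem__)
best_after = max(range(len(converted)), key=converted.__getitem__)

print(converted)                 # [0.08, 0.25, 0.6]
print(best_before, best_after)   # 0 2
```

The most relevant hit ends up with the lowest converted score, which matches the symptom reported in this issue.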
To fix this, you can modify the code to handle similarity scores correctly. Here’s how you can adjust the scores:
1. Modify the score calculation: Update the part of the code where the score is calculated to use the similarity score directly without subtracting from 1.
2. Reverse the order of the retrieved results: Ensure that the results are sorted in descending order of relevance.
Here’s an example of how you can adjust the code:
# Modify the score calculation
def adjust_scores(scores):
    return scores  # Use the similarity scores directly

# Reverse the order of the retrieved results
response = retriever.retrieve("What is lion?")
response = response[::-1]  # Reverse the order of the results
print(response[0])
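Reversing the list only helps if the results arrive in exactly the opposite order. A more defensive sketch is to sort explicitly by score. `ScoredNode` and `sort_by_relevance` below are hypothetical stand-ins for illustration, not LlamaIndex classes or functions:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node that carries a score."""
    text: str
    score: float


def sort_by_relevance(nodes: List[ScoredNode]) -> List[ScoredNode]:
    # Highest similarity first, regardless of the order the nodes arrived in.
    return sorted(nodes, key=lambda n: n.score, reverse=True)


nodes = [ScoredNode("a", 0.1), ScoredNode("b", 0.9), ScoredNode("c", 0.5)]
ranked = sort_by_relevance(nodes)
print([n.text for n in ranked])  # ['b', 'c', 'a']
```

This assumes the scores are already similarities (higher is better). It does not repair scores that were inverted upstream.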
Additionally, if you are using the VectorContextRetriever from the LlamaIndex framework, you can modify the retrieve_from_graph method to handle the scores correctly:
class VectorContextRetriever(BasePGRetriever):
    # ... (other parts of the class)

    def retrieve_from_graph(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        vector_store_query = self._get_vector_store_query(query_bundle)
        triplets = []
        kg_ids = []
        new_scores = []
        if self._graph_store.supports_vector_queries:
            result = self._graph_store.vector_query(vector_store_query)
            if len(result) != 2:
                raise ValueError("No nodes returned by vector_query")
            kg_nodes, scores = result
            kg_ids = [node.id for node in kg_nodes]
            triplets = self._graph_store.get_rel_map(
                kg_nodes, depth=self._path_depth, ignore_rels=[KG_SOURCE_REL]
            )
        elif self._vector_store is not None:
            query_result = self._vector_store.query(vector_store_query)
            if query_result.nodes is not None and query_result.similarities is not None:
                kg_ids = self._get_kg_ids(query_result.nodes)
                scores = query_result.similarities
                kg_nodes = self._graph_store.get(ids=kg_ids)
                triplets = self._graph_store.get_rel_map(
                    kg_nodes, depth=self._path_depth, ignore_rels=[KG_SOURCE_REL]
                )
            elif query_result.ids is not None and query_result.similarities is not None:
                kg_ids = query_result.ids
                scores = query_result.similarities
                kg_nodes = self._graph_store.get(ids=kg_ids)
                triplets = self._graph_store.get_rel_map(
                    kg_nodes, depth=self._path_depth, ignore_rels=[KG_SOURCE_REL]
                )

        for triplet in triplets:
            score1 = (
                scores[kg_ids.index(triplet[0].id)] if triplet[0].id in kg_ids else 0.0
            )
            score2 = (
                scores[kg_ids.index(triplet[2].id)] if triplet[2].id in kg_ids else 0.0
            )
            new_scores.append(max(score1, score2))

        assert len(triplets) == len(new_scores)

        # sort by score
        top_k = sorted(zip(triplets, new_scores), key=lambda x: x[1], reverse=True)
        return self._get_nodes_with_score([x[0] for x in top_k], [x[1] for x in top_k])[::-1]  # Reverse the order of the results
This modification will ensure that the results are returned in the correct order of relevance directly from the retriever [1][2][3].
@terilias Seems like you have a decent understanding of the issue. Feel free to open a PR 👍🏻
Hi @logan-markewich, thanks, I would like to give it a try! I have found some more issues in the Weaviate integration, and I was thinking of posting them for reference, each in a separate thread, and trying to work on them. Is that ok?
@terilias go for it!
Bug Description
Hello, I was using the retriever from a vector store index that was initialized from a Weaviate collection. I noticed that the retrieved nodes have scores in reversed order: the first (most relevant) node has a score equal to zero, and the score increases as we move toward the least relevant nodes.
We found in the code that LlamaIndex performs the subtraction 1 - score, where score is the value that Weaviate returns. But Weaviate now returns a similarity score instead of a distance. I think that only in vector (as opposed to hybrid) search can the distance be returned instead of the similarity (see here). You can use the code I provide below (from a Jupyter Notebook) to see the scores that LlamaIndex gives and the scores that Weaviate returns.
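Separately from the notebook code mentioned above, here is a rough sketch of how the two cases could be told apart. It assumes that a result carries either a similarity score or a distance, as suggested in the description. `to_similarity` is a hypothetical helper, not part of LlamaIndex or the Weaviate client, and the 1 - distance conversion simply mirrors the subtraction described in this issue; whether it is appropriate depends on the distance metric in use.

```python
from typing import Optional


def to_similarity(
    distance: Optional[float] = None, score: Optional[float] = None
) -> float:
    """Hypothetical helper: return a higher-is-better value from either field."""
    if score is not None:
        # Already a similarity score: use it directly, no subtraction.
        return score
    if distance is not None:
        # Assumption: 1 - distance, as in the subtraction described in the issue.
        return 1.0 - distance
    raise ValueError("Neither a distance nor a score was provided")


print(to_similarity(score=0.8))      # 0.8
print(to_similarity(distance=0.25))  # 0.75
```

Preferring the score when both values are present is a design choice of this sketch, not documented behavior of either library.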
Version
llama-index==0.10.53
llama-index-vector-stores-weaviate==1.0.0
weaviate-client==4.6.5
Steps to Reproduce
Relevant Logs/Tracebacks
No response