Closed: letoribo closed this issue 1 month ago.
Hi @letoribo ,
Regarding filters, "AND" is the default behavior: conditions listed side by side in the filter dict are combined with AND. You can explicitly use an "OR" condition with:
"filters": {
"$or": [
{"subject": {"$ilike": query_text}},
{"snippet": {"$ilike": query_text}},
]
}
See some examples in the documentation.
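To make the AND/OR distinction concrete, here is a deliberately simplified, hypothetical translator that turns a filter dict into a Cypher WHERE fragment. The names `to_where` and `_condition` are made up for illustration. This is not the code in neo4j_graphrag's filters.py, and lower-casing the `$ilike` value is an assumption of this sketch. It only mimics the output format of the queries quoted later in this thread: sibling keys are joined with AND, and an explicit "$or" list is joined with OR.

```python
from typing import Any


def _condition(field: str, op_value: dict[str, Any], params: dict[str, Any]) -> str:
    """Translate one {field: {operator: value}} entry into a Cypher predicate."""
    (op, value), = op_value.items()  # this sketch allows exactly one operator per field
    name = f"param_{len(params)}"  # parameters are numbered in the order they are created
    if op == "$ilike":
        # Case-insensitive substring match in the toLower(...) CONTAINS form;
        # lower-casing the value here is an assumption of this sketch.
        params[name] = str(value).lower()
        return f"(toLower(node.{field}) CONTAINS ${name})"
    if op == "$eq":
        params[name] = value
        return f"(node.{field} = ${name})"
    raise ValueError(f"unsupported operator: {op}")


def to_where(filters: dict[str, Any], params: dict[str, Any]) -> str:
    """Build a WHERE fragment; sibling keys are ANDed unless wrapped in $or."""
    parts = []
    for key, value in filters.items():
        if key in ("$and", "$or"):
            joiner = " AND " if key == "$and" else " OR "
            parts.append("(" + joiner.join(to_where(sub, params) for sub in value) + ")")
        else:
            parts.append(_condition(key, value, params))
    # Several top-level keys fall back to AND, the default behavior described above.
    return parts[0] if len(parts) == 1 else "(" + " AND ".join(parts) + ")"


params: dict[str, Any] = {}
print(to_where({"$or": [{"subject": {"$ilike": "Invoice"}}, {"snippet": {"$ilike": "Invoice"}}]}, params))
# ((toLower(node.subject) CONTAINS $param_0) OR (toLower(node.snippet) CONTAINS $param_1))
print(params)
# {'param_0': 'invoice', 'param_1': 'invoice'}
```

Passing `{"subject": {...}, "snippet": {...}}` without the "$or" wrapper yields the same two predicates joined by AND instead.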
For your other question, the two approaches you compare do not do the same thing:
- Neo4jVector.from_existing_graph assumes the vectors are already in the graph and only checks for the existence of the indexes (creating them if they do not exist).
- upsert_vector pushes new vectors to the DB and adds them to an existing index, running one Cypher query for each (potentially large) vector, which explains why it takes more time.
Hope that helps!
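One plausible contributor to the time difference is the number of round trips. The sketch below uses a made-up `CountingSession` stub instead of a real driver, so it only counts queries. The helper names and both Cypher strings are illustrative assumptions, not the queries that upsert_vector actually sends. Writing 50 vectors one by one costs 50 queries, while an UNWIND batch sends them in one.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CountingSession:
    """Stand-in for a database session: it records queries instead of executing them."""
    queries: list[tuple[str, dict[str, Any]]] = field(default_factory=list)

    def run(self, query: str, **params: Any) -> None:
        self.queries.append((query, params))


# Hypothetical per-vector query: one round trip for every node.
SINGLE = (
    "MATCH (n) WHERE elementId(n) = $id "
    "CALL db.create.setNodeVectorProperty(n, $prop, $vector)"
)
# Hypothetical batched query: many nodes per round trip via UNWIND.
BATCH = (
    "UNWIND $rows AS row "
    "MATCH (n) WHERE elementId(n) = row.id "
    "CALL db.create.setNodeVectorProperty(n, $prop, row.vector)"
)


def upsert_one_by_one(session: CountingSession, rows: list[tuple[str, list[float]]],
                      prop: str = "embedding") -> None:
    for node_id, vector in rows:
        session.run(SINGLE, id=node_id, prop=prop, vector=vector)


def upsert_batched(session: CountingSession, rows: list[tuple[str, list[float]]],
                   prop: str = "embedding", batch_size: int = 1000) -> None:
    # Send the rows in chunks of at most batch_size per query.
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        session.run(BATCH, rows=[{"id": i, "vector": v} for i, v in chunk], prop=prop)


rows = [(f"node-{i}", [0.1, 0.2, 0.3]) for i in range(50)]  # hypothetical ids and vectors
s1, s2 = CountingSession(), CountingSession()
upsert_one_by_one(s1, rows)
upsert_batched(s2, rows)
print(len(s1.queries), len(s2.queries))  # 50 1
```

Fewer round trips is only part of the picture: embedding computation and transaction handling also cost time, so this sketch does not predict actual timings.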
Using
"filters": {
"$or": [
{"subject": {"$ilike": query_text}},
{"snippet": {"$ilike": query_text}},
]
}
the search_query produced by get_search_results is:
MATCH (node:`MailItem`) WHERE node.`embedding` IS NOT NULL AND size(node.`embedding`) = toInteger($embedding_dimension) AND ((toLower(node.subject) CONTAINS $param_0) AND (toLower(node.snippet) CONTAINS $param_1)) WITH node, vector.similarity.cosine(node.`embedding`, $query_vector) AS score ORDER BY score DESC LIMIT $top_k RETURN node {.snippet, .subject} as node, score
In my example, it is:
MATCH (node:`MailItem`) WHERE node.`embedding` IS NOT NULL AND size(node.`embedding`) = toInteger($embedding_dimension) AND ((toLower(node.subject) CONTAINS $param_0) OR (toLower(node.snippet) CONTAINS $param_1)) WITH node, vector.similarity.cosine(node.`embedding`, $query_vector) AS score ORDER BY score DESC LIMIT $top_k RETURN node {.snippet, .subject} as node, score
Neo4jVector.from_existing_graph creates both the index and the vectors for the properties of interest, and it takes little time.
Can you share the package version and your call to get_search_query, please?
Because on my dev branch, if I use it like this:
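from neo4j_graphrag.neo4j_queries import get_search_query
from neo4j_graphrag.types import SearchType

get_search_query(SearchType.VECTOR, node_label="Label", embedding_node_property="embedding", embedding_dimension=10, filters={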
"$or": [
{"subject": {"$ilike": "<query_text>"}},
{"snippet": {"$ilike": "<query_text>"}},
]
})
I get:
'MATCH (node:`Label`) WHERE node.`embedding` IS NOT NULL AND size(node.`embedding`) = toInteger($embedding_dimension) AND ((toLower(node.subject) CONTAINS $param_0) OR (toLower(node.snippet) CONTAINS $param_1)) WITH node, vector.similarity.cosine(node.`embedding`, $query_vector) AS score ORDER BY score DESC LIMIT $top_k RETURN node { .*, `embedding`: null } AS node, labels(node) AS nodeLabels, elementId(node) AS id, score',
which looks correct to me: there is an "OR" operator between the two "CONTAINS" conditions. Or am I missing something?
I mean: what exact call do you make when you define the filters?
Try this:
retriever_result = vector_retriever.search(
query_text=query_text,
#query_vector=query_vector[0],
top_k=20,
filters={"$or": [{"subject": {"$ilike": query_text}}, {"snippet": {"$ilike": query_text}}]}
)
Now it works; perhaps the restart was unsuccessful. @stellasia, thank you!
Closing this issue then as it seems to be solved. Feel free to reopen if this is not the case.
When doing it like this:
it takes 5-6 seconds for 50 nodes, while achieving the same result with upsert_vector takes ~55-60 seconds.
I did this:
Also, regarding filters: I need to return nodes where the query text is in either the snippet or the subject.
In https://github.com/neo4j/neo4j-graphrag-python/blob/main/src/neo4j_graphrag/filters.py#L322, if I replace:
{OPERATOR_AND: [{k: v} for k, v in filter.items()]}, param_store, node_alias
with
{OPERATOR_OR: [{k: v} for k, v in filter.items()]}, param_store, node_alias
it works as expected.
Perhaps it would make sense to add a parameter that determines how the top-level filter conditions are combined (AND or OR).
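The suggestion amounts to making the operator at filters.py#L322 configurable. Here is a minimal sketch of that idea. `wrap_top_level` and its `default_operator` parameter are hypothetical names, and nothing in this thread indicates the library offers them. The sketch mirrors the `{OPERATOR_AND: [{k: v} for k, v in filter.items()]}` expression quoted above.

```python
from typing import Any

OPERATOR_AND = "$and"
OPERATOR_OR = "$or"


def wrap_top_level(filters: dict[str, Any], default_operator: str = OPERATOR_AND) -> dict[str, Any]:
    """Group sibling top-level conditions under one logical operator.

    default_operator is the hypothetical knob: "$and" keeps the current
    behavior, "$or" matches a node if any one condition holds.
    """
    if default_operator not in (OPERATOR_AND, OPERATOR_OR):
        raise ValueError(f"default_operator must be {OPERATOR_AND!r} or {OPERATOR_OR!r}")
    if len(filters) <= 1:
        return filters  # a single condition (or none) needs no combining operator
    return {default_operator: [{k: v} for k, v in filters.items()]}


conditions = {"subject": {"$ilike": "invoice"}, "snippet": {"$ilike": "invoice"}}
print(wrap_top_level(conditions, default_operator=OPERATOR_OR))
# {'$or': [{'subject': {'$ilike': 'invoice'}}, {'snippet': {'$ilike': 'invoice'}}]}
```

As the maintainer pointed out, the same result is already available without a new parameter by passing an explicit "$or" list, so such a parameter would only be a convenience.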