mxpoliakov / Multi-Meta-RAG

A repository for Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
https://arxiv.org/abs/2406.13213
MIT License
20 stars, 1 fork

Neo4j compatibility issue with filters during execution of retrieve_neo4j_index.py #1

Closed YYForReal closed 1 month ago

YYForReal commented 1 month ago

I encountered an issue while running python retrieve_neo4j_index.py during the processing of the 191st data entry. The error message indicated that Neo4j does not support the filter command used. Could you please provide the specific version of Neo4j that this project supports?

Here is the filter data that caused the error:

{
    "filter": {
        "published_at": {
            "$nin": [
                "November 18, 2023"
            ]
        },
        "source": {
            "$in": [
                "TechCrunch",
                "Fortune"
            ]
        }
    },
    "query": "Was there disagreement between the two news sources on the portrayal of Sam Altman's standing in Silicon Valley after the TechCrunch report on Sam Altman's situation at OpenAI published on a date other than November 18, 2023, and the subsequent Fortune report on the same day regarding the board's actions?"
}

I noticed 8 other cases that use similar filters. Is the $nin filter operator supported by the project but incompatible with my version of Neo4j?
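
One way to list the affected entries before running retrieval is to scan the filters for the operator. This is a minimal sketch that assumes each entry has the same shape as the JSON above ("filter" and "query" keys). The helper name uses_operator and the sample entries list are hypothetical and not part of the repository.

```python
def uses_operator(filter_spec: dict, operator: str = "$nin") -> bool:
    """Return True if any field condition in the filter uses the operator."""
    return any(
        isinstance(condition, dict) and operator in condition
        for condition in filter_spec.values()
    )


# Hypothetical sample data in the same shape as the entry shown above.
entries = [
    {
        "filter": {
            "published_at": {"$nin": ["November 18, 2023"]},
            "source": {"$in": ["TechCrunch", "Fortune"]},
        },
        "query": "first example query",
    },
    {
        "filter": {"source": {"$in": ["TechCrunch"]}},
        "query": "second example query",
    },
]

# Indices of entries whose filter contains a $nin condition.
affected = [i for i, entry in enumerate(entries) if uses_operator(entry["filter"])]
print(affected)  # [0]
```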

Error output:

  7%|███▉                                                | 191/2556 [01:27<18:05,  2.18it/s]
Traceback (most recent call last):
  File "/home/szu/code/kg_project/Multi-Meta-RAG/retrieve_neo4j_index.py", line 88, in <module>
    docs = similarity_search_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/retry/api.py", line 73, in retry_decorator
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
           ^^^
  File "/home/szu/code/kg_project/Multi-Meta-RAG/retrieve_neo4j_index.py", line 32, in similarity_search_with_retry
    raise e
  File "/home/szu/code/kg_project/Multi-Meta-RAG/retrieve_neo4j_index.py", line 27, in similarity_search_with_retry
    results = vector_index.similarity_search(query=query, k=20, filter=filter)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/langchain_community/vectorstores/neo4j_vector.py", line 953, in similarity_search
    return self.similarity_search_by_vector(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/langchain_community/vectorstores/neo4j_vector.py", line 1175, in similarity_search_by_vector
    docs_and_scores = self.similarity_search_with_score_by_vector(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/langchain_community/vectorstores/neo4j_vector.py", line 1108, in similarity_search_with_score_by_vector
    results = self.query(read_query, params=parameters)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/langchain_community/vectorstores/neo4j_vector.py", line 613, in query
    data, _, _ = self._driver.execute_query(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/driver.py", line 971, in execute_query
    return session._run_transaction(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/work/session.py", line 574, in _run_transaction
    result = transaction_function(tx, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/driver.py", line 1307, in _work
    res = tx.run(query, parameters)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/work/transaction.py", line 195, in run
    result._tx_ready_run(query, parameters)
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/work/result.py", line 175, in _tx_ready_run
    self._run(query, parameters, None, None, None, None, None, None)
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/work/result.py", line 231, in _run
    self._attach()
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/work/result.py", line 425, in _attach
    self._connection.fetch_message()
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/io/_common.py", line 181, in inner
    func(*args, **kwargs)
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/io/_bolt.py", line 977, in fetch_message
    res = self._process_message(tag, fields)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/io/_bolt5.py", line 466, in _process_message
    response.on_failure(summary_metadata or {})
  File "/home/szu/miniconda3/envs/tog/lib/python3.11/site-packages/neo4j/_sync/io/_common.py", line 251, in on_failure
    raise Neo4jError.hydrate(**metadata)
neo4j.exceptions.CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input 'NOT': expected an expression, 'FOREACH', 'ORDER BY', 'CALL', 'CREATE', 'LOAD CSV', 'DELETE', 'DETACH', 'FINISH', 'INSERT', 'LIMIT', 'MATCH', 'MERGE', 'NODETACH', 'OFFSET', 'OPTIONAL', 'REMOVE', 'RETURN', 'SET', 'SKIP', 'UNION', 'UNWIND', 'USE', 'WITH' or <EOF> (line 1, column 160 (offset: 159))
"CYPHER runtime = parallel parallelRuntimeSupport=all MATCH (n:`Chunk`) WHERE n.`embedding` IS NOT NULL AND size(n.`embedding`) = toInteger(384) AND n.`source` NOT IN $param_1 WITH n as node, vector.similarity.cosine(n.`embedding`, $embedding) AS score ORDER BY score DESC LIMIT toInteger($k) RETURN node.`text` AS text, score, node {.*, `text`: Null, `embedding`: Null, id: Null } AS metadata"
                                                                                                                                                                ^}

YYForReal commented 1 month ago

I am using Neo4j 5.24.0. How should I handle the $nin (not in) filter?

mxpoliakov commented 1 month ago

Hi @YYForReal! The issue you describe looks like a langchain bug in $nin support for the Neo4j vector store. I fixed it in my fork of langchain but have not had time to open an upstream PR. Running pip install -r requirements.txt should install langchain with the fix included: https://github.com/mxpoliakov/Multi-Meta-RAG/blob/main/requirements.txt#L2. Could you verify that you are using the fixed version?
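
For context on the syntax error itself: the failing query in the traceback contains n.`source` NOT IN $param_1, and the parser rejects the NOT at that position. Cypher expects the negation in front of the IN expression. The sketch below shows that predicate shape for $in and $nin filters. It is an illustration only, not the code in langchain or in the fork; the helper name build_where and the parameter naming scheme are hypothetical.

```python
from typing import Any


def build_where(
    filters: dict[str, dict[str, list[Any]]], node: str = "n"
) -> tuple[str, dict[str, list[Any]]]:
    """Translate {"field": {"$in" | "$nin": [...]}} into a Cypher predicate.

    Returns the predicate text and the query parameters it refers to.
    """
    clauses: list[str] = []
    params: dict[str, list[Any]] = {}
    for i, (field, condition) in enumerate(filters.items()):
        for j, (op, values) in enumerate(condition.items()):
            name = f"param_{i}_{j}"
            membership = f"{node}.`{field}` IN ${name}"
            if op == "$in":
                clauses.append(membership)
            elif op == "$nin":
                # Cypher has no infix NOT IN; negate the whole IN expression.
                clauses.append(f"NOT ({membership})")
            else:
                raise ValueError(f"unsupported operator: {op}")
            params[name] = values
    return " AND ".join(clauses), params


where, params = build_where(
    {
        "published_at": {"$nin": ["November 18, 2023"]},
        "source": {"$in": ["TechCrunch", "Fortune"]},
    }
)
print(where)
# NOT (n.`published_at` IN $param_0_0) AND n.`source` IN $param_1_0
```

Passing the values as query parameters rather than interpolating them into the string keeps the filter values out of the Cypher text.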

YYForReal commented 1 month ago

Thank you for your help! I was previously using langchain 0.3.1 and ran into installation issues. After following your advice, I now understand the problem. Thanks again!
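
A quick way to check which build is installed is to read the package metadata. This is a sketch under assumptions: the distribution names langchain and langchain-community are guesses based on the traceback paths, and it relies on pip writing a direct_url.json file (PEP 610) for packages installed from a URL or Git reference, which a stock install from the package index normally lacks. The function name install_origin is hypothetical.

```python
from __future__ import annotations

import json
from importlib import metadata


def install_origin(package: str) -> tuple[str, str | None] | None:
    """Return (version, source URL or None), or None if the package is absent."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # direct_url.json is present only for direct URL or VCS installs.
    raw = dist.read_text("direct_url.json")
    url = json.loads(raw).get("url") if raw else None
    return dist.version, url


# Assumed distribution names; adjust to match requirements.txt.
for name in ("langchain", "langchain-community"):
    print(name, install_origin(name))
```

A version with no URL suggests an install from the package index, while a URL pointing at a fork suggests the patched build is in place.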