neo4j / neo4j-python-driver

Neo4j Bolt driver for Python
https://neo4j.com/docs/api/python-driver/current/
Other
883 stars 187 forks source link

verify_connectivity() doesn't always catch errors #1019

Closed tomasonjo closed 4 months ago

tomasonjo commented 5 months ago

In the LangChain integration, we use the following block of code:

 try:
      self._driver.verify_connectivity()
except neo4j.exceptions.ServiceUnavailable:
    raise ValueError(
        "Could not connect to Neo4j database. "
        "Please ensure that the url is correct"
    )
except neo4j.exceptions.AuthError:
    raise ValueError(
        "Could not connect to Neo4j database. "
        "Please ensure that the username and password are correct"
    )
# Set schema
try:
    self.refresh_schema() # Does read transactions
except neo4j.exceptions.ClientError:
    raise ValueError(
        "Could not use APOC procedures. "
        "Please ensure the APOC plugin is installed in Neo4j and that "
        "'apoc.meta.data()' is allowed in Neo4j configuration "
    )

Every now and then it happens that the verify_connectivity doesn't catch any error, and the error only appears when we actually do read transactions in the refresh_schema method. This is a bit confusing for the users, but I can't give you a reproducible example when does this happen

tomasonjo commented 5 months ago

Here is one example of an error that isn't being caught: ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}

robsdedude commented 5 months ago

Hi and thank you for reaching out!

As a primer: there's a discussion to be had about which exact connectivity this function should verify. Here are some options:

import asyncio

import neo4j
from neo4j.debug import watch

watch("neo4j")  # driver debug logging

URL = "neo4j://localhost:7687"
AUTH = ("neo4j", "pass")

async def main() -> None:
    async with neo4j.AsyncGraphDatabase.driver(URL, auth=AUTH) as driver:
        await driver.verify_connectivity()

if __name__ == '__main__':
    asyncio.run(main())

The fact that this is async doesn't change a thing regarding the handling of server errors.

And here's the output of that: ``` [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,715 [#0000] _: created, routing address IPv4Address(('localhost', 7687)) [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,715 [#0000] _: resolve home database [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,716 [#0000] _: attempting to update routing table from IPv4Address(('localhost', 7687)) [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,716 [#0000] _: in: localhost:7687 [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,720 [#0000] _: dns resolver out: 127.0.0.1:7687 [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,720 [#0000] _: _acquire router connection, database=None, address=ResolvedIPv4Address(('127.0.0.1', 7687)) [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,720 [#0000] _: trying to hand out new connection [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,720 [#0000] C: 127.0.0.1:7687 [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,726 [#D8E0] C: 0x6060B017 [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,726 [#D8E0] C: 0x00040405 0x00020404 0x00000104 0x00000003 [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728 [#D8E0] S: 0x00000405 [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728 [#D8E0] C: HELLO {'user_agent': 'neo4j-python/5.17.dev0 Python/3.11.6-final-0 (linux)', 'routing': {'address': 'localhost:7687'}, 'bolt_agent': {'product': 'neo4j-python/5.17.dev0', 'platform': 'Linux 5.15.0-94-generic; x86_64', 'language': 'Python/3.11.6-final-0', 'language_details': 'CPython; 3.11.6-final-0 (main, Oct 4 2023 09:29:57) [GCC 11.4.0]'}} [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728 [#D8E0] _: client state: CONNECTED > AUTHENTICATION [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728 [#D8E0] C: LOGON {'scheme': 'basic', 'principal': 'neo4j', 'credentials': '*******'} [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728 [#D8E0] _: client state: AUTHENTICATION > READY [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,731 [#D8E0] S: SUCCESS {'server': 'Neo4j/5.15.0', 'connection_id': 'bolt-34', 'hints': {'connection.recv_timeout_seconds': 120, 'telemetry.enabled': True}} [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,731 [#D8E0] _: server state: CONNECTED > AUTHENTICATION [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,733 [#D8E0] S: SUCCESS {} [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,733 [#D8E0] _: server state: AUTHENTICATION > READY [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,733 [#D8E0] C: ROUTE {'address': 'localhost:7687'} () {} [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,735 [#D8E0] S: FAILURE {'code': 'Neo.ClientError.Database.DatabaseNotFound', 'message': "Unable to get a routing table for database 'neo4j' because this database does not exist"} [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,735 [#D8E0] C: RESET [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,737 [#D8E0] S: SUCCESS {} [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,737 [#D8E0] _: server state: FAILED > READY [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,737 [#D8E0] _: released bolt-34 [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,738 [#0000] _: close [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,738 [#D8E0] C: GOODBYE [DEBUG ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,739 [#D8E0] C: Traceback (most recent call last): File "/home/rouven/.config/JetBrains/PyCharm2023.2/scratches/scratch_38.py", line 33, in asyncio.run(main()) File "/home/rouven/.pyenv/versions/3.11.6/lib/python3.11/asyncio/runners.py", line 190, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/home/rouven/.pyenv/versions/3.11.6/lib/python3.11/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/rouven/.pyenv/versions/3.11.6/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/home/rouven/.config/JetBrains/PyCharm2023.2/scratches/scratch_38.py", line 15, in main await driver.verify_connectivity() File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/driver.py", line 1008, in verify_connectivity await self._get_server_info(session_config) File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/driver.py", line 1219, in _get_server_info return await session._get_server_info() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/work/session.py", line 172, in _get_server_info await self._connect(READ_ACCESS, liveness_check_timeout=0) File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/work/session.py", line 130, in _connect await super()._connect( File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/work/workspace.py", line 161, in _connect await self._pool.update_routing_table( File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_pool.py", line 779, in update_routing_table if await self._update_routing_table_from( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_pool.py", line 725, in _update_routing_table_from new_routing_table = await self.fetch_routing_table( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_pool.py", line 662, in fetch_routing_table new_routing_info = await self.fetch_routing_info( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_pool.py", line 632, in fetch_routing_info routing_table = await cx.route( ^^^^^^^^^^^^^^^ File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_bolt5.py", line 196, in route await self.fetch_all() File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_bolt.py", line 863, in fetch_all detail_delta, summary_delta = await self.fetch_message() ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_bolt.py", line 849, in fetch_message res = await self._process_message(tag, fields) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_bolt5.py", line 369, in _process_message await response.on_failure(summary_metadata or {}) File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_common.py", line 245, in on_failure raise Neo4jError.hydrate(**metadata) neo4j.exceptions.ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist} Process finished with exit code 1 ```

The important bit: I do get

neo4j.exceptions.ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}

Could you please help me reproduce the issue where verify_connectivity() does not catch trying to connect to a database that doesn't exist? Or any other error you think it should catch.

tomasonjo commented 5 months ago

Here is the full traceback, not sure if it is helpful as that's the error from refresh_schema and not the connectivity method.

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:65, in Neo4jGraph.__init__(self, url, username, password, database)
     64 try:
---> 65     self.refresh_schema()
     66 except neo4j.exceptions.ClientError:

File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:88, in Neo4jGraph.refresh_schema(self)
     85 """
     86 Refreshes the Neo4j graph schema information.
     87 """
---> 88 node_properties = [el["output"] for el in self.query(node_properties_query)]
     89 rel_properties = [el["output"] for el in self.query(rel_properties_query)]

File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:79, in Neo4jGraph.query(self, query, params)
     78 try:
---> 79     data = session.run(query, params)
     80     return [r.data() for r in data]

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/work/session.py:302, in Session.run(self, query, parameters, **kwargs)
    301 if not self._connection:
--> 302     self._connect(self._config.default_access_mode)
    303     assert self._connection is not None

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/work/session.py:130, in Session._connect(self, access_mode, **acquire_kwargs)
    129 try:
--> 130     super()._connect(
    131         access_mode, auth=self._config.auth, **acquire_kwargs
    132     )
    133 except asyncio.CancelledError:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/work/workspace.py:178, in Workspace._connect(self, access_mode, auth, **acquire_kwargs)
    177 acquire_kwargs_.update(acquire_kwargs)
--> 178 self._connection = self._pool.acquire(**acquire_kwargs_)
    179 self._connection_access_mode = access_mode

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:912, in Neo4jPool.acquire(self, access_mode, timeout, database, bookmarks, auth, liveness_check_timeout)
    910 log.debug("[#0000]  _: <POOL> acquire routing connection, "
    911           "access_mode=%r, database=%r", access_mode, database)
--> 912 self.ensure_routing_table_is_fresh(
    913     access_mode=access_mode, database=database,
    914     imp_user=None, bookmarks=bookmarks, auth=auth,
    915     acquisition_timeout=timeout
    916 )
    918 while True:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:854, in Neo4jPool.ensure_routing_table_is_fresh(self, access_mode, database, imp_user, bookmarks, auth, acquisition_timeout, database_callback)
    852     return False
--> 854 self.update_routing_table(
    855     database=database, imp_user=imp_user, bookmarks=bookmarks,
    856     auth=auth, acquisition_timeout=acquisition_timeout,
    857     database_callback=database_callback
    858 )
    859 self.update_connection_pool(database=database)

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:776, in Neo4jPool.update_routing_table(self, database, imp_user, bookmarks, auth, acquisition_timeout, database_callback)
    774 if prefer_initial_routing_address:
    775     # TODO: Test this state
--> 776     if self._update_routing_table_from(
    777         self.address, database=database,
    778         imp_user=imp_user, bookmarks=bookmarks, auth=auth,
    779         acquisition_timeout=acquisition_timeout,
    780         database_callback=database_callback
    781     ):
    782         # Why is only the first initial routing address used?
    783         return

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:722, in Neo4jPool._update_routing_table_from(self, database, imp_user, bookmarks, auth, acquisition_timeout, database_callback, *routers)
    719 for address in NetworkUtil.resolve_address(
    720     router, resolver=self.pool_config.resolver
    721 ):
--> 722     new_routing_table = self.fetch_routing_table(
    723         address=address, acquisition_timeout=acquisition_timeout,
    724         database=database, imp_user=imp_user, bookmarks=bookmarks,
    725         auth=auth
    726     )
    727     if new_routing_table is not None:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:659, in Neo4jPool.fetch_routing_table(self, address, acquisition_timeout, database, imp_user, bookmarks, auth)
    658 try:
--> 659     new_routing_info = self.fetch_routing_info(
    660         address, database, imp_user, bookmarks, auth,
    661         acquisition_timeout
    662     )
    663 except Neo4jError as e:
    664     # checks if the code is an error that is caused by the client. In
    665     # this case there is no sense in trying to fetch a RT from another
    666     # router. Hence, the driver should fail fast during discovery.

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:629, in Neo4jPool.fetch_routing_info(self, address, database, imp_user, bookmarks, auth, acquisition_timeout)
    628 try:
--> 629     routing_table = cx.route(
    630         database=database or self.workspace_config.database,
    631         imp_user=imp_user or self.workspace_config.impersonated_user,
    632         bookmarks=bookmarks
    633     )
    634 finally:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt4.py:534, in Bolt4x4.route(self, database, imp_user, bookmarks, dehydration_hooks, hydration_hooks)
    533 self.send_all()
--> 534 self.fetch_all()
    535 return [metadata.get("rt")]

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt.py:863, in Bolt.fetch_all(self)
    862 while not response.complete:
--> 863     detail_delta, summary_delta = self.fetch_message()
    864     detail_count += detail_delta

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt.py:849, in Bolt.fetch_message(self)
    846 tag, fields = self.inbox.pop(
    847     hydration_hooks=self.responses[0].hydration_hooks
    848 )
--> 849 res = self._process_message(tag, fields)
    850 self.idle_since = monotonic()

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt4.py:368, in Bolt4x0._process_message(self, tag, fields)
    367 try:
--> 368     response.on_failure(summary_metadata or {})
    369 except (ServiceUnavailable, DatabaseUnavailable):

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_common.py:245, in Response.on_failure(self, metadata)
    244 Util.callback(handler)
--> 245 raise Neo4jError.hydrate(**metadata)

ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[71], line 3
      1 from langchain.graphs import Neo4jGraph
----> 3 graph = Neo4jGraph(
      4     url=NEO4J_URI, 
      5     username=NEO4J_USERNAME, 
      6     password=NEO4J_PASSWORD
      7 )

File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:67, in Neo4jGraph.__init__(self, url, username, password, database)
     65     self.refresh_schema()
     66 except neo4j.exceptions.ClientError:
---> 67     raise ValueError(
     68         "Could not use APOC procedures. "
     69         "Please ensure the APOC plugin is installed in Neo4j and that "
     70         "'apoc.meta.data()' is allowed in Neo4j configuration "
     71     )

ValueError: Could not use APOC procedures. Please ensure the APOC plugin is installed in Neo4j and that 'apoc.meta.data()' is allowed in Neo4j configuration
robsdedude commented 5 months ago

This is quite interesting. It also seems you can reproduce the error. Could you please enable debug logging in the driver and post the full logs (including the passing verify_connectivity() call)? See the docs on how to enable it.

robsdedude commented 5 months ago

@tomasonjo gentle reminder, can you please provide the driver debug logs to enable me to further investigate the issue? Or better yet, help we reproduce the issue. What exactly did you do to trigger it?

tomasonjo commented 5 months ago

Hey @robsdedude . I couldn't reproduce it locally, I just got a full debug from the user who reported the issue. Could it be something with firewall permissions, since they were at a customer with strict security?

robsdedude commented 4 months ago

The only think I could imagine is something along the lines of the user running multiple DBMSs and having a single DNS entry point at both (or a disjoint/misconfigured cluster or something like it) so that it's up to chance whether the driver succeeds or fails. I definitely need more information to be able to make a call if this is really a driver bug or if this is expected behavior and the problem lies somewhere else.

robsdedude commented 4 months ago

As written, I'm afraid I won't be able to dig further into the issue without extra info. Therefore, I'll close the issue. We can always reopen later if new information is available.