Closed tomasonjo closed 4 months ago
Here is one example of an error that isn't being caught:
ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}
Hi and thank you for reaching out!
As a primer: there's a discussion to be had about which exact connectivity this function should verify. Here are some options:
To a database hosted on the DBMS. In this case: which one? The user's home database? The always-existing system database? Another one?
Currently, it does the latter option: it tries to connect to the user's home database.
Therefore, I'm actually surprised that Neo.ClientError.Database.DatabaseNotFound does not get raised. I couldn't reproduce it either. Using the 5.x (think dev) branch, pretty much identical to v5.17, I spin up a fresh DBMS, run DROP DATABASE neo4j, and then run:
import asyncio

import neo4j
from neo4j.debug import watch

watch("neo4j")  # driver debug logging

URL = "neo4j://localhost:7687"
AUTH = ("neo4j", "pass")

async def main() -> None:
    async with neo4j.AsyncGraphDatabase.driver(URL, auth=AUTH) as driver:
        await driver.verify_connectivity()

if __name__ == '__main__':
    asyncio.run(main())
The fact that this is async doesn't change a thing regarding the handling of server errors.
The important bit: I do get
neo4j.exceptions.ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}
Could you please help me reproduce the issue where verify_connectivity() does not catch trying to connect to a database that doesn't exist? Or any other error you think it should catch.
Here is the full traceback. I'm not sure it is helpful, as that's the error from refresh_schema and not from the connectivity method.
---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:65, in Neo4jGraph.__init__(self, url, username, password, database)
64 try:
---> 65 self.refresh_schema()
66 except neo4j.exceptions.ClientError:
File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:88, in Neo4jGraph.refresh_schema(self)
85 """
86 Refreshes the Neo4j graph schema information.
87 """
---> 88 node_properties = [el["output"] for el in self.query(node_properties_query)]
89 rel_properties = [el["output"] for el in self.query(rel_properties_query)]
File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:79, in Neo4jGraph.query(self, query, params)
78 try:
---> 79 data = session.run(query, params)
80 return [r.data() for r in data]
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/work/session.py:302, in Session.run(self, query, parameters, **kwargs)
301 if not self._connection:
--> 302 self._connect(self._config.default_access_mode)
303 assert self._connection is not None
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/work/session.py:130, in Session._connect(self, access_mode, **acquire_kwargs)
129 try:
--> 130 super()._connect(
131 access_mode, auth=self._config.auth, **acquire_kwargs
132 )
133 except asyncio.CancelledError:
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/work/workspace.py:178, in Workspace._connect(self, access_mode, auth, **acquire_kwargs)
177 acquire_kwargs_.update(acquire_kwargs)
--> 178 self._connection = self._pool.acquire(**acquire_kwargs_)
179 self._connection_access_mode = access_mode
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:912, in Neo4jPool.acquire(self, access_mode, timeout, database, bookmarks, auth, liveness_check_timeout)
910 log.debug("[#0000] _: <POOL> acquire routing connection, "
911 "access_mode=%r, database=%r", access_mode, database)
--> 912 self.ensure_routing_table_is_fresh(
913 access_mode=access_mode, database=database,
914 imp_user=None, bookmarks=bookmarks, auth=auth,
915 acquisition_timeout=timeout
916 )
918 while True:
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:854, in Neo4jPool.ensure_routing_table_is_fresh(self, access_mode, database, imp_user, bookmarks, auth, acquisition_timeout, database_callback)
852 return False
--> 854 self.update_routing_table(
855 database=database, imp_user=imp_user, bookmarks=bookmarks,
856 auth=auth, acquisition_timeout=acquisition_timeout,
857 database_callback=database_callback
858 )
859 self.update_connection_pool(database=database)
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:776, in Neo4jPool.update_routing_table(self, database, imp_user, bookmarks, auth, acquisition_timeout, database_callback)
774 if prefer_initial_routing_address:
775 # TODO: Test this state
--> 776 if self._update_routing_table_from(
777 self.address, database=database,
778 imp_user=imp_user, bookmarks=bookmarks, auth=auth,
779 acquisition_timeout=acquisition_timeout,
780 database_callback=database_callback
781 ):
782 # Why is only the first initial routing address used?
783 return
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:722, in Neo4jPool._update_routing_table_from(self, database, imp_user, bookmarks, auth, acquisition_timeout, database_callback, *routers)
719 for address in NetworkUtil.resolve_address(
720 router, resolver=self.pool_config.resolver
721 ):
--> 722 new_routing_table = self.fetch_routing_table(
723 address=address, acquisition_timeout=acquisition_timeout,
724 database=database, imp_user=imp_user, bookmarks=bookmarks,
725 auth=auth
726 )
727 if new_routing_table is not None:
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:659, in Neo4jPool.fetch_routing_table(self, address, acquisition_timeout, database, imp_user, bookmarks, auth)
658 try:
--> 659 new_routing_info = self.fetch_routing_info(
660 address, database, imp_user, bookmarks, auth,
661 acquisition_timeout
662 )
663 except Neo4jError as e:
664 # checks if the code is an error that is caused by the client. In
665 # this case there is no sense in trying to fetch a RT from another
666 # router. Hence, the driver should fail fast during discovery.
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:629, in Neo4jPool.fetch_routing_info(self, address, database, imp_user, bookmarks, auth, acquisition_timeout)
628 try:
--> 629 routing_table = cx.route(
630 database=database or self.workspace_config.database,
631 imp_user=imp_user or self.workspace_config.impersonated_user,
632 bookmarks=bookmarks
633 )
634 finally:
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt4.py:534, in Bolt4x4.route(self, database, imp_user, bookmarks, dehydration_hooks, hydration_hooks)
533 self.send_all()
--> 534 self.fetch_all()
535 return [metadata.get("rt")]
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt.py:863, in Bolt.fetch_all(self)
862 while not response.complete:
--> 863 detail_delta, summary_delta = self.fetch_message()
864 detail_count += detail_delta
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt.py:849, in Bolt.fetch_message(self)
846 tag, fields = self.inbox.pop(
847 hydration_hooks=self.responses[0].hydration_hooks
848 )
--> 849 res = self._process_message(tag, fields)
850 self.idle_since = monotonic()
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt4.py:368, in Bolt4x0._process_message(self, tag, fields)
367 try:
--> 368 response.on_failure(summary_metadata or {})
369 except (ServiceUnavailable, DatabaseUnavailable):
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_common.py:245, in Response.on_failure(self, metadata)
244 Util.callback(handler)
--> 245 raise Neo4jError.hydrate(**metadata)
ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[71], line 3
1 from langchain.graphs import Neo4jGraph
----> 3 graph = Neo4jGraph(
4 url=NEO4J_URI,
5 username=NEO4J_USERNAME,
6 password=NEO4J_PASSWORD
7 )
File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:67, in Neo4jGraph.__init__(self, url, username, password, database)
65 self.refresh_schema()
66 except neo4j.exceptions.ClientError:
---> 67 raise ValueError(
68 "Could not use APOC procedures. "
69 "Please ensure the APOC plugin is installed in Neo4j and that "
70 "'apoc.meta.data()' is allowed in Neo4j configuration "
71 )
ValueError: Could not use APOC procedures. Please ensure the APOC plugin is installed in Neo4j and that 'apoc.meta.data()' is allowed in Neo4j configuration
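As an aside, the traceback shows every ClientError raised inside refresh_schema being reported as an APOC problem, which hides the DatabaseNotFound cause. Below is a minimal sketch of how a caller could branch on the error code instead. It assumes the exception exposes code and message attributes, as the driver's errors do; FakeClientError is a stand-in so the snippet runs without a server, and explain_schema_failure is a made-up helper name, not LangChain or driver API.

```python
class FakeClientError(Exception):
    """Stand-in for neo4j.exceptions.ClientError (illustration only)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


def explain_schema_failure(error):
    """Build a message that names the likely cause of a failed schema refresh."""
    # Fall back gracefully if the exception carries no Neo4j status code.
    code = getattr(error, "code", None) or "unknown code"
    message = getattr(error, "message", None) or str(error)
    if code == "Neo.ClientError.Database.DatabaseNotFound":
        return f"Database not found: {message}"
    if code == "Neo.ClientError.Procedure.ProcedureNotFound":
        return "Could not use APOC procedures: is the APOC plugin installed?"
    # Anything else is reported as-is rather than blamed on APOC.
    return f"Schema refresh failed ({code}): {message}"
```

With something like this, the error from this report would surface as a missing-database message rather than an APOC hint.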
This is quite interesting. It also seems you can reproduce the error. Could you please enable debug logging in the driver and post the full logs (including the passing verify_connectivity() call)? See the docs on how to enable it.
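If neo4j.debug.watch is not convenient in the user's environment, the same logs can probably be captured with Python's standard logging module. This is a minimal sketch, assuming the driver logs under the "neo4j" logger name (as the watch("neo4j") call earlier in this thread suggests); enable_driver_debug_logging is a made-up helper name.

```python
import logging
import sys


def enable_driver_debug_logging(stream=sys.stderr):
    """Send DEBUG-level records from the "neo4j" logger hierarchy to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
    )
    logger = logging.getLogger("neo4j")
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    # Return the handler so the caller can remove it again afterwards.
    return handler
```

Calling it once before creating the driver, with a file object as stream, should produce a log that covers both the verify_connectivity() call and the later failing query.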
@tomasonjo gentle reminder, can you please provide the driver debug logs to enable me to further investigate the issue? Or better yet, help me reproduce the issue. What exactly did you do to trigger it?
Hey @robsdedude. I couldn't reproduce it locally; I just got the full debug output from the user who reported the issue. Could it be something with firewall permissions, since they were at a customer with strict security?
The only thing I could imagine is something along the lines of the user running multiple DBMSs and having a single DNS entry point at both (or a disjoint/misconfigured cluster or something like it) so that it's up to chance whether the driver succeeds or fails. I definitely need more information to be able to make a call if this is really a driver bug or if this is expected behavior and the problem lies somewhere else.
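One quick way to probe the DNS theory would be to look at what the hostname resolves to on the affected machine. This is a minimal sketch using only the standard library; resolved_addresses is a made-up helper name, and 7687 is simply the default Bolt port.

```python
import socket


def resolved_addresses(host, port=7687):
    """Return the sorted, de-duplicated IP addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

More than one address is not a problem in itself (a cluster would look the same), and a load balancer in front of several servers would not show up here at all, so the result is only a hint about whether this theory is worth pursuing.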
As written, I'm afraid I won't be able to dig further into the issue without extra info. Therefore, I'll close the issue. We can always reopen later if new information is available.
In the LangChain integration, we use the following block of code:
Every now and then it happens that verify_connectivity doesn't catch any error, and the error only appears when we actually do read transactions in the refresh_schema method. This is a bit confusing for users, but I can't give you a reproducible example of when this happens.
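A possible stopgap on the integration side, sketched under the assumption that the goal is to fail early for the database that will actually be queried: after verify_connectivity(), run a trivial query in a session bound to that database. check_database_reachable is a hypothetical helper, not part of the driver or LangChain, and it only expects an object that behaves like a neo4j driver.

```python
def check_database_reachable(driver, database=None):
    """Verify connectivity, then run a no-op query against `database`.

    `driver` is expected to behave like a neo4j.Driver. Passing
    database=None leaves the choice to the server (the user's home database).
    """
    driver.verify_connectivity()
    with driver.session(database=database) as session:
        # consume() forces the round trip, so server-side errors surface here.
        session.run("RETURN 1").consume()
```

This would not explain why the two calls disagree; it only moves the failure to a point where it can be reported with its real cause.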