Open pattanmzy opened 11 months ago
@pattanmzy, thanks for your message. Could you let us know if you are using reactive API by any chance please?
Thanks @injectives for your reply.
The application does not use reactive API but it has dependency on Spring Boot Actuator for monitoring purpose. From the thread dumps, I can see that the Actuator HealthEndpoint uses the ReactiveHealthContributors to perform health check. There were many similar threads hanging waiting for response from Actuator health check. Is this the cause of the problem?
Extracted from thread dump _"http-nio-8080-exec-1" #52 daemon prio=5 os_prio=0 cpu=2079.61ms elapsed=84690.92s tid=0x00007f8bc5150fd0 nid=0x3a waiting on condition [0x00007f8bbe453000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@17.0.9/Native Method)
Is this the cause of the problem?
If I remember correctly, the healthcheck uses the driver session run
method. If subscription is cancelled (for whatever reason), it might be a cause of such issue, but this may also be something else.
Neo4j java driver version: 5.9.0
I suggest trying the latest driver version 5.14.0 as 5.11.0 introduced the cancellation support, see the release notes.
Thanks for the feedback. I have updated the application to use the latest driver version. I will continue to monitor the application and will provide an update here if the problem still persist.
Hi @injectives ,
The connection issue happened again a few days after I upgraded the Neo4j Java driver to the latest version, i.e. 5.14.0. From the attached thread dumps that were generated at 30s to 1 min intervals, we observed that there were application initiated queries to Neo4j AuraDB queries as well as Actuator health check threads hang and did not respond.
There were errors such as connection to the database terminated, connection read timed out, server is no longer available, failed to obtain connection towards WRITE server, etc. found in the application logs. Please find the details in the application and trace logs attached.
Spring Boot starter version: 3.0.12 Spring Data Neo4j version: 7.0.11 Neo4j java driver version: 5.14.0
Thread dumps ThreadDumps.zip
Logs application-log.txt trace-log.txt
Could you please look into this? Thank you.
Hello.
Sure. The thread dumps show the http threads are in WAITING
state, while the driver threads are actually in RUNNABLE
state. It also seems like the main issue is around begin transaction area.
The application-log.txt
and trace-log.txt
show mostly connectivity issues, which could be because of some server-side maintenance or some other factor. Although, I am a little concerned about this one: org.neo4j.driver.exceptions.ConnectionReadTimeoutException: Connection read timed out due to it taking longer than the server-supplied timeout value via configuration hint.
- it basically means that the server did not respond within the timeout it suggested to the driver upon connection.
The driver debug logs might reveal more context on this issue. Would you be happy to enable and supply those please? The driver logging can be configured though the following config option.
As a side question, could you let me know if you set the target database explicitly in the driver session please? See the following config option for more details. If possible, it would be better to set this to narrow down the issue search area a little bit.
And one more general question please. Has your application been using Aura for a while before the issue started happening on a certain date or is it a new deployment? I am just trying to understand if this issue just started happening on a certain date, which could mean some server change.
Thank you.
@injectives
The driver debug logs might reveal more context on this issue. Would you be happy to enable and supply those please?
I have set the logging level to debug. I will provide the logs when the issue happens again.
As a side question, could you let me know if you set the target database explicitly in the driver session please?
The target database was not set previously. It's now set.
Has your application been using Aura for a while before the issue started happening on a certain date or is it a new deployment?
It is a new deployment of less than 2 months. This issue started happening since the beginning.
Hi @injectives ,
The issue happened again on December 26th. The thread dumps show similar issue as before. There's no error found in the application logs. However, there were warning messages on Actuator Health contributor taking long time to respond in the console log as below. There were also no application activity related log messages after this time. It appears that the issue started to happen around this time.
2023-12-26 11:20:34 WARN [http-nio-8080-exec-2] - Health contributor org.springframework.boot.actuate.autoconfigure.health.HealthEndpointConfiguration$AdaptedReactiveHealthContributors$1 (neo4j) took 30988ms to respond
2023-12-26 11:20:34 WARN [http-nio-8080-exec-3] - Health contributor org.springframework.boot.actuate.autoconfigure.health.HealthEndpointConfiguration$AdaptedReactiveHealthContributors$1 (neo4j) took 61001ms to respond
The driver debug logs are attached.
Thread dumps ThreadDumps_20231226.zip
Driver Debug Log neo4j.log.zip
Console log console.log
Appreciate if you could take a look at it. Thank you.
@pattanmzy, thanks a lot for providing the vital details. We will review soon and will get back to you.
After view the neo4j.log.zip,i have found the same issue in my application.The application is running in AWS. Log detail:{ "log": "Suppressed: org.neo4j.driver.exceptions.DiscoveryException: Failed to update routing table with server '58534a99.databases.neo4j.io(54.216.115.14):7687'.", "stream": "stdout", "time": "2024-02-21T08:51:21.223046235Z" } neo4j-driver version is :5.15.0, and it have been update to 5.18.0 which is the latest version.
Hi,
I have a Spring Boot application with Neo4j AuraDB. The application was working fine initially. However, occasionally we observed that queries to the Neo4j AuraDB hang and did not respond after a long period of time, e.g. 10 - 20 minutes. When this happened, we saw the following message in the logs.
[Neo4jDriverIO-2-1] - Closing connection pool towards p-4dc8adc8-7ed2-7.production-orch-0361.neo4j.io:7687, it has no active connections and is not in the routing table registry.
We generated a few thread dumps every 30s. We observed that those threads still hang and the number of threads waiting for response from neo4j connection kept piling up. There are no other related errors in the log.
Connection uri: neo4j+s://.databases.neo4j.io
Spring Boot starter version: 3.0.8
Spring Data Neo4j version: 7.0.7
Neo4j java driver version: 5.9.0
Neo4j version: 5
Neo4j Aura Professional on AWS
Operating System: Alpine Linux v3.18 on Kubernetes EKS
Java version: openjdk version "17.0.9"
Expected behavior
Database query should execute successfully
Actual behavior
Connection to AuraDB did not respond causing thread to hang.
Thread dumps ThreadDumps.zip
Could you please advise what could be the root cause of this issue.
Do let me know if you need more info.
Thank you.