neo4jrb / activegraph

An active model wrapper for the Neo4j Graph Database for Ruby.
http://neo4jrb.io
MIT License

Neo4j::Driver::Exceptions::SessionExpiredException when all sockets are closed #1633

Open mstrofbass opened 3 years ago

mstrofbass commented 3 years ago

We recently set up a proxy (istio/envoy) between our Rails backend and the Neo4j database. After that switch, we would get a SessionExpiredException after a period of inactivity. Once we did some debugging, we discovered that the proxy has a default inactivity timeout for TCP connections of 60 minutes.

I know that activegraph uses a connection pool, which means it should automatically switch over to another connection when one is lost, but that looks like it's failing here. My best guess is that it's trying to switch to another connection but the proxy has closed all of the connections in the connection pool, and thus it's giving up.

The quick fix was to essentially ping the neo4j database with our periodic backend health check, which ensures that at least one connection stays alive. However, it seems that the driver should attempt to reconnect in this situation or otherwise handle it a bit more gracefully. Obviously there may be other considerations that I'm not privy to, but I wanted to bring it up.
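The keep-alive workaround described above can be sketched roughly as follows. This is only an illustration, not our actual health check: `Neo4jKeepAlive` is a hypothetical helper name, the ping is passed in as a block, and the interval only needs to be shorter than the proxy's idle timeout. Note that a single ping exercises whichever pooled connection it is handed, so other idle connections in the pool may still be closed by the proxy.

```ruby
# Hypothetical helper (not part of activegraph or neo4j-ruby-driver):
# runs a cheap "ping" block on a fixed interval in a background thread
# so that at least one connection sees traffic before the proxy's idle
# timeout expires.
class Neo4jKeepAlive
  def initialize(interval:, &ping)
    @interval = interval # seconds between pings
    @ping = ping         # block that issues a cheap query
    @thread = nil
  end

  # Starts the background loop (no-op if already running).
  def start
    @thread ||= Thread.new do
      loop do
        begin
          @ping.call
        rescue StandardError => e
          # A failed ping must not kill the loop; log and try again later.
          warn "keep-alive ping failed: #{e.class}: #{e.message}"
        end
        sleep @interval
      end
    end
    self
  end

  # Stops the background loop.
  def stop
    @thread&.kill
    @thread&.join
    @thread = nil
  end
end

# Assumed usage in a Rails initializer; the query call is an assumption
# about how the app talks to Neo4j, so adjust it to your setup:
#   Neo4jKeepAlive.new(interval: 300) { ActiveGraph::Base.query('RETURN 1') }.start
```

Driving the ping from an existing periodic health check, as we did, avoids the extra thread; the class above is only meant for setups without one.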

Sorry I can't provide an easy way to reproduce this. Actual error we received:

I, [2020-10-27T00:20:45.672589 #1655]  INFO -- : [5f094f96-dd5b-44f1-8404-4cc8098dba0b] method=POST path=/graphql format=*/* controller=GraphqlController action=execute status=500 error='Neo4j::Driver::Exceptions::SessionExpiredException: code: `fff`, error: `4`, state: `4`, error_context: `plain_socket_send(/seabolt/src/seabolt/src/bolt/communication-plain.c:231), send error code: 32`' duration=39.56 view=0.00
F, [2020-10-27T00:20:45.674851 #1655] FATAL -- : [5f094f96-dd5b-44f1-8404-4cc8098dba0b]   
[5f094f96-dd5b-44f1-8404-4cc8098dba0b] Neo4j::Driver::Exceptions::SessionExpiredException (code: `fff`, error: `4`, state: `4`, error_context: `plain_socket_send(/seabolt/src/seabolt/src/bolt/communication-plain.c:231), send error code: 32`):
[5f094f96-dd5b-44f1-8404-4cc8098dba0b]   
[5f094f96-dd5b-44f1-8404-4cc8098dba0b] app/models/graph.rb:20:in `fetch_stats'
[5f094f96-dd5b-44f1-8404-4cc8098dba0b] app/graphql/queries/graph_stats.rb:7:in `resolve'
[5f094f96-dd5b-44f1-8404-4cc8098dba0b] app/controllers/graphql_controller.rb:15:in `execute'

Runtime information:

ruby version: 2.5.7
rails version: 6.0.2
Neo4j database version: 4.0.7
activegraph gem version: 10.0.1
neo4j-ruby-driver gem version: 1.7.2

klobuczek commented 3 years ago

@mstrofbass have you tried setting the config value:

keep_alive: true

This instructs the Neo4j server to send periodic NO_OP messages to the client (I'm not sure how often), which the driver ignores. I understand that the pool should be refreshed with new connections if none are intact, but that logic lives in seabolt, which we do not maintain.

mstrofbass commented 3 years ago

I believe I tried that initially to no avail. There's no way for me to retest it at this point.

jeperkins4 commented 3 years ago

@mstrofbass We encountered the same issue. Have you tried playing with the other configuration settings?

  config.neo4j.driver.keep_alive = false
  config.neo4j.driver.leaked_session_logging = true
  config.neo4j.driver.max_connection_lifetime = 1.minute
  config.neo4j.driver.max_connection_pool_size = 10
  config.neo4j.driver.connection_timeout = 30.seconds

hng commented 2 years ago

We get the same error when the web app that uses Neo4j via activegraph hasn't been accessed for some time, e.g. first thing in the morning. After reloading, everything works again. We've tried setting keep_alive to true, but that did not help.

Ruby driver and activegraph 10.0.2, neo4j-community 4.0.11
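
Since a reload succeeds, one possible application-level stopgap is to retry the failing call once. The following is a minimal sketch under that assumption: `with_retry` is a hypothetical helper, not something activegraph or the driver provides, and it is untested whether the retry is actually handed a fresh connection in every case. It should only wrap idempotent reads, because a write could otherwise be applied twice.

```ruby
# Hypothetical helper (not part of activegraph or neo4j-ruby-driver):
# re-runs the block when one of the given exception classes is raised,
# up to `attempts` total tries, then re-raises the last error.
def with_retry(error_classes, attempts: 2)
  tries = 0
  begin
    tries += 1
    yield
  rescue *error_classes
    retry if tries < attempts
    raise
  end
end

# Assumed usage; the exception class is taken from the log above, and the
# wrapped call stands in for whatever read query the app performs:
#   with_retry([Neo4j::Driver::Exceptions::SessionExpiredException]) do
#     ActiveGraph::Base.query('RETURN 1')
#   end
```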