Open danielmconrad opened 1 year ago
We are still seeing this issue with neo4j-ruby-driver 4.4.4
Is there is any news about that issue?
Still see it at 4.4.5 and sometimes there is another one error undefined method map for nil
at driver/internal/cluster/routing_table_handler_impl.rb
We're seeing this issue as well. It pops up intermittently, but would like to have it run clean. Is there any update on this?
It looks like the issue might be here: https://github.com/neo4jrb/neo4j-ruby-driver/blob/610e61126e984244e2be05836b3d2628cc57ce03/lib/neo4j/driver/internal/bolt_server_address.rb#L89
The BoltServerAddress
is putting itself into a set. When it gets called later it no longer has the attributes
method since it's not a BoltServerAddress
anymore
@L33tH4x0r , but if you follow the stack trace above, the Set
s enclosing the BoltServerAddress
get reduced into one Set
here
https://github.com/neo4jrb/neo4j-ruby-driver/blob/4.4/ruby/neo4j/driver/internal/cluster/routing_table_handler_impl.rb#L63
addresses_to_retain = @routing_table_registry.all_servers.map(&:unicast_stream).reduce(&:+)
When things are in order, the include?
method works because it's operating on BoltServerAddress
. Somehow, this BoltServerAddress
is getting double wrapped by Set
somewhere
Actually, I think it gets double wrapped by Set
here:
https://github.com/neo4jrb/neo4j-ruby-driver/blob/4.4/ruby/neo4j/driver/internal/cluster/rediscovery_impl.rb#L94
def lookup_on_initial_router(routing_table, connection_pool, seen_servers, bookmark, impersonated_user, base_error)
resolved_routers = resolve
(resolved_routers - seen_servers.to_a).lazy.filter_map do |address|
lookup_on_router(address, false, routing_table, connection_pool, nil, bookmark,
impersonated_user, base_error)
end.first&.then { |composition| ClusterCompositionLookupResult.new(composition, Set.new(resolved_routers)) }
end
resolved_routers
is a an Array of Set
wrapped BoltServerAddresses
, so this method returns one Set
of individually wrapped BoltServerAddresses
When the routing table gets refreshed in RoutingTableHandlerImpl#fresh_cluster_composition_fetched
, it combines the addresses_retain
with this Set
of individually wrapped BoltServerAddresses
here
https://github.com/neo4jrb/neo4j-ruby-driver/blob/4.4/ruby/neo4j/driver/internal/cluster/routing_table_handler_impl.rb#L66
composition_lookup_result.resolved_initial_routers&.then do |addresses|
addresses_to_retain << addresses
end
@connection_pool.retain_all(addresses_to_retain)
The call to retain_all then tries to walk through addresses_to_retain
, where it encounters the Set
wrapped BoltServerAddresses
@klobuczek , can we get rid of Set
enclosing resolved_routers
on this line?
https://github.com/neo4jrb/neo4j-ruby-driver/blob/610e61126e984244e2be05836b3d2628cc57ce03/ruby/neo4j/driver/internal/cluster/rediscovery_impl.rb#L94
It looks like someone added a flatten to the addresses_to_retain
method that is causing the issue here in this commit in a fork:
It looks like someone added a flatten to the
addresses_to_retain
method that is causing the issue here in this commit in a fork:
That would work too
That change to the ascent-technologies
fork is from us. Frankly, I forgot we made it. We've been running that change in production now for quite some time without seeing the routing table error.
Thanks for the fork! This looks like it'll fix things.
Our team is just getting started using Neo4J and am curious as to the general problem here. Are you running some sort of configuration that caused more stale objects to get rehydrated that caused this issue? I get why this would fix it, but more wondering about underlying causes here
I don't think there's anything special about our configuration. We're using ActiveGraph as the ORM to back an API powered by GraphQL and using AuraDB as the storage mechanism. The access of the API is pretty bursty. From what I can remember, there wasn't any particular pattern behind when/why we would see this issue.
I tried to write a PR for this issue that uses https://github.com/ascent-technologies/neo4j-ruby-driver/commit/6f80bfaa919f02a419a61a52001b09206c465126 but I was unable to push my branch to the remote. I was running into authentication issues so maybe the repo isn't open to PRs?
Either way, it would be appreciated if @klobuczek could take a look at the fork and push a change that fixes this. It would be greatly appreciated!
@klobuczek is there any plan to include this fix into main repo?
After a period of moderate inactivity, we began seeing this error on all subsequent requests. Restarting our instances temporarily fixed the issue. From the look of things, the key used in
address_to_pool
forNeo4j::Driver::Internal::Async::Pool::ConnectionPoolImpl
is somehow replaced with a Set instead of an address.Docker image:
ruby:3.1.2
Gems:
Error: