Postgres and MySQL explain plan traces missing when agent is used with Rails v7.2+

gsar commented 1 week ago

Description

We appear to be missing EXPLAIN traces for postgres queries with our instance.

Also seeing lots of these "undefined method postgresql_connection" being logged with ruby 3.3.5, rails 7.2.1.2 and newrelic_rpm 9.14.0.

Expected Behavior

There should be no errors in the logs.

Your Environment

See above for relevant version numbers. No changes in the environment were made besides upgrading ruby, rails and newrelic_rpm gem.

For Maintainers Only or Hero Triaging this bug

Suggested Priority (P1,P2,P3,P4,P5): Suggested T-Shirt size (S, M, L, XL, Unknown):

workato-integration[bot] commented 1 week ago

https://new-relic.atlassian.net/browse/NR-333320

gstark commented 1 week ago

The issue appears to be that get_explain_plan expects there a method named postgresql_connetion. In our case it expects a mysql2_connection method. This appears to have been removed in the ActiveRecord 7.2 baseline.

See agent/instrumentation/active_record_subscriber.rb

def get_explain_plan(statement)
  connection = NewRelic::Agent::Database.get_connection(statement.config) do
    ::ActiveRecord::Base.send("#{statement.config[:adapter]}_connection",
      statement.config)
  end

I'm not an ActiveRecord expert but perhaps ActiveRecord::Base.lease_connection would work here?

The docs state:

# Returns the connection currently associated with the class. This can
# also be used to "borrow" the connection to do database work unrelated
# to any of the specific Active Records.
# The connection will remain leased for the entire duration of the request
# or job, or until +#release_connection+ is called.

It does appear the NewRelic agent tries to cache these connections itself, so there may be other options than leasing. Unless get_explain_plan is updated to simply lease the connection, perform the explain, and then release the connection. In that case with_connection might be a better approach since the explain can be done inside the block yielded to with_connection

tannalynn commented 1 week ago

Thank you both bringing this to our attention, and for the detailed information about the problem. It does seem that this was something that was overlooked in our rails 7.2 support, so we will need to take a look at updating our instrumentation here to prevent this error in the future.

fallwith commented 1 day ago

Here's the relevant commit that removed the methods the agent has been relying on: https://github.com/rails/rails/commit/009c7e74117690f0dbe200188a929b345c9306c1

fallwith commented 1 day ago

The options that @gstark outlined seem great.

For a 1:1 fix compatible with existing behavior, lease_connection seems like a solid option. The leased connection option seems to mutate into the result (true) once release_connection is called and will effectively remain leased until that release call is made. So we could put in a check for lease_connection being defined and cache the leased connection and then check for the connection having been released to refresh the lease before performing any queries.
Refactoring the existing logic to move away from a cached connection and start making use of with_connection (when a new enough Rails version is present to offer it) is a terrific idea. On the one hand, having the agent lease and hold on to a connection indefinitely seems downright wasteful when with_connection is available. However, having the agent grab that connection at app start-up time ensures that the connection is available when needed. I'm not sure what calling with_connection on an as-needed basis will do if the observed app is currently at its quota of max available connections.

@ioquatix's Permanent Connection Checkout gist highlights the usage options nicely and even demonstrates invoking lease_connection within a with_connection block.

There are alternative approaches that I'm seeing to essentially still manually instantiate an adapter specific class and cache it, but I think when Rails v7.2+ is in play we should leverage with_connection and/or lease_connection as they seem the most appropriate for what the agent is attempting to do with performing explain plan queries.

newrelic / newrelic-ruby-agent