neo4jrb / activegraph

An active model wrapper for the Neo4j Graph Database for Ruby.
http://neo4jrb.io
MIT License
1.39k stars 276 forks source link

neo4j.core.bolt.request activesupport instrumentation is incorrect #1647

Open yourpalal opened 3 years ago

yourpalal commented 3 years ago

The neo4j.core.bolt.requestevent duration seems to generally be very short, often 0ms. This is even the case when running queries that take multiple seconds to run.

Unfortunately, this makes it very hard to get an idea of how neo4j queries are performing, what queries are slow, etc.

My guess at what's wrong here is that it's measuring the time to send the query and get some response, but not to get the full response data. I could be wrong though. I tried to dig into this but I'm not familiar at all with neo4j-ruby-driver and seabolt, so I wasn't able to make much progress.

I was however able to confirm:

example

Below is the timing information for a query being run in the neo4j web interface. This query is shown in the log as taking 0ms. In both cases, the neo4j server is being run on my machine. The query itself is limited to 500 returned entities, but does basically a full scan of my neo4j db.

image

Runtime information:

versions:

    activegraph (10.1.0)
    neo4j-ruby-driver (1.7.4)
    activesupport (6.1.3)

    ruby 3.0.0 (mri)

neo4j: 4.0.0

yourpalal commented 3 years ago

@klobuczek do you have any ideas on where to start looking in order to fix this? I have a feeling that improving the instrumentation will probably involve getting a better idea of how the driver works?

If you can provide some pointers on where to look I will put together a PR.

yourpalal commented 2 years ago

I noticed that the run handlers in the neo4j-ruby-driver classes have access to a summary object with nice run-time info that appears to come straight from the driver, or maybe from the server.

The run handlers also have a lifecycle that allows for recognizing when a query has actually finished pulling all the related data.

Using this, I did a little monkey patching in a rails initializer to hack together a proof-of-concept!

class Neo4j::Driver::Internal::Handlers::TransactionPullAllResponseHandler
  def after_success(_)
    # instrument new event - finished queries
    ActiveSupport::Notifications.instrument('neo4j.core.bolt.result', summary)
  end
end

module ActiveGraph
  module Base
    def self.subscribe_to_request(&logger_proc)
      super(&logger_proc)

      ActiveSupport::Notifications.subscribe('neo4j.core.bolt.result') do |_, start, finish, _id, summary|
        query = summary.statement.text
        available = summary.result_available_after
        consumed = summary.result_consumed_after

        yield "#{ANSI::RED}#{query}#{ANSI::CLEAR} #{ANSI::BLUE}BOLT:#{ANSI::CLEAR} #{ANSI::YELLOW}#{available}ms + #{consumed}ms#{ANSI::CLEAR}"
      end
    end
  end
end

Obviously this isn't perfect, but I think it shows that there's good information to be had that can be used for instrumentation!

This also confirmed that the current instrumentation numbers don't tell the whole story. In any substantial query, the durations logged with my new instrumentation are longer than those given by the standard neo4j.core.bolt.request instrumentation.

@klobuczek if you can provide a pointer on what the ideal form of this would be I can clean it up and turn it into a nice PR.