heroku / barnes

Tell StatsD about request time, GC, objects and more. Latest Rails 4 and Ruby 2.1 support, and ancient Rails 2 and Ruby 1.8 support.
MIT License
0 stars 0 forks source link

Is the `before_fork` block needed for Rails apps? #36

Open chancancode opened 3 years ago

chancancode commented 3 years ago

The Heroku docs seem to suggest yes, but the README of the gem seems to imply that it is only needed for non-Rails apps.

chancancode commented 3 years ago

I found a related issue while investigating this. When not using puma in worker mode, the before_fork does not run, as expected and as-per the documentation. However, I would have expected the railtie/initializer to kick in and starts Barnes without further configuration, which is also what the README implies.

In practice, that didn't seem to work: because barnes has no dependency on 'rails' or 'railtie', bundler (or whatever is requiring the gems) could and in practice will require barnes before railtie is required.

So, when this line is executed, ::Rails::Railtie is often not defined yet, so the railtie isn't required, and so it doesn't start automatically.

Back to my original question – had this worked correctly, wouldn't adding the code to before_fork, as per the Heroku docs, result in this being started twice?

doutatsu commented 3 years ago

I'd love to know more as well, as I was looking into adding this, but docs have confused me... I am also not sure if I need to wrap everything inside puma.rb into before_fork or just at the end...

schneems commented 3 years ago

Is the before_fork block needed for Rails apps? #36

I'm honestly not sure. I would have to test it. So I did...

$ mktmpdir
$ git clone https://github.com/heroku/ruby-getting-started
$ cd ruby-getting-started/
$ echo "gem 'barnes'" >> Gemfile
$ chruby 2.7
$ bundle update --bundler
$ gem install ffi
$ gem install puma
$ bundle update ffi puma
$ git add .; git commit -mbarnes
$ heroku create
$ heroku labs:enable "runtime-heroku-metrics"
$ heroku labs:enable "ruby-language-metrics"
$ heroku buildpacks:add -i 1 heroku/metrics
$ git push heroku
$ heroku ps:scale web=1:standard-1x

This gave me an app with metrics:

So I don't think it's required to use before_fork incantation with rails apps. You can see pool.capacity being emitted in a 1 off dyno as well:

$ heroku run bash
Running bash on ⬢ radiant-reaches-02228... up, run.6213 (Standard-1X)
~ $ BARNES_DEBUG=1 DYNO="web.1" bundle exec puma -p ${PORT:-3000}
[6] Puma starting in cluster mode...
[6] * Puma version: 5.1.1 (ruby 2.6.6-p146) ("At Your Service")
[6] *  Min threads: 5
[6] *  Max threads: 5
[6] *  Environment: production
[6] *   Master PID: 6
[6] *      Workers: 2
[6] *     Restarts: (✔) hot (✖) phased
[6] * Preloading application
[6] * Listening on http://0.0.0.0:22722
[6] ! WARNING: Detected 2 Thread(s) started in app boot:
[6] ! #<Thread:0x0000560a7877d3b8@/app/vendor/bundle/ruby/2.6.0/gems/barnes-0.0.8/lib/barnes/periodic.rb:39 sleep> - /app/vendor/bundle/ruby/2.6.0/gems/barnes-0.0.8/lib/barnes/periodic.rb:48:in `sleep'
[6] ! #<Thread:0x0000560a7a60e6b0@/app/vendor/bundle/ruby/2.6.0/gems/activerecord-5.2.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:299 sleep> - /app/vendor/bundle/ruby/2.6.0/gems/activerecord-5.2.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:301:in `sleep'
[6] Use Ctrl-C to stop
[6] - Worker 0 (PID: 10) booted, phase: 0
[6] - Worker 1 (PID: 12) booted, phase: 0
{"barnes.state":{"stopwatch":{"wall":1611355819514.8418,"cpu":3945.501638},"ruby_gc":{"count":49,"heap_allocated_pages":649,"heap_sorted_length":649,"heap_allocatable_pages":0,"heap_available_slots":264531,"heap_live_slots":210569,"heap_free_slots":53962,"heap_final_slots":0,"heap_marked_slots":197329,"heap_eden_pages":649,"heap_tomb_pages":0,"total_allocated_pages":649,"total_freed_pages":0,"total_allocated_objects":1271273,"total_freed_objects":1060704,"malloc_increase_bytes":2109568,"malloc_increase_bytes_limit":32883343,"minor_gc_count":39,"major_gc_count":10,"remembered_wb_unprotected_objects":2204,"remembered_wb_unprotected_objects_limit":4408,"old_objects":193285,"old_objects_limit":386572,"oldmalloc_increase_bytes":18708672,"oldmalloc_increase_bytes_limit":24159190}},"barnes.counters":{"Time.wall":10013.2392578125,"Time.cpu":1751.171499,"Time.idle":8262.0677588125,"Time.pct.cpu":17.488561432642353,"Time.pct.idle":82.51143856735766,"GC.count":6,"GC.major_count":2,"GC.minor_gc_count":4},"barnes.gauges":{"using.puma":1,"pool.capacity":10,"threads.max":10,"threads.spawned":10,"Objects.TOTAL":264531,"Objects.FREE":54114,"Objects.T_OBJECT":3813,"Objects.T_CLASS":6861,"Objects.T_MODULE":1000,"Objects.T_FLOAT":9,"Objects.T_STRING":94642,"Objects.T_REGEXP":1916,"Objects.T_ARRAY":13082,"Objects.T_HASH":2629,"Objects.T_STRUCT":530,"Objects.T_BIGNUM":14,"Objects.T_FILE":30,"Objects.T_DATA":4698,"Objects.T_MATCH":111,"Objects.T_COMPLEX":1,"Objects.T_RATIONAL":754,"Objects.T_SYMBOL":368,"Objects.T_IMEMO":78316,"Objects.T_ICLASS":1643,"GC.total_allocated_objects":2344434.0,"GC.total_freed_objects":2059542.0,"GC.count":49,"GC.heap_allocated_pages":649,"GC.heap_sorted_length":649,"GC.heap_allocatable_pages":0,"GC.heap_available_slots":264531,"GC.heap_live_slots":210569,"GC.heap_free_slots":53962,"GC.heap_final_slots":0,"GC.heap_marked_slots":197329,"GC.heap_eden_pages":649,"GC.heap_tomb_pages":0,"GC.total_allocated_pages":649,"GC.total_freed_pages":0,"GC.malloc_increase_bytes":2109568,"GC.malloc_increase_bytes_limit":32883343,"GC.minor_gc_count":39,"GC.major_gc_count":10,"GC.remembered_wb_unprotected_objects":2204,"GC.remembered_wb_unprotected_objects_limit":4408,"GC.old_objects":193285,"GC.old_objects_limit":386572,"GC.oldmalloc_increase_bytes":18708672,"GC.oldmalloc_increase_bytes_limit":24159190}}

If you add a before_fork then you'll get 2 outputs for every tick. Which I think means that it's basically sending Heroku 2x the amount of data that is needed. It should be duplicate data which is using CPU and IO resources to send, but it's not negatively impacting the metrics collected.

Instead of having docs that say "do it this way with rails, do it this way without" another alternative is Barnes.start could be idempotent. We could store the instance of Barnes::Periodic somewhere globally and return that when called a second time.

schneems commented 3 years ago

Also sorry on the delayed response. I was out for most of December with childcare duties. GitHub should have an OOO feature that allows someone to sub in to get my notifications. I only saw this thread due to the bump via @doutatsu

doutatsu commented 3 years ago

GitHub should have an OOO feature that allows someone to sub in to get my notifications

I've heard many maintainers ask for that, definitely tough one. But I've heard Github is finally working on it

Thanks for the detailed response. I've deployed myself, without adding before_fork and can confirm, it works as expected

stevenharman commented 9 months ago

I went spelunking and here's what I found.

  1. As pointed out by @chancancode, Barnes does have a Railtie that will start Barnes.start as part of the Rails initialization process.
  2. It's also true that the Railtie is loaded when require 'barnes' happens.
  3. As @schneems mentioned, each call to Barnes.start will create a new instance of Periodic, which spins up a new thread that loops forever and occasionally emits the metrics.

So in a vanilla Rails app like Richard used in his above example, where he only added gem 'barnes' to the Gemfile, and did nothing with the Puma config, we're relying on Bundler to require 'barnes' for us. And it happily does that. I'm not sure how Bundler is determining the order to load Gems, but in Richard's case (and in my own testing w/our 13 year old Rails app), it seems to be loaded after Rails because by the time Barnes tries to load its Railtie, ::Rails::Railtie is defined. So the Railtie sets up the initialization hook, and eventually Barnes.start is called. Huzzah! 🎉

However, as @chancancode points out, it's possible to get into a situation where that's not the case. If for whatever reason, Barnes is loaded first, then ::Rails::Railtie won't be defined, the Rails initialization hook won't be set, and so Barnes won't be started.

One way that can happen is if you require 'barnes' at the top of your config/puma.rb, as recommended in the Heroku docs. The config/puma.rb file is loaded into the Ruby process by the Puma CLI (i.e., bin/puma) way at the start, before Bundler has gone through and started require-ing all Gems. In this case, when Barnes goes to load its Railtie the ::Rails::Railtie constant is not defined (b/c Rails hasn't been loaded yet), so it skips it. But, crucially, since config/puma.rb did a require 'barnes', Rubygems now considers that dependency loaded. So later in the process when Bundler tries to require the Barnes Gem, it's a NOOP as Rubygems considers that dependency already loaded. In this case, nothing ever called Barnes.start. Unless you've done something like what the Heroku and the Barnes docs mention about putting a before_fork block in your config/puma.rb.

Phew… so now we know how it is that Barnes can start even without those before_fork blocks. But we're not done. Because I'm not sure before_fork is actually the right hook to call for Puma.

Back in point 3 (above) I mentioned that each call to Barnes.start also creates a new Thread that loops and actually reports the metrics for the process. That's true if the Railtie started barnes, or if you did it via before_fork. Heck, if both of those happened, you'd end up with two threads, both reporting roughly the same info, but twice as often as you'd want. Exactly as Richard said. And both of those threads are in the parent process, not the child workers.

On fork, only the forking thread continues to the child. No other threads from that process make it to the child. Meaning, none of the Barnes Periodic threads are running in the child processes. So the metrics we're getting are ONLY from the parent Puma process.

In the case of Puma-specific stats, that's actually exactly what we want. We want to collect and report those from the parent. In fact, Puma.stats_hash from a worker process is empty, so Barnes rightfully ignores and doesn't try to report them.

But in the case of memory, GC, etc… we want those metrics from the child worker processes. And we can get them by using a different Puma hook: on_worker_boot.

So this setup looks like:

# Gemfile

gem 'barnes', require: false # Prevent Bundler from loading `Barnes` at the wrong time

# config/puma.rb

require 'barnes'

# Each time a new child worker process boots, start Barnes for it
on_worker_boot do
  Barnes.start
end

# Once the app has fully booted, start Barnes for the parent process
on_booted do
  Barnes.start
end

There is still one hitch here - we're getting the Puma stats we want from the parent, but I'm not sure we want its GC, Memory, etc… metrics. Right? Or maybe we do? We certainly do if running Puma in Single mode. But in Clustered mode… I'm less sure.

I can write up a change to the Barnes docs with new instructions based on single/clustered Puma. And I'm sure I can nudge some Herokai to get those pulled into Devcenter. But my real question is what about that parent process emitting the non-Puma metrics? Is that desirable? Expected?

Let me know, and we'll take it from there.