thoughtbot / suspenders

A Rails template with our standard defaults.
https://thoughtbot.com
MIT License
4.01k stars 530 forks source link

Random Rack::Timeout::RequestTimeoutError exceptions in Heroku #623

Closed tute closed 8 years ago

tute commented 9 years ago

Moving in a ticket from Heroku to open up discussion to more developers seeing this same behavior.

Rack::Timeout::RequestTimeoutError:
Request waited DELTAms, then ran for longer than 10000ms

6 suspenders applications in Heroku get random timeouts in different parts of the software: (staging.)thoughtbot.com, robots.thoughtbot.com, (staging.)hotshotlegal.com, formkeep.com.

At least 5 of those apps (not sure about FormKeep) don't use external services more than the database, and background emails. Some are new, clean, smallish projects with low traffic, so I can't explain why would it ever take 10 seconds to respond (slowest requests take less than a second in HotshotLegal as of today).

Past week we haven't been deploying HotshotLegal, and FormKeep rarely is deployed, so a first hypothesis I had about requests starting during a deploy doesn't hold anymore.

Why does this happen? How can we debug this?

I attach some of the many stack traces to show how random it seems.


Rack::Timeout::RequestTimeoutError: Request waited 165ms, then ran for longer than 10000ms

vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/core_ext/class/attribute.rb:86 in block (4 levels) in class_attribute
vendor/bundle/ruby/2.2.0/gems/actionview-4.2.4/lib/action_view/view_paths.rb:60 in lookup_context
vendor/bundle/ruby/2.2.0/gems/actionview-4.2.4/lib/action_view/rendering.rb:29 in process
vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.4/lib/action_controller/metal.rb:196 in dispatch
vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.4/lib/action_controller/metal/rack_delegation.rb:13 in dispatch
vendor/bundle/ruby/2.2.0/gems/actionpack-4.2.4/lib/action_controller/metal.rb:237 in block in action

On a define_method: https://github.com/rails/rails/blob/master/activesupport/lib/active_support/core_ext/class/attribute.rb#L86.


Rack::Timeout::RequestTimeoutError: Request waited 37ms, then ran for longer than 10000ms

vendor/ruby-2.2.3/lib/ruby/2.2.0/psych/visitors/emitter.rb:10 in new
vendor/ruby-2.2.3/lib/ruby/2.2.0/psych/visitors/emitter.rb:10 in initialize
vendor/ruby-2.2.3/lib/ruby/2.2.0/psych/nodes/node.rb:48 in new
vendor/ruby-2.2.3/lib/ruby/2.2.0/psych/nodes/node.rb:48 in yaml
vendor/ruby-2.2.3/lib/ruby/2.2.0/psych.rb:410 in dump
vendor/ruby-2.2.3/lib/ruby/2.2.0/psych/core_ext.rb:14 in psych_to_yaml
vendor/bundle/ruby/2.2.0/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:81 in payload_object=
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/attribute_assignment.rb:54 in public_send
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/attribute_assignment.rb:54 in _assign_attribute
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/attribute_assignment.rb:41 in block in assign_attributes
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/attribute_assignment.rb:35 in each
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/attribute_assignment.rb:35 in assign_attributes
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/core.rb:564 in init_attributes
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/core.rb:281 in initialize
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/inheritance.rb:61 in new
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/inheritance.rb:61 in new
vendor/bundle/ruby/2.2.0/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:32 in enqueue
vendor/bundle/ruby/2.2.0/gems/activejob-4.2.4/lib/active_job/queue_adapters/delayed_job_adapter.rb:18 in enqueue
vendor/bundle/ruby/2.2.0/gems/activejob-4.2.4/lib/active_job/enqueuing.rb:71 in block in enqueue

On a @handler = Psych::Emitter.new io: https://github.com/tenderlove/psych/blob/master/lib/psych/visitors/emitter.rb#L10.


Rack::Timeout::RequestTimeoutError: Request waited 239ms, then ran for longer than 10000ms

vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/visitor.rb:16 in block in dispatch_cache
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/reduce.rb:13 in yield
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/reduce.rb:13 in visit
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/to_sql.rb:787 in maybe_visit
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/to_sql.rb:233 in visit_Arel_Nodes_SelectCore
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/to_sql.rb:210 in block in visit_Arel_Nodes_SelectStatement
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/to_sql.rb:209 in each
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/to_sql.rb:209 in inject
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/to_sql.rb:209 in visit_Arel_Nodes_SelectStatement
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/reduce.rb:13 in visit
vendor/bundle/ruby/2.2.0/gems/arel-6.0.3/lib/arel/visitors/reduce.rb:7 in accept
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/connection_adapters/abstract/database_statements.rb:12 in to_sql
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/connection_adapters/abstract/query_cache.rb:67 in select_all
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/querying.rb:39 in find_by_sql

Rack::Timeout::RequestTimeoutError: Request waited 0ms, then ran for longer than 10000ms

vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/core_ext/string/output_safety.rb:167 in safe_concat
app/views/topics/_course_list.html.erb:10 in block in _app_views_topics__course_list_html_erb__723328469013061476_69826776019600
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/relation/delegation.rb:46 in each
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/relation/delegation.rb:46 in each
app/views/topics/_course_list.html.erb:1 in _app_views_topics__course_list_html_erb__723328469013061476_69826776019600
vendor/bundle/ruby/2.2.0/gems/actionview-4.2.4/lib/action_view/template.rb:145 in block in render
vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/notifications.rb:164 in block in instrument
vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/notifications/instrumenter.rb:20 in instrument
vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/notifications.rb:164 in instrument
vendor/bundle/ruby/2.2.0/gems/actionview-4.2.4/lib/action_view/template.rb:333 in instrument
vendor/bundle/ruby/2.2.0/gems/actionview-4.2.4/lib/action_view/template.rb:143 in render

Rack::Timeout::RequestTimeoutError: Request waited 115ms, then ran for longer than 10000ms

vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/associations/has_many_association.rb:8 in <module:Associations>
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/associations/has_many_association.rb:3 in <module:ActiveRecord>
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/associations/has_many_association.rb:1 in <top (required)>
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/associations/has_many_through_association.rb:6 in <module:Associations>
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/associations/has_many_through_association.rb:5 in <module:ActiveRecord>
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/associations/has_many_through_association.rb:3 in <top (required)>
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/reflection.rb:447 in association_class
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/reflection.rb:877 in association_class
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/associations.rb:162 in association
vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.4/lib/active_record/associations/builder/association.rb:115 in topics
app/views/superadmin/organizations/index.html.erb:19 in block in _app_views_superadmin_organizations_index_html_erb__597677509313507384_69826780730740
app/views/superadmin/organizations/index.html.erb:15 in each
app/views/superadmin/organizations/index.html.erb:15 in _app_views_superadmin_organizations_index_html_erb__597677509313507384_69826780730740
vendor/bundle/ruby/2.2.0/gems/actionview-4.2.4/lib/action_view/template.rb:145 in block in render
vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/notifications.rb:164 in block in instrument

cc @jferris @seanpdoyle @reshleman @geoffharcourt

tute commented 9 years ago

@reshleman are your apps suspenders based, or other Rails or Ruby apps?

jferris commented 9 years ago

Looks like this occurs both on Puma (thoughtbot.com, Hotshot, Formkeep) and Unicorn (robots.thoughtbot.com), so it doesn't seem to be a threading issue.

reshleman commented 9 years ago

@tute I see these in a handful of suspenders-based Rails apps (w/ Unicorn), none of which use external services.

I find that the stacktraces created by rack-timeout exceptions aren't very helpful, even for requests that are consistently slow, because the exceptions aren't always raised during the slow method / database query.

Instead, when I see these exceptions in Airbrake, I usually look at the controller action they occur in. If I see a pattern of timeouts occurring in a certain controller action, I'll use rack-mini-profiler to look for bottlenecks or slow queries.

In most cases, though, the timeouts seem to occur randomly, as you're experiencing.

Happy to help out with additional research / data.

geoffharcourt commented 9 years ago

In my experience I saw these timeout exceptions occur as often for static homepage requests (no DB access required) as for more complicated page or API requests. I had always assumed this was just a side-effect of working in a shared hosting environment.

jferris commented 9 years ago

I had always assumed this was just a side-effect of working in a shared hosting environment.

It seems unlikely (and unacceptable) that Heroku would take 10 seconds to serve a static page under any circumstances.

jferris commented 9 years ago

Possible theories from the discussion so far:

tute commented 9 years ago

Affected applications may be because of throttling from going over the Heroku memory quota

I just upgraded to professional dynos, and metrics show that we are using on average 603MB of RAM, with a peak maximum of 679MB. This is standard suspenders with few added gems.

tute commented 9 years ago

Memory quota exceeded. There have been 3632 memory errors in the last day. Try larger dynos or reducing your resource consumption on the dyno.

jferris commented 9 years ago

This is standard suspenders with few added gems.

Sounds like we have an issue with our configuration in Suspenders. Suspenders apps are expected to be pretty small, so using more than 600MB of RAM for a newish Suspenders app seems unreasonable.

tute commented 9 years ago

Suspenders apps are expected to be pretty small, so using more than 600MB of RAM for a newish Suspenders app seems unreasonable.

Should we lower concurrency by 1? Are there other lines to follow?

jferris commented 9 years ago

Should we lower concurrency by 1? Are there other lines to follow?

We currently use a concurrency of 3. If we're using 600MB of RAM for a fresh Rails app, I think that means we're saying it's reasonable to use ~200MB per process, which seems high to me.

derekprior commented 9 years ago

How about a binary search of gems in the project to see if we can find the culprit. Comment out half of the gems (and associated initializers, etc), restart the app, and see what develops?

tute commented 9 years ago

How about a binary search of gems in the project to see if we can find the culprit. Comment out half of the gems (and associated initializers, etc), restart the app, and see what develops?

Will work on this next Friday, unless someone gets to it before me.

derekprior commented 9 years ago

If you can get your app to boot into production mode locally, you can use Derailed Benchmarks's various memory profiling tasks to help narrow things down, possibly.

reshleman commented 9 years ago

Affected applications may be because of throttling from going over the Heroku memory quota

FWIW, I encounter these timeouts on apps that don't exceed the memory quota on 1x dynos or produce any R14 errors.

tute commented 8 years ago

I haven't seen this again. Let me know if you still see it in newer versions of suspenders. Thank you!

dmurawsky commented 3 years ago

@tute What did you do to fix this?