Closed by tute 8 years ago
@reshleman are your apps Suspenders-based, or are they other Rails or Ruby apps?
Looks like this occurs both on Puma (thoughtbot.com, Hotshot, Formkeep) and Unicorn (robots.thoughtbot.com), so it doesn't seem to be a threading issue.
@tute I see these in a handful of suspenders-based Rails apps (w/ Unicorn), none of which use external services.
I find that the stacktraces created by rack-timeout exceptions aren't very helpful, even for requests that are consistently slow, because the exceptions aren't always raised during the slow method / database query.
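To illustrate why such a backtrace can point away from the real bottleneck, here is a minimal sketch. It does not use rack-timeout; it assumes only that the timeout is delivered from another thread, which plain Ruby's `Thread#raise` imitates here. The names `RequestTimeout`, `slow_query`, `render_view`, and `timed_out_frames` are hypothetical.

```ruby
# Illustrative only: Thread#raise stands in for a timeout that interrupts
# a request from another thread. All names here are made up for the sketch.
class RequestTimeout < StandardError; end

def slow_query
  sleep 0.2 # pretend this is the genuinely slow part of the request
end

def render_view
  200.times { sleep 0.01 } # pretend this is cheap work done afterwards
end

def timed_out_frames
  worker = Thread.new do
    begin
      slow_query
      render_view
      [] # finished in time, no backtrace to report
    rescue RequestTimeout => e
      e.backtrace
    end
  end
  sleep 0.6 # the "timeout" fires after slow_query has already returned
  worker.raise(RequestTimeout, "request ran too long")
  worker.value
end

frames = timed_out_frames
puts frames.first(3)
```

In this sketch the reported frames sit inside `render_view`, because that is where the thread happened to be when the timer fired, even though `slow_query` consumed a large share of the time budget.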
Instead, when I see these exceptions in Airbrake, I usually look at the controller action they occur in. If I see a pattern of timeouts occurring in a certain controller action, I'll use rack-mini-profiler to look for bottlenecks or slow queries.
In most cases, though, the timeouts seem to occur randomly, as you're experiencing.
Happy to help out with additional research / data.
In my experience I saw these timeout exceptions occur as often for static homepage requests (no DB access required) as for more complicated page or API requests. I had always assumed this was just a side-effect of working in a shared hosting environment.
It seems unlikely (and unacceptable) that Heroku would take 10 seconds to serve a static page under any circumstances.
Possible theories from the discussion so far:
The affected applications may be getting throttled for going over the Heroku memory quota
I just upgraded to professional dynos, and metrics show that we are using on average 603MB of RAM, with a peak maximum of 679MB. This is standard suspenders with few added gems.
Memory quota exceeded. There have been 3632 memory errors in the last day. Try larger dynos or reducing your resource consumption on the dyno.
Sounds like we have an issue with our configuration in Suspenders. Suspenders apps are expected to be pretty small, so using more than 600MB of RAM for a newish Suspenders app seems unreasonable.
Should we lower concurrency by 1? Are there other lines to follow?
We currently use a concurrency of 3. If we're using 600MB of RAM for a fresh Rails app, I think that means we're saying it's reasonable to use ~200MB per process, which seems high to me.
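The arithmetic behind that estimate can be sketched as follows. The 603 MB average and the concurrency of 3 come from this thread; the 512 MB quota is an assumption about the dyno size, and the helper names are hypothetical. The sketch also ignores the master process and any memory shared between workers, so treat it as a rough bound only.

```ruby
# Back-of-the-envelope memory budget.
# 603 MB and 3 processes are figures from the thread.
# The 512 MB quota is an ASSUMPTION about the dyno size.
def per_process_mb(total_mb, processes)
  total_mb.fdiv(processes)
end

def processes_within_quota(quota_mb, per_process)
  (quota_mb / per_process).floor
end

per_process = per_process_mb(603, 3)
fitting = processes_within_quota(512, per_process)

puts "about #{per_process.round} MB per process"
puts "#{fitting} processes fit within the assumed quota"
```

Under these assumptions only two processes of that size fit, which is consistent with the suggestion to lower concurrency by one.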
How about a binary search of gems in the project to see if we can find the culprit. Comment out half of the gems (and associated initializers, etc), restart the app, and see what develops?
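A minimal sketch of that bisection, assuming a single gem is responsible. The block passed to `find_culprit` is a hypothetical stand-in for the manual step (boot the app with only that subset enabled and check whether memory use is still too high), and the gem names are placeholders.

```ruby
# Hypothetical sketch: the block answers "is memory still too high with
# only these gems enabled?". In practice that is a manual boot-and-measure step.
# Assumes exactly one gem is the culprit.
def find_culprit(gems, &heavy)
  candidates = gems
  while candidates.size > 1
    first_half = candidates.take(candidates.size / 2)
    second_half = candidates.drop(candidates.size / 2)
    candidates = heavy.call(first_half) ? first_half : second_half
  end
  candidates.first
end

gems = %w[gem_a gem_b gem_c gem_d gem_e] # placeholder names
culprit = find_culprit(gems) { |subset| subset.include?("gem_d") }
puts culprit
```

Each round halves the candidate list, so even a long Gemfile needs only a handful of restarts. If several gems contribute together, this simple version will only find one of them.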
Will work on this next Friday, unless someone gets to it before me.
If you can get your app to boot into production mode locally, you may be able to use Derailed Benchmarks' memory profiling tasks to narrow things down.
FWIW, I encounter these timeouts on apps that don't exceed the memory quota on 1x dynos or produce any R14 errors.
I haven't seen this again. Let me know if you still see it in newer versions of suspenders. Thank you!
@tute What did you do to fix this?
Moving in a ticket from Heroku to open up discussion to more developers seeing this same behavior.
Six Suspenders applications on Heroku get random timeouts in different parts of the software: (staging.)thoughtbot.com, robots.thoughtbot.com, (staging.)hotshotlegal.com, formkeep.com.
At least 5 of those apps (not sure about FormKeep) don't use external services beyond the database and background emails. Some are new, clean, smallish projects with low traffic, so I can't explain why it would ever take 10 seconds to respond (the slowest requests take less than a second in HotshotLegal as of today).
For the past week we haven't deployed HotshotLegal, and FormKeep is rarely deployed, so my first hypothesis, that the timed-out requests start during a deploy, no longer holds.
Why does this happen? How can we debug this?
I attach some of the many stack traces to show how random it seems.
On a define_method: https://github.com/rails/rails/blob/master/activesupport/lib/active_support/core_ext/class/attribute.rb#L86.
On a @handler = Psych::Emitter.new io: https://github.com/tenderlove/psych/blob/master/lib/psych/visitors/emitter.rb#L10.
cc @jferris @seanpdoyle @reshleman @geoffharcourt