jrochkind opened this issue 7 months ago
WEB_CONCURRENCY=2 RAILS_MAX_THREADS=5
Memory consumption was good, but Puma worker usage was maxing out in a way we didn't expect.
On Jan 16 2024, we changed back to WEB_CONCURRENCY=3 RAILS_MAX_THREADS=5
Oops, this is not what we meant to do: 3x5 instead of the original 3x3!
After a week of that... RAM usage is indeed a lot worse (and we got one R14, Heroku's memory-quota-exceeded error). Puma pool usage certainly looks better, because we now have 15 threads instead of 9-10 total, but that's also part of why we are using so much RAM!
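The "15 instead of 9-10" figure is just processes times threads: each of the WEB_CONCURRENCY worker processes runs its own pool of up to RAILS_MAX_THREADS threads. A quick sketch of that arithmetic for the three configurations so far (`puma_thread_slots` is a hypothetical helper name, not part of Puma):

```ruby
# Rough count of requests one dyno can work on at once:
# every Puma worker process has its own pool of threads.
def puma_thread_slots(web_concurrency:, rails_max_threads:)
  web_concurrency * rails_max_threads
end

configs = {
  "original 3x3"   => [3, 3],
  "first test 2x5" => [2, 5],
  "accidental 3x5" => [3, 5],
}

configs.each do |label, (workers, threads)|
  slots = puma_thread_slots(web_concurrency: workers, rails_max_threads: threads)
  puts "#{label}: #{workers} processes x #{threads} threads = #{slots} thread slots"
end
```

More slots means more requests in flight before Puma queues them, but every extra process also brings its own copy of the app's memory, which is the RAM trade-off described above.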
OK, on Jan 24, changing to what we actually MEANT to change to, for comparison with our original setup:
heroku config:set WEB_CONCURRENCY=3 RAILS_MAX_THREADS=3 -r production
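For context on what those two variables do: `heroku config:set` only sets environment variables, and it is config/puma.rb that turns them into worker and thread counts. We haven't pasted our actual puma.rb here; the comment below shows the common pattern using Puma's `workers`, `threads` and `preload_app!` settings, and the runnable part is a hypothetical `puma_settings_from` helper that does the same parsing outside Puma. The fallback values are assumptions.

```ruby
# Typical config/puma.rb lines consuming these variables (run inside Puma, not here):
#
#   workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
#   max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 3))
#   threads max_threads, max_threads
#   preload_app!
#
# Hypothetical helper doing the same parsing standalone, so it can be checked.
def puma_settings_from(env)
  {
    workers: Integer(env.fetch("WEB_CONCURRENCY", "2")),   # fallback is an assumption
    threads: Integer(env.fetch("RAILS_MAX_THREADS", "3")), # fallback is an assumption
  }
end

p puma_settings_from({ "WEB_CONCURRENCY" => "3", "RAILS_MAX_THREADS" => "3" })
```

Using `Integer(...)` rather than `to_i` means a typo in the config var fails loudly at boot instead of silently becoming 0.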
For the week ending 2/1/2024:
Memory usage is indeed higher than with WEB_CONCURRENCY=2 RAILS_MAX_THREADS=5, as expected. I feel like our memory observations pretty completely match what we expect.
But it looks like this one maxes out Puma workers less than our first 2x5 test did... I don't know, was that an anomaly? OK, let's try one more week of 2x5:
WEB_CONCURRENCY=2 RAILS_MAX_THREADS=5, starting 1 Feb 2024
LOTS of reading on tuning thread count: https://github.com/rails/rails/issues/50450
It suggests that more than 3 threads per worker may be counter-productive, especially under very heavy load, which may explain our observation that 2 processes x 5 threads was leading to spikes of bad responses.
Rails maintainers are actually changing the default thread count from 5 to 3, so Heroku's existing doc recommendation to use 5 may not have been optimal after all.
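The mechanism usually cited for this is CRuby's global VM lock (GVL): only one thread per process can run Ruby code at a time, and a thread releases the lock while it waits on I/O. So extra threads help while requests wait on the database or network, but CPU-bound work in those threads just queues up for the lock, adding latency. A minimal sketch of that effect, using `sleep` as a stand-in for I/O and a plain loop as a stand-in for request CPU work (`wall_time_with_threads` is a hypothetical helper):

```ruby
# Run `work` on n threads at once; return elapsed wall-clock seconds.
def wall_time_with_threads(n, work)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Array.new(n) { Thread.new { work.call } }.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Stand-in for waiting on a DB query: sleep releases the GVL, so threads overlap.
io_bound = -> { sleep 0.2 }

# Stand-in for rendering/serialization: pure Ruby work needs the GVL,
# so only one of these threads makes progress at any moment.
cpu_bound = lambda do
  total = 0
  1_000_000.times { |i| total += i }
  total
end

io_1  = wall_time_with_threads(1, io_bound)
io_5  = wall_time_with_threads(5, io_bound)
cpu_1 = wall_time_with_threads(1, cpu_bound)
cpu_5 = wall_time_with_threads(5, cpu_bound)

puts format("I/O-bound: 1 thread %.2fs, 5 threads %.2fs", io_1, io_5)
puts format("CPU-bound: 1 thread %.2fs, 5 threads %.2fs", cpu_1, cpu_5)
```

On CRuby, the five sleeping threads should finish in about the time of one, while the five CPU-bound threads should take much longer than one, roughly in proportion to the thread count. Real requests mix both, which is why the right thread count depends on how much of each request is spent waiting versus computing.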
So 3x3 may be optimal after all... or, if we want to see whether we can get away with fewer processes to save RAM, maybe 2 processes x 3 threads wouldn't perform much worse than 3x3? With only two hyperthreads available, our extra third process may not be giving us much.
(Nate Berkopec also says that one process per vCPU/hyperthread is good: hyperthreads really do perform like cores, so there's no reason to limit yourself to one process per physical core. Heroku docs suggest one per physical core, but they may not entirely reflect current best practices. More info at https://mailchi.mp/railsspeed/how-many-ruby-processes-per-cpu-is-ideal?e=e9606cf04b)
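Applying the "one process per vCPU/hyperthread" rule of thumb to the performance-m dyno (two hyperthreads, per the note above) shows why the third process looks questionable: 3 processes means 1.5 processes competing for each vCPU. A small sketch with hypothetical helper names; the 3 threads per process reflects the new Rails default mentioned above and is an assumption, not a measured optimum.

```ruby
# Hypothetical helper: Puma worker processes per vCPU (hyperthread).
def processes_per_vcpu(web_concurrency, vcpus)
  web_concurrency.fdiv(vcpus)
end

# Rule of thumb: one worker process per vCPU, a few threads per process.
def one_process_per_vcpu(vcpus, threads_per_process: 3)
  { web_concurrency: vcpus, rails_max_threads: threads_per_process }
end

vcpus = 2 # performance-m: "only two hyperthreads available"
[3, 2].each do |workers|
  puts "#{workers} processes on #{vcpus} vCPUs: #{processes_per_vcpu(workers, vcpus)} per vCPU"
end
p one_process_per_vcpu(vcpus)
```

This is only a starting point for experiments like the ones in this thread; the measured pool usage and RAM graphs are what actually decide it.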
OK, so I kind of forgot I had left the app in the experimental mode of 2 processes with 5 threads each. Who knows whether that was responsible for some of our outages over the past couple of months, although I tend to think that when we are overloaded with traffic, we are overloaded with traffic, and that's it.
But I'm going to post the current metric graphs for the last week of 2x5.
Then, based on the new/solidified information above (that 5 is too many threads, but more than two processes probably isn't helpful), I'm going to try 2x4.
Week ending Apr 24 (WEB_CONCURRENCY=2 RAILS_MAX_THREADS=5):
That looks fine, even though we are at 2x5? And it has nicely spare memory, which would hopefully let us turn on YJIT with Ruby 3.3.
There are definitely some spikes of bad performance, but I think we get those regardless of settings, whenever we just don't have enough resources for the load.
heroku config:set WEB_CONCURRENCY=2 RAILS_MAX_THREADS=4 -r production
OK, we are definitely maxing out our Puma workers regularly, we think because of increased (bot) traffic. We think the two configurations tested above didn't necessarily make that much difference.
Here's a 7-day chart with a performance-m dyno, WEB_CONCURRENCY 2 processes and RAILS_MAX_THREADS 4 threads.
Lotta maxed out workers.
Upgrading to a performance-l dyno, to give us more actual CPU capacity to handle more traffic, plus enough RAM (more per CPU than before) that we should have plenty to spare when we later move to Ruby 3.3 with YJIT, which increases RAM consumption.
On May 28th we switched to performance-l, with 8 workers (WEB_CONCURRENCY) and 3 threads per worker (RAILS_MAX_THREADS); 3 is what we believe the new Rails-suggested default for RAILS_MAX_THREADS is.
As a result of the memory-usage investigation in #2449, I realized we should maybe explore changing the worker/thread counts on our single performance-m dyno,
from the current 3 workers x 3 threads to 2 workers x 4 or 5 threads, closer to Heroku's standard recommendations.
I think this should give us lower RAM usage with a pretty similar performance profile, maybe without actually giving anything up.
But in order to change only one thing at a time, so we have a better idea of what had an effect, we're going to hold off on this: first we'll patch the memory fix into Rails, then come back to explore this separately once everything has settled down.