andreaswittig opened this issue 1 year ago (status: Open)
+1 for this feature
(some documentation of best practices on manual scale up process would be nice too)
Based on several days of working with the three services, an HA and auto-scaling configuration works out of the box if you set `AutoScaling` to `true` and configure `DesiredCount`, `MaxCapacity`, and `MinCapacity`. The only service that doesn't scale well is the Sidekiq service. According to https://docs.joinmastodon.org/admin/scaling/#sidekiq, you can run multiple Sidekiq services on different queues, except for the scheduler queue; there can only be one of those. My fork already has a few of these changes in the `istoleyourpw-deploy` branch: https://github.com/compuguy/mastodon-on-aws
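For reference, a minimal sketch of what passing those parameters could look like when deploying the CloudFormation stack. The stack name, template file name, and parameter values here are assumptions for illustration; `AutoScaling`, `DesiredCount`, `MinCapacity`, and `MaxCapacity` are the parameters mentioned above:

```shell
# Hypothetical deploy command; adjust stack/template names to your setup.
aws cloudformation deploy \
  --stack-name mastodon \
  --template-file mastodon.yaml \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides \
      AutoScaling=true \
      DesiredCount=2 \
      MinCapacity=2 \
      MaxCapacity=6
```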
Edit: Came across this article (https://nora.codes/post/scaling-mastodon-in-the-face-of-an-exodus/) that explains how to split up the Sidekiq tasks: you can run multiple instances with the `default`, `push`, and `pull` queues, and one instance for `mailer` and `scheduler`.
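For context, that split boils down to running Sidekiq processes pinned to specific queues. A sketch of the invocations (queue names are from the Mastodon scaling docs; concurrency values are placeholders, and the exact wrapper on ECS may differ):

```shell
# High-volume queues: safe to run several interchangeable copies of this.
bundle exec sidekiq -c 25 -q default -q push -q pull

# scheduler must run exactly once; mailer can ride along on that single worker.
bundle exec sidekiq -c 5 -q mailer -q scheduler
```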
My Sidekiq task is regularly pegging at 100% CPU utilization... definitely need some guidance on configuring scaling...
@scrappydog Same for us. I'm not sure that it's an issue, though. It likely doesn't matter if the background tasks utilize all resources, as long as they finish without much delay. For us, we see spikes to 100%, but only for minutes. Do you see the same pattern?
That looks very similar to utilization on my instance.
My inner system admin really "wants" to add another task... but I agree as long as jobs are completing in a reasonable time it's not an immediate issue.
BUT we are running tiny instances for testing... we NEED a way to scale up... :-)
I bumped the allocation on the Sidekiq task up to 0.5 vCPU / 3 GB memory...
This feels happier for now... but it doesn't address the real scalability question...
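For anyone reproducing this: on Fargate, 0.5 vCPU (`cpu=512`) permits 1–4 GB of memory, so 3 GB (`memory=3072`) is a valid combination. A hedged sketch of registering that task size directly (the family name and container-definitions file are hypothetical; in this project you'd normally change the equivalent template parameters instead):

```shell
# Hypothetical example of the Fargate task size used above.
aws ecs register-task-definition \
  --family mastodon-sidekiq \
  --requires-compatibilities FARGATE \
  --network-mode awsvpc \
  --cpu 512 \
  --memory 3072 \
  --container-definitions file://sidekiq-container.json
```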
Upgraded about halfway through this graph... definitely a lot better!
I opened up #20 for sidekiq. This issue is about auto-scaling for web and streaming API.
Enabling auto-scaling is not the big deal here. What we need is a good metric to trigger scale out/in. And we need a test workload to test this with. I have no idea how we can simulate Mastodon load. If anyone reading this runs an instance with enough users to benefit from auto-scaling, please let us know.
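One candidate metric (an assumption on my part, not something the stack ships today) would be target tracking on average service CPU via Application Auto Scaling; for the web service, `ALBRequestCountPerTarget` would be another option worth testing. A sketch with the AWS CLI, using hypothetical cluster/service names:

```shell
# Register the web service as a scalable target (names are placeholders).
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/mastodon-cluster/web \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 6

# Scale out/in to hold average CPU around 60%; the target value and
# cooldowns here are guesses that would need tuning under a real workload.
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/mastodon-cluster/web \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name web-cpu-target \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
      "TargetValue": 60.0,
      "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
      },
      "ScaleInCooldown": 300,
      "ScaleOutCooldown": 60
    }'
```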
Just add a relay server and you will have CPU load in a minute.
Yeah, it's quite easy to auto-scale the web and streaming APIs. But for most people it's #20 that's more important, since Sidekiq does most of the heavy lifting for Mastodon...
Evaluate and implement auto-scaling for ECS services web and streaming API.