Enable auto-scaling for web and streaming API

widdix / mastodon-on-aws

Host your own Mastodon instance on AWS

https://cloudonaut.io/mastodon-on-aws/

134 stars, 27 forks

Enable auto-scaling for web and streaming API #1

Open · andreaswittig opened this issue 1 year ago

andreaswittig commented 1 year ago

Evaluate and implement auto-scaling for ECS services web and streaming API.

scrappydog commented 1 year ago

+1 for this feature

(Some documentation of best practices for the manual scale-up process would be nice too.)

compuguy commented 1 year ago

After several days of working with the three services: you can get an HA, auto-scaling configuration out of the box by setting AutoScaling to true and setting DesiredCount, MaxCapacity, and MinCapacity. The only service that doesn't scale well this way is Sidekiq. According to https://docs.joinmastodon.org/admin/scaling/#sidekiq, you can run multiple Sidekiq services on different queues, except for the scheduler queue: there can only be one of those. My fork already has some of these changes in the istoleyourpw-deploy branch: https://github.com/compuguy/mastodon-on-aws
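A minimal sketch of the parameter overrides described above, written as the ParameterKey/ParameterValue pairs CloudFormation expects. The parameter names come from the comment; which stack or module exposes them in a given version of the template, and that AutoScaling takes the string "true", are assumptions to verify. The helper name `scaling_parameters` is hypothetical.

```python
from typing import Dict, List


def scaling_parameters(desired: int, minimum: int, maximum: int) -> List[Dict[str, str]]:
    """Build CloudFormation-style parameter overrides for one ECS service.

    Hypothetical helper: parameter names are taken from the issue comment;
    check them against the template version you deploy.
    """
    # The desired count must sit inside the auto-scaling bounds.
    if not 0 <= minimum <= desired <= maximum:
        raise ValueError("expected 0 <= MinCapacity <= DesiredCount <= MaxCapacity")
    values = {
        "AutoScaling": "true",  # assumption: the template expects a "true"/"false" string
        "DesiredCount": str(desired),
        "MinCapacity": str(minimum),
        "MaxCapacity": str(maximum),
    }
    # CloudFormation parameter values are always passed as strings.
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


if __name__ == "__main__":
    # Two tasks minimum (so more than one task is always running), up to four.
    for param in scaling_parameters(desired=2, minimum=2, maximum=4):
        print(param)
```

Keeping MinCapacity at 2 or more is what makes the setup "HA" in the sense of the comment: a single task is always a single point of failure, whatever the scaling policy does.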

Edit: I came across this article (https://nora.codes/post/scaling-mastodon-in-the-face-of-an-exodus/), which explains how to split up the Sidekiq tasks: you can run multiple instances serving the default, push, and pull queues, plus a single instance for the mailer and scheduler queues.
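The queue split from the article can be sketched as a list of Sidekiq command lines, one per ECS task. The `-c` (concurrency) and `-q` (queue) flags are standard Sidekiq CLI options; the concurrency of 25 and the helper name `sidekiq_commands` are illustrative assumptions. Mastodon names its mail queue `mailers` (the comment says "mailer"); confirm the queue names against the Mastodon scaling docs for your version.

```python
from typing import List

# Queues that may be served by any number of identical tasks.
SHARED_QUEUES = ["default", "push", "pull"]
# Queues that must live in exactly one task (scheduler must never be duplicated).
SINGLETON_QUEUES = ["mailers", "scheduler"]


def sidekiq_commands(workers: int, concurrency: int = 25) -> List[str]:
    """Hypothetical helper: one command per task, `workers` scalable tasks
    for the shared queues plus exactly one task owning mailers and scheduler."""
    if workers < 1:
        raise ValueError("need at least one worker for the shared queues")

    def cmd(queues: List[str]) -> str:
        flags = " ".join(f"-q {q}" for q in queues)
        return f"bundle exec sidekiq -c {concurrency} {flags}"

    return [cmd(SHARED_QUEUES)] * workers + [cmd(SINGLETON_QUEUES)]


if __name__ == "__main__":
    for line in sidekiq_commands(workers=3):
        print(line)
```

Only the shared-queue service would get an auto-scaling policy; the singleton service stays pinned at one task.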

scrappydog commented 1 year ago

My Sidekiq task is regularly pegging at 100% CPU utilization... definitely need some guidance on configuring scaling...

michaelwittig commented 1 year ago

@scrappydog Same for us. I'm not sure that is an issue. It likely doesn't matter if the background tasks use all available resources, as long as they finish without much delay. We see spikes to 100%, but they only last a few minutes. Do you see the same pattern?
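One way to tell short spikes from sustained saturation is to measure the longest run of consecutive high CPU samples, for example from one-minute CloudWatch datapoints. This is a hedged sketch: the 95% threshold, the sampling period, and the helper name `longest_run_above` are assumptions, and queue latency remains the more direct signal for "finishing without much delay".

```python
from typing import Sequence


def longest_run_above(samples: Sequence[float], threshold: float = 95.0) -> int:
    """Hypothetical helper: length of the longest run of consecutive samples
    at or above `threshold` (with 1-minute samples, that is minutes pegged)."""
    best = current = 0
    for value in samples:
        # Extend the current run on a high sample, reset it otherwise.
        current = current + 1 if value >= threshold else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    cpu_per_minute = [10, 100, 100, 20, 99, 100, 100, 5]
    print(longest_run_above(cpu_per_minute))
```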

scrappydog commented 1 year ago

That looks very similar to utilization on my instance.

My inner sysadmin really "wants" to add another task... but I agree: as long as jobs complete in a reasonable time, it's not an immediate issue.

BUT we are running tiny instances for testing... we NEED a way to scale up... :-)

scrappydog commented 1 year ago

I bumped the Sidekiq task up to 0.5 vCPU and 3 GB of memory...
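On Fargate, CPU and memory can only be combined in specific sizes, and the task definition stores them as strings in CPU units (1024 = 1 vCPU) and MiB. This sketch converts a size like the one above and rejects invalid combinations; the table covers 0.25 to 4 vCPU only, and the helper name `fargate_task_size` is hypothetical.

```python
from typing import Dict, List

# Valid Fargate memory sizes (MiB) per CPU setting (CPU units, 1024 = 1 vCPU).
# Only 0.25 to 4 vCPU are listed; check the ECS docs for larger sizes.
FARGATE_MEMORY_MIB: Dict[int, List[int]] = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def fargate_task_size(vcpu: float, memory_gb: float) -> Dict[str, str]:
    """Hypothetical helper: return task-definition cpu/memory strings,
    raising ValueError if Fargate does not offer that combination."""
    cpu = int(round(vcpu * 1024))
    memory = int(round(memory_gb * 1024))
    if memory not in FARGATE_MEMORY_MIB.get(cpu, []):
        raise ValueError(f"{vcpu} vCPU / {memory_gb} GB is not a valid Fargate size")
    return {"cpu": str(cpu), "memory": str(memory)}


if __name__ == "__main__":
    print(fargate_task_size(0.5, 3))
```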

This feels happier for now... but it doesn't address the real scalability question...

scrappydog commented 1 year ago

Upgraded about halfway through this graph... definitely a lot better!

michaelwittig commented 1 year ago

I opened up #20 for sidekiq. This issue is about auto-scaling for web and streaming API.

Enabling auto-scaling is not the big deal here. What we need is a good metric to trigger scale-out/in. And we need a test workload to test this with. I have no idea how we can simulate Mastodon load. If anyone reading this is running an instance with enough users to benefit from auto-scaling, please let us know.
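To reason about candidate metrics, it helps to see roughly what a target tracking policy does: it scales the task count in proportion to how far the per-task metric is from the target, within MinCapacity and MaxCapacity. This is a simplified sketch, not the Application Auto Scaling algorithm (which also applies cooldowns and more conservative scale-in); the helper name is hypothetical. Predefined metrics worth evaluating for the web and streaming services include `ECSServiceAverageCPUUtilization` and `ALBRequestCountPerTarget`.

```python
import math


def target_tracking_estimate(current_tasks: int, metric: float, target: float,
                             min_capacity: int, max_capacity: int) -> int:
    """Hypothetical helper: proportional estimate of the task count needed to
    bring the per-task metric to `target`, clamped to the capacity bounds."""
    if current_tasks < 1 or target <= 0:
        raise ValueError("need at least one running task and a positive target")
    # If 2 tasks run at 90% CPU against a 60% target, ~3 tasks would be needed.
    desired = math.ceil(current_tasks * metric / target)
    return max(min_capacity, min(max_capacity, desired))


if __name__ == "__main__":
    print(target_tracking_estimate(2, 90.0, 60.0, min_capacity=1, max_capacity=4))
```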

nodomain commented 1 year ago

Just add a relay server and you will have CPU load in a minute.

https://github.com/brodi1/activitypub-relays

compuguy commented 1 year ago

> I opened up #20 for sidekiq. This issue is about auto-scaling for web and streaming API.
>
> Enabling auto-scaling is not the big deal here. What we need is a good metric to trigger scale-out/in. And we need a test workload to test this with. I have no idea how we can simulate Mastodon load. If anyone reading this is running an instance with enough users to benefit from auto-scaling, please let us know.

Yeah, it's quite easy to auto-scale the web and streaming APIs. But for most people #20 is more important, since Sidekiq does most of the heavy lifting for Mastodon...