mastodon / chart

Helm chart for Mastodon deployment in Kubernetes
GNU Affero General Public License v3.0

Added memory requests to be more realistic. #14

Open keskival opened 1 year ago

keskival commented 1 year ago

This helps Kubernetes make better choices about where to schedule the pods, and communicates the minimum sensible resource requirements to administrators.

On a single-user Mastodon instance running on a three-node Kubernetes cluster, after a week or so of use, we see the following memory usage per pod:

tero@arcones:~$ kubectl top pods -n mastodon
NAME                                           CPU(cores)   MEMORY(bytes)
mastodon-elasticsearch-coordinating-0          6m           403Mi
mastodon-elasticsearch-coordinating-1          28m          189Mi
mastodon-elasticsearch-data-0                  10m          1432Mi
mastodon-elasticsearch-data-1                  5m           1513Mi
mastodon-elasticsearch-ingest-0                6m           418Mi
mastodon-elasticsearch-ingest-1                6m           396Mi
mastodon-elasticsearch-master-0                24m          466Mi
mastodon-elasticsearch-master-1                10m          221Mi
mastodon-postgresql-0                          12m          276Mi
mastodon-redis-master-0                        16m          37Mi
mastodon-redis-replicas-0                      7m           34Mi
mastodon-sidekiq-all-queues-549b4bb7b4-zvj2m   266m         499Mi
mastodon-streaming-78465f778d-6xfg2            1m           96Mi
mastodon-web-774c5c94f9-f5bhz                  22m          418Mi

Hence we make the following adjustments to the Bitnami defaults:

And for Mastodon defaults:

The original idea of keeping these requests at zero is a good default when the minimum requirements are unknown. However, a single-user instance gives us a baseline for those minimum requirements, and leaving the requests at zero only leads to trouble for people. Of course the resource requirements will change over time, but they are chiefly expected to grow.
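For illustration, the change has roughly the following shape in values.yaml. The key paths below are only indicative (the exact layout depends on this chart and on the Bitnami subcharts), and the numbers are placeholders rather than the exact values proposed in this PR:

# Illustrative values.yaml excerpt -- key paths and numbers are placeholders,
# not the exact ones proposed in this PR.
elasticsearch:
  master:
    resources:
      requests:
        memory: 512Mi        # request only; no limit is set
postgresql:
  primary:
    resources:
      requests:
        memory: 256Mi
redis:
  master:
    resources:
      requests:
        memory: 64Mi
resources:                   # Mastodon web pod (path depends on the chart layout)
  requests:
    memory: 512Mi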

keskival commented 1 year ago

I realize this is an opinionated change, so feel free to close it if you disagree with the rationale. I am in the process of testing this; I am putting it up here already for transparency.

keskival commented 1 year ago

This has been tested as far as template generation goes. The template is generated as one would expect.

Confirmed working in a cluster as well.

deepy commented 1 year ago

Single-user instance here as well, but I'm seeing 80Mi for Redis, and my Sidekiq usage is right around 700MiB.

I think adding these to values.yaml as documentation (commented-out) is a good idea though

keskival commented 1 year ago

@deepy, thanks for the added validation! Commented-out suggestions would be fine as well. However, in my opinion these would make better default values than the zeros they are now.

Requests tell the scheduler the minimum amount of memory a node must have available for a pod to be placed there, so the values in this PR would be better than having no requests at all for any instance, even yours with slightly higher usage. Of course, you might still want to tune them upwards in your cluster.

Limits are not set, so the pods can take as much memory as they want. The suggested requests don't affect that.
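In other words, the rendered container spec ends up with a request but no limit, roughly like this (the 512Mi figure is just an example):

resources:
  requests:
    memory: 512Mi   # scheduler only places the pod on a node with this much allocatable memory
  # no limits block -- memory usage is not capped at runtime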

renchap commented 1 year ago

Thanks for this, better defaults would definitely be a good idea.

FYI, here is our current usage for mastodon.online:

mastodon-web: 1300M
mastodon-sidekiq-default-push-ingress: 900M
mastodon-sidekiq-pull: 1000M
mastodon-streaming: 200M

Could you rebase your PR to latest main?

Also, do you have an opinion about adding default limits as well, at least for Mastodon processes?

Something like 300M for streaming, 2000M for web, 1500M for Sidekiq? I feel like it's a good idea to have some in place to keep long-running memory leaks in check.

keskival commented 1 year ago

Also, do you have an opinion about adding default limits as well, at least for Mastodon processes?

Rebased!

I would be hesitant to add limits unless they are really high. They could cause sudden problems for people. For example, my current numbers are:

tero@betanzos:~$ kubectl top pods -n mastodon
NAME                                           CPU(cores)   MEMORY(bytes)   
mastodon-elasticsearch-coordinating-0          9m           376Mi           
mastodon-elasticsearch-coordinating-1          6m           217Mi           
mastodon-elasticsearch-coordinating-2          10m          185Mi           
mastodon-elasticsearch-data-0                  7m           1302Mi          
mastodon-elasticsearch-data-1                  12m          733Mi           
mastodon-elasticsearch-data-2                  9m           1000Mi          
mastodon-elasticsearch-ingest-0                6m           357Mi           
mastodon-elasticsearch-ingest-1                10m          244Mi           
mastodon-elasticsearch-ingest-2                15m          190Mi           
mastodon-elasticsearch-master-0                12m          223Mi           
mastodon-elasticsearch-master-1                61m          436Mi           
mastodon-elasticsearch-master-2                16m          280Mi           
mastodon-postgresql-0                          26m          1551Mi          
mastodon-redis-master-0                        26m          128Mi           
mastodon-redis-replicas-0                      17m          129Mi           
mastodon-redis-replicas-1                      16m          135Mi           
mastodon-redis-replicas-2                      14m          129Mi           
mastodon-sidekiq-all-queues-7cdbd75cdd-99mp7   545m         2487Mi          
mastodon-streaming-58f74f74c4-vwldv            1m           82Mi            
mastodon-web-948bd9cc-xxr6h                    51m          4045Mi  

The instance is rukii.net and currently has 6 users. The web pod's memory use is high probably because I'm running all sorts of scheduled tootctl scripts there. It's also possible I have a long-running memory leak there.

renchap commented 1 year ago

We know that we have a memory leak in the ingress queue at the moment, at least; this is one of the reasons I am suggesting memory limits by default, so that pods with abnormal memory usage get restarted.

I should be able to give you more data from mastodon.social soon. I guess that if we base the limits on that instance's usage, everybody else should be fine :)

keskival commented 1 year ago

Can we add the limits in a separate PR? I think they require a separate discussion and should be something that can be closed or reverted in isolation.

keskival commented 1 year ago

I just tested setting the memory limit on the web and sidekiq pods to 2048MB. It seems to work so far, but it might cause problems when running heavy tootctl commands such as refresh on those same pods. Newly started pods take much less memory than ones that have been running for weeks: they are under 512MB now, whereas they were at many gigabytes after running for weeks.
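For reference, the override I tested looks roughly like the block below; the exact values.yaml paths for the web and sidekiq pods are chart-specific, so treat the keys and the request figure as illustrative:

resources:
  requests:
    memory: 512Mi      # placeholder request
  limits:
    memory: 2048Mi     # container is OOM-killed and restarted if it exceeds this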

abbottmg commented 9 months ago

I agree that setting limits in a separate PR is worthwhile. I strongly second @renchap's opinion that limits should skew low as a counter to memory leaks in both web and sidekiq. The ingress leak was a big motivating factor in my shift from a minimal docker-compose stack to a fuller k8s deployment. I also don't think 4Gi is a reasonable memory footprint for a server with 6 users, regardless of uptime. It's been rare for my ~15-user instance to top 700Mi at peak hours, even when we participated in a couple of large relays. That points to some potential leaks IMHO, but that's neither here nor there.

To @keskival's point, I think that just goes to show there's more discussion to be had there. I also think a separate issue/PR will give scope to add HPAs to our toolbox here. With a memoryUtilization target in a sweet spot between request and limit, the autoscaler can spin up a fresh instance in time to create a rolling replacement with no downtime.
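As a sketch of what that could look like (the Deployment name is assumed from the pod names in the kubectl top output above, and the replica counts and target percentage are only examples):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mastodon-sidekiq
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mastodon-sidekiq-all-queues   # assumed name, taken from the pod names above
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80        # percentage of the memory request, so requests must be set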

The sidekiq pods obviously lend themselves well to autoscaling, but I've also been running two web pods in parallel without issue. I know the admin at infosec.exchange needed to implement sessionAffinity because their S3 upload was really slow and multipart uploads were getting split between web pods. I haven't run into that problem, but it appears to be a minor hurdle anyway.