mastodon / chart

Helm chart for Mastodon deployment in Kubernetes
GNU Affero General Public License v3.0

On multi-node Kubernetes, the default ReadWriteOnce settings without pod affinity are non-functional #17

Open · keskival opened this issue 1 year ago

keskival commented 1 year ago

Steps to reproduce the problem

  1. Install Mastodon from the Helm chart on a multi-node Kubernetes cluster with an NFS storage class.
  2. If the mastodon-web and mastodon-sidekiq-all-queues pods end up on different nodes, some of them hang indefinitely in "ContainerCreating".

They are waiting to mount the system and assets persistent volumes. With the default ReadWriteOnce access mode, these can only be mounted on a single node at a time.

Expected behaviour

Everything should work with roughly default settings.

Actual behaviour

The pods hang in the ContainerCreating state, and the cause is difficult to understand.

Detailed description

The default settings are non-functional on multi-node clusters. Either there should be a clearer comment warning users to set pod affinities, the default access mode should be ReadWriteMany, or a pod affinity should be defined by default that schedules these two kinds of pods onto the same node.
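As a sketch of the ReadWriteMany option, a values override might look like the following. It assumes the chart exposes the access modes as mastodon.persistence.assets.accessMode and mastodon.persistence.system.accessMode; check the values.yaml of your chart version for the exact keys, and make sure your storage class actually supports ReadWriteMany. The file name values-rwx.yaml is hypothetical.

```yaml
# values-rwx.yaml (hypothetical file name).
# Assumed key paths; verify against the chart's values.yaml.
mastodon:
  persistence:
    assets:
      # Allow the volume to be mounted read-write from several nodes,
      # so mastodon-web and sidekiq pods can be scheduled independently.
      accessMode: ReadWriteMany
    system:
      accessMode: ReadWriteMany
```

The file would be passed to Helm with -f values-rwx.yaml. Since the access modes of an existing PersistentVolumeClaim cannot be changed in place, this mainly helps on a fresh install or with newly created claims.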

Specifications

Mastodon: edge
OS: Ubuntu
Kubernetes: MicroK8s
Nodes: 2+

keskival commented 1 year ago

The same problem also affects the mastodon-db-migrate Job, which doesn't seem to have a separate place in values.yaml to set nodeAffinity.

However, for that Job the Helm chart does set a podAffinity that co-locates it with pods labeled app.kubernetes.io/part-of=rails: https://github.com/mastodon/mastodon/blob/ed07f10ca8d4e65ec58958f300a8bb7c762ccbbd/chart/templates/job-db-migrate.yaml#L22-L35

A similar setting should be added to the sidekiq and mastodon-web Deployments as well, so that they are co-located with each other when ReadWriteOnce is used.
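A minimal sketch of such a setting, using the standard Kubernetes pod affinity API, could look like this. It follows the same idea as the db-migrate Job: only schedule the pod onto a node (topologyKey kubernetes.io/hostname) that already runs a pod labeled app.kubernetes.io/part-of=rails. Where exactly this goes in the web and sidekiq Deployment templates, and whether it should be gated on ReadWriteOnce, are assumptions here, not the chart's actual implementation.

```yaml
# Fragment of a Deployment's pod template (spec.template.spec).
# Assumption: the web and sidekiq pods carry app.kubernetes.io/part-of=rails.
affinity:
  podAffinity:
    # Hard requirement: do not schedule unless the rule can be satisfied.
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app.kubernetes.io/part-of
              operator: In
              values:
                - rails
        # One topology domain per node, i.e. "run on the same node".
        topologyKey: kubernetes.io/hostname
```

The softer preferredDuringSchedulingIgnoredDuringExecution variant would avoid unschedulable pods, but it does not guarantee co-location, so the ReadWriteOnce mounts could still hang.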

keskival commented 1 year ago

Added an in-progress PR here: https://github.com/mastodon/chart/pull/13

WilyWildWilly commented 6 months ago

Hi, have you tried setting the persistence access mode to ReadWriteMany? I ask because I'm setting up a single-node cluster for now but will move to multi-node later, and I'd like to avoid running into this pitfall. I don't know whether ReadWriteMany allows the Sidekiq and Rails pods to run on different nodes, as happened to you.

keskival commented 5 months ago

ReadWriteMany works, but of course the storage class must support it. Alternatively, you can force the pods to co-locate, which somewhat defeats the point of having a multi-node cluster in the first place.
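One way to check whether a storage class can provide ReadWriteMany before switching the chart over is to create a small standalone test claim like the one below. The claim name rwx-test and the storageClassName nfs-client are placeholders for illustration; substitute the NFS-backed class in your cluster. If the provisioner cannot satisfy ReadWriteMany, the claim should stay Pending and its events should say why. Some provisioners accept the claim without enforcing the mode, though, so a bound claim is only a first check.

```yaml
# Hypothetical test claim; all names are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-test
spec:
  # Placeholder: replace with your cluster's NFS-backed StorageClass.
  storageClassName: nfs-client
  accessModes:
    # Request a volume that pods on many nodes can mount read-write.
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```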