rqlite / helm-charts

Helm charts for rqlite
MIT License

NodeAffinity - require x86 architecture for scheduling #12

Closed NerdyShawn closed 6 months ago

NerdyShawn commented 6 months ago

Issue Description

Currently the rqlite image is built only for the x86 (amd64) CPU architecture. Deploying this chart to a cluster that also has non-x86 nodes results in a CrashLoopBackOff whenever the pod is scheduled on a node whose architecture doesn't match.

# logs of the binary crashing
k logs -n rqlite-dev rqlite-0 
exec /bin/sh: exec format error

# the continued crashLoopBackOff that will occur for ARM users
k get pods -n rqlite-dev -o wide
NAME       READY   STATUS             RESTARTS           AGE     IP            NODE       NOMINATED NODE   READINESS GATES
rqlite-0   0/1     CrashLoopBackOff   1009 (2m35s ago)   3d13h   10.42.7.211   pinode08   <none>           <none>

# cpu architecture of the node its scheduled on
k describe nodes pinode08 | grep -i kubernetes.io/arch
Labels:             beta.kubernetes.io/arch=arm64
                    kubernetes.io/arch=arm64

Proposed workaround, node affinity

It seems more helpful for the pod to stay in a Pending state than to pull the image and repeatedly attempt a start that we know will fail when no x86 nodes are available in the cluster. For now, it might make sense to add a hard node affinity requirement so the pod is only scheduled on x86 nodes. This would avoid spending cluster bandwidth on pulling the image, as well as cycles on bringing up a pod that will fail continuously.

# currently I do something like a hard rule on the rqlite statefulset to specifically only schedule on x86 nodes
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: "kubernetes.io/arch"
                  operator: In
                  values: 
                    - amd64

Proposed longer term solution

rqlite runs on ARM v8 (and, iirc, maybe v7) without problems. I have validated this by using docker buildx to build a multi-platform image, which I host on one of my private registries and deploy to things like Raspberry Pi nodes in my clusters. There is an open issue, https://github.com/rqlite/rqlite/issues/1077, to build the public rqlite image on Docker Hub for multiple CPU architectures. That would remove the need for this workaround in clusters with more than x86 nodes.

This would let users apply a single Helm chart to any cluster with a heterogeneous mix of CPU architectures. Said another way, users could then deploy to lighter-weight on-prem edge clusters and cloud clusters (such as the Civo provider), which in my opinion is where rqlite shines, given how well it works in resource-constrained environments.

jtackaberry commented 6 months ago

I have a Pi node in my home K8s cluster too, but I taint it for this exact reason, and then add tolerations (and sometimes node selectors) to the workloads I want to let spill over. (Even for the day job, where we have a mix of AMD and Graviton node groups in EKS, we do this.)
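A minimal sketch of the taint-and-toleration approach described above. The taint key `arch` and its value are hypothetical; any key/value pair works as long as the node taint and the workload toleration agree.

```yaml
# Assumes the ARM node was tainted beforehand with a hypothetical key, e.g.:
#   kubectl taint nodes <node-name> arch=arm64:NoSchedule
# Pods without a matching toleration are then kept off that node.

# Pod spec fragment for a workload that IS allowed to run on the tainted node.
tolerations:
  - key: "arch"            # hypothetical taint key, must match the node taint
    operator: "Equal"
    value: "arm64"
    effect: "NoSchedule"
# Optional: a toleration only permits scheduling on the tainted node.
# Add a node selector to actually pin the workload to arm64 nodes.
nodeSelector:
  kubernetes.io/arch: arm64
```

With this arrangement, workloads that only ship x86 images need no changes at all, because the taint keeps them off the ARM node by default.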

I agree a proper multi-arch image is definitely the way, although even then it's not certain that would solve the problem, because there's no guarantee that data written by one architecture will be readable by the other. Depends on whether both sqlite and raft data files are portable across architectures. (I strongly suspect sqlite is, but I don't know about raft.)

Until that lands, your suggestion sounds reasonable. I wouldn't want to hardcode that in the StatefulSet though, because it would prevent users from bringing their own multi-arch or arm64 builds (and setting image.repository and image.tag accordingly). But making that the default chart value for the affinity field might make sense, and it would allow users who have their own custom builds to still deploy on arm64 by overriding the chart-default affinity.
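As a rough sketch of what such an override could look like for someone running their own multi-arch build: the registry, tag, and file name below are hypothetical, and it assumes the chart's `affinity` value accepts a standard Kubernetes affinity object like the StatefulSet snippet earlier in this thread.

```yaml
# values-arm64.yaml: hypothetical override file for a custom multi-arch build.
image:
  # Hypothetical registry and tag; substitute your own build.
  repository: registry.example.com/rqlite
  tag: my-multiarch-tag

# Assumption: the chart passes this value through as the pod's affinity.
# Helm replaces list values wholesale, so this nodeSelectorTerms list
# takes the place of the chart-default amd64-only list.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - amd64
                - arm64
```

The file would be passed to Helm with `-f values-arm64.yaml` at install or upgrade time. Users on the public x86-only image would simply keep the chart default.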

jtackaberry commented 6 months ago

Depends on whether both sqlite and raft data files are portable across architectures. (I strongly suspect sqlite is, but I don't know about raft.)

It's tangential, but as I'm curious, @otoolep do you happen to know about Raft? (Sqlite is.)

otoolep commented 6 months ago

It's tangential, but as I'm curious, @otoolep do you happen to know about Raft? (Sqlite is.)

All data in the Raft log is stored using Protobufs. Instances of Protobufs are marshaled to bytes using the official Go Protobuf libraries. According to https://en.wikipedia.org/wiki/Protocol_Buffer, they are cross-platform.

So sounds like it will work?

I would very much like to publish multi-arch Docker images, but haven't got around to it. I would need to learn how to do it, and that takes time.
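For reference, one common way to publish multi-arch images is a GitHub Actions workflow built on Docker's buildx actions. The following is only a sketch under several assumptions: the workflow name, image name, and secret names are hypothetical, and it assumes the project's Dockerfile builds cleanly for both platforms under QEMU emulation, which has not been verified for rqlite.

```yaml
# .github/workflows/docker-multiarch.yml (hypothetical file name)
name: docker-multiarch
on:
  push:
    tags: ["v*"]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # QEMU lets the amd64 runner emulate arm64 during the build.
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          # Hypothetical secret names; define them in the repository settings.
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          # Hypothetical image name and tag.
          tags: example/rqlite:latest
```

The result is a single tag backed by a manifest list, so each node pulls the variant that matches its own architecture.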

otoolep commented 6 months ago

And since the Raft log entries are what is sent between nodes (in Protobuf format), it sounds like nodes running on different architectures could interoperate.

jtackaberry commented 6 months ago

So sounds like it will work?

Sounds promising, yep. Thanks! Looking forward to seeing multi-arch images at some point in future.

@NerdyShawn I just released chart version 1.3.1, which sets the default affinity rule. Thanks for the report.