Closed by krajorama 2 years ago
With the rollout operator killed off, after another update of the config, the StatefulSet looks like this:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    checksum/config: eb54c06d95c2e592f6c00fef442070c26c355f3178d03cbaab32c149534b0b3a
    meta.helm.sh/release-name: krajo
    meta.helm.sh/release-namespace: dev
    rollout-max-unavailable: "10"
  creationTimestamp: "2022-04-28T16:16:58Z"
  generation: 4
  labels:
    app.kubernetes.io/component: store-gateway
    app.kubernetes.io/instance: krajo
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: mimir
    app.kubernetes.io/part-of: memberlist
    app.kubernetes.io/version: 2.0.0
    helm.sh/chart: mimir-distributed-2.0.9
    rollout-group: store-gateway
    zone: zone-a
  name: krajo-mimir-store-gateway-zone-a
  namespace: dev
  resourceVersion: "2905246"
  selfLink: /apis/apps/v1/namespaces/dev/statefulsets/krajo-mimir-store-gateway-zone-a
  uid: 88d76d77-f8e2-4023-8775-b460b078d4a2
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: store-gateway
      app.kubernetes.io/instance: krajo
      app.kubernetes.io/name: mimir
      rollout-group: store-gateway
      zone: zone-a
  serviceName: krajo-mimir-store-gateway-headless
  template:
    metadata:
      annotations:
        checksum/config: eb54c06d95c2e592f6c00fef442070c26c355f3178d03cbaab32c149534b0b3a
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: store-gateway
        app.kubernetes.io/instance: krajo
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: mimir
        app.kubernetes.io/part-of: memberlist
        app.kubernetes.io/version: 2.0.0
        helm.sh/chart: mimir-distributed-2.0.9
        rollout-group: store-gateway
        zone: zone-a
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: target
                operator: In
                values:
                - store-gateway
              - key: target
                operator: NotIn
                values:
                - store-gateway-zone-a
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -target=store-gateway
        - -config.file=/etc/mimir/mimir.yaml
        - -store-gateway.sharding-ring.instance-availability-zone=zone-a
        image: grafana/mimir:2.0.0
        imagePullPolicy: IfNotPresent
        name: store-gateway
        ports:
        - containerPort: 8080
          name: http-metrics
          protocol: TCP
        - containerPort: 9095
          name: grpc
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: http-metrics
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
        securityContext:
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/mimir
          name: config
        - mountPath: /var/mimir
          name: runtime-config
        - mountPath: /data
          name: storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: krajo-mimir
      serviceAccountName: krajo-mimir
      terminationGracePeriodSeconds: 240
      volumes:
      - name: config
        secret:
          defaultMode: 420
          secretName: krajo-mimir-config
      - configMap:
          defaultMode: 420
          name: krajo-mimir-runtime
        name: runtime-config
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: microk8s-hostpath
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 1
  collisionCount: 0
  currentRevision: krajo-mimir-store-gateway-zone-a-6b64ccdc98
  observedGeneration: 4
  readyReplicas: 1
  replicas: 1
  updateRevision: krajo-mimir-store-gateway-zone-a-764d89475
So the cause turns out to be a missing "name" label in the StatefulSet pod template (a label literally called "name", not the object's name), which the operator requires here: https://github.com/grafana/rollout-operator/blob/main/pkg/controller/controller.go#L402
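For illustration, a minimal sketch of what the pod template labels could look like with a "name" label added. The value store-gateway-zone-a is a hypothetical choice, not something taken from the chart, and the sketch assumes the operator finds Pods by matching whatever "name" value the template carries, as the linked controller code suggests.

```yaml
# Sketch only: fragment of the StatefulSet spec, not a complete manifest.
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/component: store-gateway
        app.kubernetes.io/instance: krajo
        app.kubernetes.io/name: mimir
        rollout-group: store-gateway
        zone: zone-a
        # Plain "name" label (distinct from metadata.name) that the rollout
        # operator is assumed to look for. The value here is hypothetical.
        name: store-gateway-zone-a
```

The label goes only into the pod template, not into spec.selector, so existing selectors stay unchanged.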
User suggestions and questions:
I think we can remove the "name" label requirement. See: https://github.com/grafana/rollout-operator/issues/15
Reproduction steps:
Install Mimir from https://github.com/grafana/helm-charts/pull/1205 and enable, for example, store-gateway zone-aware replication via a custom values.yaml:
After installation, add a single character to mimir.config, just to alter its checksum.
Expected (works without the rollout operator): store-gateway Pods are restarted to pick up the new configuration.
Actual: nothing happens, Pods are not restarted.
Additional info: the rollout operator logs "reconciled" messages for the store-gateway StatefulSets.
Before the config change, the StatefulSet state is:
After the upgrade:
I've also added the checksum as an annotation on the StatefulSet itself, but that didn't help.