dimitarvdimitrov opened 4 months ago
@beatkind tagging you because I don't think I can assign you unless you participate in the issue. Can you assign yourself by any chance? From the GitHub docs:

> You can assign multiple people to each issue or pull request, including yourself, anyone who has commented on the issue or pull request, anyone with write permissions to the repository, and organization members with read permissions to the repository. For more information, see "Access permissions on GitHub."
@dimitarvdimitrov And I need to actively write here :) to be participating. Nope, I am not able to assign myself because I do not have any permissions in the repo.
@beatkind I saw you marked this as fixed by https://github.com/grafana/mimir/pull/7431. I'd like to keep this issue open until we also document the migration procedure. Migration is now technically possible, but it's not a very user-friendly process because users need to figure out the steps themselves.
@dimitarvdimitrov thanks for reopening, this was simply a mistake :) I will add some documentation with my next PR.
https://github.com/grafana/mimir/pull/7282 added autoscaling to the Helm chart as an experimental feature. This issue is about adding support in the Helm chart for a smooth migration and documenting the migration procedure.
Why do we need a migration?
Migrating to a Mimir cluster with autoscaling requires a few intermediate steps to ensure that there are no disruptions to traffic. The major risk is that enabling autoscaling also removes the `replicas` field from Deployments. If KEDA/HPA hasn't started autoscaling the Deployment yet, then Kubernetes interprets the missing `replicas` field as 1 replica, which can cause an outage.

Migration in a nutshell

1. Add a `distributor.kedaAutoscaling.preserveReplicas: true` field to the Helm chart which doesn't delete the `replicas` field from the rendered manifests (https://github.com/grafana/mimir/pull/7431).
2. Enable autoscaling with `preserveReplicas: true` and deploy the chart.
3. Remove `preserveReplicas` from `values.yaml` and deploy the chart.

Internal docs
I'm also pasting Grafana Labs-internal documentation that's specific to our deployment tooling with FluxCD. Perhaps it can be used by folks running FluxCD or as a starting point for proper docs:
remove_managed_replicas.sh
```bash
#!/usr/bin/env bash
# Removes `.spec.replicas` from the managedFields entry owned by flux's
# kustomize-controller, so that flux no longer owns the replicas field.
set -euo pipefail

help() {
  echo "Usage: ./remove_managed_replicas.sh [ -c | --context <context> ] [ -n | --namespace <namespace> ] [ -o | --object <kind/name> ] [ -d | --dry-run ] [ -h | --help ]

Outputs a diff of changes made to the object.
"
  exit 2
}

VALID_ARGUMENTS=$#
if [ "$VALID_ARGUMENTS" -eq 0 ]; then
  help
fi

CONTEXT=""
NAMESPACE=""
OBJECT=""
DRY_RUN=false
DRY_RUN_ARG=""

while [ "$#" -gt 0 ]; do
  case "$1" in
    -c | --context)
      CONTEXT="--context=${2}"
      shift 2
      ;;
    -n | --namespace)
      NAMESPACE="$2"
      shift 2
      ;;
    -o | --object)
      OBJECT="$2"
      shift 2
      ;;
    -d | --dry-run)
      DRY_RUN=true
      DRY_RUN_ARG="--dry-run=server"
      shift 1
      ;;
    -h | --help)
      help
      ;;
    --)
      shift
      break
      ;;
    *)
      echo "Unexpected option: ${1}"
      help
      ;;
  esac
done

if [ -z "${NAMESPACE}" ]; then
  echo "Must supply a namespace."
  exit 1
fi

if [ -z "${OBJECT}" ]; then
  echo "Must supply a kubernetes object (such as \`-o Deployment/example\`)."
  exit 1
fi

# ${KC} is intentionally left unquoted below so it splits into separate arguments.
KC="kubectl ${CONTEXT} --namespace=${NAMESPACE}"

BEFORE=$(${KC} get "${OBJECT}" -o yaml --show-managed-fields=true)
BEFORE_JSON=$(${KC} get "${OBJECT}" -o json --show-managed-fields=true)
# Position of flux's (kustomize-controller) entry in managedFields, or null if absent.
INDEX=$(echo "${BEFORE_JSON}" | jq '.metadata.managedFields | map(.manager == "kustomize-controller") | index(true)')

# Check we can find the position of flux's entry in managedFields:
if ! [[ $INDEX =~ ^[0-9]+$ ]]; then
  echo "Unable to find \`kustomize-controller\` (flux) in the managedFields metadata for ${OBJECT}."
  echo "Has flux not run on this object before?"
  echo "This may happen if you have deployed the object manually."
  echo "It should be safe to continue removing the flux-ignore as the object was never managed by flux."
  exit 1
fi

# Check that `.spec.replicas` is set in the managedFields entry
CHECK=$(echo "${BEFORE_JSON}" | jq ".metadata.managedFields[${INDEX}].fieldsV1.\"f:spec\".\"f:replicas\"")
if [ "${CHECK}" = "null" ]; then
  echo "Unable to find \`.spec.replicas\` set in the managedFields metadata for \`kustomize-controller\`."
  echo "Has the field already been unset?"
  echo "This may happen if the HPA has already scaled the object."
  echo "It is safe to continue removing the flux-ignore on the object."
  exit 1
fi

# Drop flux's ownership of .spec.replicas (server-side dry run only with --dry-run).
AFTER=$(${KC} patch "${OBJECT}" -o yaml --show-managed-fields=true ${DRY_RUN_ARG} --type='json' -p "[{'op': 'remove', 'path': '/metadata/managedFields/${INDEX}/fieldsV1/f:spec/f:replicas'}]")

# diff exits 1 when the inputs differ; `|| :` keeps set -e from aborting.
diff -u <(echo "${BEFORE}") <(echo "${AFTER}") || :

if [ "${DRY_RUN}" = true ]; then
  echo ""
  echo "Dry run only. No changes have been applied."
fi

exit 0
```
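For clusters deployed with plain Helm rather than FluxCD, the last two steps under "Migration in a nutshell" (deploying with `preserveReplicas: true`, then dropping `preserveReplicas` and deploying again) could be scripted roughly as follows. This is a minimal, untested sketch under several assumptions: the release name, namespace and chart reference (`grafana/mimir-distributed`) are placeholders; the `distributor.kedaAutoscaling.enabled` key is assumed; the HPA name `keda-hpa-mimir-distributor` is hypothetical (KEDA names the HPA it creates `keda-hpa-<ScaledObject name>`); and waiting for the HPA's `ScalingActive` condition is an added check, not part of the procedure in this issue, motivated by the risk of KEDA/HPA not having started autoscaling yet. It uses `--set` instead of editing `values.yaml`, and passes `preserveReplicas=false` on the final deploy, which assumes `false` is the chart default.

```bash
#!/usr/bin/env bash
# Sketch only: deploy with preserveReplicas=true, wait for the HPA, then
# deploy with preserveReplicas=false. All defaults are hypothetical assumptions.
set -euo pipefail

RELEASE="${RELEASE:-mimir}"                  # hypothetical release name
NAMESPACE="${NAMESPACE:-mimir}"              # hypothetical namespace
CHART="${CHART:-grafana/mimir-distributed}"  # assumed chart reference
VALUES="${VALUES:-values.yaml}"
HPA="${HPA:-keda-hpa-mimir-distributor}"     # hypothetical; KEDA names its HPA keda-hpa-<ScaledObject>
TIMEOUT="${TIMEOUT:-10m}"
DRY_RUN="${DRY_RUN:-true}"                   # only print commands unless DRY_RUN=false

# Print a command instead of executing it while DRY_RUN=true.
run() {
  if [ "${DRY_RUN}" = true ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Deploy the chart with autoscaling enabled; $1 is the preserveReplicas value.
deploy() {
  run helm upgrade "${RELEASE}" "${CHART}" --namespace "${NAMESPACE}" --values "${VALUES}" \
    --set distributor.kedaAutoscaling.enabled=true \
    --set "distributor.kedaAutoscaling.preserveReplicas=$1"
}

# Wait until the HPA exists and reports ScalingActive=True (assumed readiness signal).
wait_for_hpa() {
  if [ "${DRY_RUN}" = true ]; then
    echo "+ wait for hpa/${HPA} to report ScalingActive=True"
    return 0
  fi
  # KEDA creates the HPA asynchronously, so poll for up to ~5 minutes first;
  # if it never appears, kubectl wait fails and set -e stops the script.
  for _ in $(seq 1 60); do
    kubectl get --namespace "${NAMESPACE}" "hpa/${HPA}" >/dev/null 2>&1 && break
    sleep 5
  done
  kubectl wait --namespace "${NAMESPACE}" --for=condition=ScalingActive \
    "hpa/${HPA}" --timeout="${TIMEOUT}"
}

migrate_distributor() {
  deploy true    # autoscaling on, `replicas` still rendered
  wait_for_hpa   # assumed extra check that KEDA/HPA has started autoscaling
  deploy false   # stop rendering `replicas`
}

# Run only when called with "migrate", so the file can also be sourced.
if [ "${1:-}" = migrate ]; then
  migrate_distributor
fi
```

With a hypothetical file name, `./migrate_distributor.sh migrate` only prints the three actions, and `DRY_RUN=false ./migrate_distributor.sh migrate` executes them. Defaulting to a dry run is a deliberate choice so the commands can be reviewed before anything in the cluster changes.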