timescale / helm-charts

Configuration and Documentation to run TimescaleDB in your Kubernetes cluster
Apache License 2.0

Migrate from 0.5.4 to 0.6.X with Upgrade script and nameOverride changed #176

Open · Firaenix opened 4 years ago

Firaenix commented 4 years ago

Hi,

I'm really looking forward to upgrading to the 0.6 version of the timescaledb-single helm chart to get TimescaleDB 1.7.

I just finished reading through the upgrade guide and grabbed the shell script that migrates the secrets across. I've run into a problem: I have changed the nameOverride field in values.yaml from "timescaledb" to "mydbname-timescale".

Along with that, I've deployed the Helm chart into a specific namespace, called "databases".

The provided script doesn't seem to take these changes into account. What changes need to be made to the script to migrate successfully? Will it involve more than just changing the namespace and the names of the secrets?

Cheers, Nick

feikesteenbergen commented 4 years ago

No, changing the namespace and the names of the secrets should be enough.
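For illustration, a minimal sketch of that renaming, assuming the old secrets follow the timescaledb-* naming and the new chart expects mydbname-timescale-* in the databases namespace (both names are assumptions based on this thread), and that jq is available:

```sh
# Sketch only: re-create each secret under the name the new chart is assumed to
# expect. Building a clean object avoids carrying server-side metadata across.
for suffix in credentials certificate pgbackrest; do
  kubectl get secret "timescaledb-${suffix}" --namespace databases -o json \
    | jq --arg new "mydbname-timescale-${suffix}" \
         '{apiVersion, kind, type, data, metadata: {name: $new, namespace: "databases"}}' \
    | kubectl apply -f -
done
```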

feikesteenbergen commented 4 years ago

Changing the names of the Secrets seems best to me; however, you can also specify the names explicitly here:

https://github.com/timescale/timescaledb-kubernetes/blob/master/charts/timescaledb-single/values.yaml#L26-L41

# These secrets should exist before the Helm is used to deploy this TimescaleDB.
# You can use generate_kustomization.sh to help in creating these secrets, or have
# a look at kustomize/example to see how you could install them.
secretNames:
  # This secret should contain environment variables that influence Patroni,
  # for example PATRONI_SUPERUSER_PASSWORD or PATRONI_REPLICATION_PASSWORD
  # https://patroni.readthedocs.io/en/latest/ENVIRONMENT.html#postgresql
  credentials: # defaults to RELEASE-credentials

  # This secret should be a Secret of type kubernetes.io/tls, containing
  # both a tls.key and a tls.crt
  certificate: # defaults to RELEASE-certificate

  # This secret should contain environment variables that influence pgBackRest,
  # for example, PGBACKREST_REPO1_S3_KEY or PGBACKREST_REPO1_S3_KEY_SECRET
  pgbackrest:  # defaults to RELEASE-pgbackrest
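If renaming the secrets is undesirable, the same effect can come from overriding secretNames at upgrade time. A sketch, assuming a release named timescaledb in the databases namespace and secrets named after the nameOverride (all of which are assumptions, not confirmed in this thread):

```sh
# Sketch only: point the chart at explicitly named, pre-existing secrets.
# The chart reference assumes the timescale Helm repo is added; adjust to your setup.
helm upgrade timescaledb timescale/timescaledb-single --namespace databases \
  --set secretNames.credentials=mydbname-timescale-credentials \
  --set secretNames.certificate=mydbname-timescale-certificate \
  --set secretNames.pgbackrest=mydbname-timescale-pgbackrest
```
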
Firaenix commented 4 years ago

So I modified the script to take my nameOverride and namespace into account. Is there some way I can validate that everything is in order?

Firaenix commented 4 years ago

I gave the upgrade a go and it failed, so I must have messed something up in the script.

[screenshot of the failed upgrade omitted]

feikesteenbergen commented 4 years ago

Hmm, can you share some details about the pod? There shouldn't be any secrets in there, but you may wish to look at the output before posting.

kubectl describe pod/<podname>

Could you please share that as plain text instead of a screenshot?
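As a quick sanity check on the migrated secrets themselves (the names below assume the defaults for a release called timescaledb), kubectl describe prints each secret's keys and sizes without revealing the values:

```sh
kubectl describe secret timescaledb-credentials --namespace databases
kubectl describe secret timescaledb-certificate --namespace databases
kubectl describe secret timescaledb-pgbackrest  --namespace databases
```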

Firaenix commented 4 years ago

No probs. I rolled back to 0.5.4; I'll try the upgrade again and send you the output of describe.

Firaenix commented 4 years ago

Name:         timescaledb-uptimesv-timescale-2
Namespace:    databases
Priority:     0
Node:         pool-tgrhprvvk-3fxch/10.130.61.118
Start Time:   Fri, 19 Jun 2020 18:06:09 +1000
Labels:       app=timescaledb-uptimesv-timescale
              cluster-name=timescaledb
              controller-revision-hash=timescaledb-uptimesv-timescale-7b96c9cf8
              release=timescaledb
              role=replica
              statefulset.kubernetes.io/pod-name=timescaledb-uptimesv-timescale-2
Annotations:  status:
                {"conn_url":"postgres://10.244.1.49:5432/postgres","api_url":"http://10.244.1.49:8008/patroni","state":"running","role":"replica","version...
Status:       Running
IP:           10.244.1.49
IPs:
  IP:           10.244.1.49
Controlled By:  StatefulSet/timescaledb-uptimesv-timescale
Init Containers:
  tstune:
    Container ID:  docker://a658b8d5690c83896adfcff6d03ebfb3149ce700559d1397e20a589662eec8a1
    Image:         timescaledev/timescaledb-ha:pg11-ts1.7
    Image ID:      docker-pullable://timescaledev/timescaledb-ha@sha256:3934896b7c8410da7e127667c7742fd7776f4f9cda51b1532c5f8866c5821cda
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      set -e
      [ $CPUS -eq 0 ]   && CPUS="${RESOURCES_CPU_LIMIT}"
      [ $MEMORY -eq 0 ] && MEMORY="${RESOURCES_MEMORY_LIMIT}"

      if [ -f "${PGDATA}/postgresql.base.conf" ] && ! grep "${INCLUDE_DIRECTIVE}" postgresql.base.conf -qxF; then
        echo "${INCLUDE_DIRECTIVE}" >> "${PGDATA}/postgresql.base.conf"
      fi

      touch "${TSTUNE_FILE}"
      timescaledb-tune -quiet -pg-version 11 -conf-path "${TSTUNE_FILE}" -cpus "${CPUS}" -memory "${MEMORY}MB" \
         -yes

      # If there is a dedicated WAL Volume, we want to set max_wal_size to 60% of that volume
      # If there isn't a dedicated WAL Volume, we set it to 20% of the data volume
      if [ "${RESOURCES_WAL_VOLUME}" = "0" ]; then
        WALMAX="${RESOURCES_DATA_VOLUME}"
        WALPERCENT=20
      else
        WALMAX="${RESOURCES_WAL_VOLUME}"
        WALPERCENT=60
      fi

      WALMAX=$(numfmt --from=auto ${WALMAX})

      # Wal segments are 16MB in size, in this way we get a "nice" number of the nearest
      # 16MB
      WALMAX=$(( $WALMAX / 100 * $WALPERCENT / 16777216 * 16 ))
      WALMIN=$(( $WALMAX / 2 ))

      echo "max_wal_size=${WALMAX}MB" >> "${TSTUNE_FILE}"
      echo "min_wal_size=${WALMIN}MB" >> "${TSTUNE_FILE}"

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 19 Jun 2020 18:07:11 +1000
      Finished:     Fri, 19 Jun 2020 18:07:11 +1000
    Ready:          True
    Restart Count:  0
    Environment:
      TSTUNE_FILE:             /var/run/postgresql/timescaledb.conf
      RESOURCES_WAL_VOLUME:    50Gi
      RESOURCES_DATA_VOLUME:   50Gi
      INCLUDE_DIRECTIVE:       include_if_exists = '/var/run/postgresql/timescaledb.conf'
      CPUS:                    0 (requests.cpu)
      MEMORY:                  0 (requests.memory)
      RESOURCES_CPU_LIMIT:     node allocatable (limits.cpu)
      RESOURCES_MEMORY_LIMIT:  node allocatable (limits.memory)
    Mounts:
      /var/run/postgresql from socket-directory (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from uptimesv-timescaledb-service-account-token-4xsxw (ro)
Containers:
  timescaledb:
    Container ID:  docker://7bae0ea4665e5ae3ed1e10fe54922591c3a7d36b3905b56d076c140bb013ae5e
    Image:         timescaledev/timescaledb-ha:pg11-ts1.7
    Image ID:      docker-pullable://timescaledev/timescaledb-ha@sha256:3934896b7c8410da7e127667c7742fd7776f4f9cda51b1532c5f8866c5821cda
    Ports:         8008/TCP, 5432/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /bin/bash
      -c

      install -o postgres -g postgres -d -m 0700 "/var/lib/postgresql/data" "/var/lib/postgresql/wal/pg_wal" || exit 1
      TABLESPACES=""
      for tablespace in ; do
        install -o postgres -g postgres -d -m 0700 "/var/lib/postgresql/tablespaces/${tablespace}/data"
      done

      # Environment variables can be read by regular users of PostgreSQL. Especially in a Kubernetes
      # context it is likely that some secrets are part of those variables.
      # To ensure we expose as little as possible to the underlying PostgreSQL instance, we have a list
      # of allowed environment variable patterns to retain.
      #
      # We need the KUBERNETES_ environment variables for the native Kubernetes support of Patroni to work.
      #
      # NB: Patroni will remove all PATRONI_.* environment variables before starting PostgreSQL

      # We store the current environment, as initscripts, callbacks, archive_commands etc. may require
      # to have the environment available to them
      set -o posix
      export -p > "${HOME}/.pod_environment"
      export -p | grep PGBACKREST > "${HOME}/.pgbackrest_environment"

      for UNKNOWNVAR in $(env | awk -F '=' '!/^(PATRONI_.*|HOME|PGDATA|PGHOST|LC_.*|LANG|PATH|KUBERNETES_SERVICE_.*)=/ {print $1}')
      do
          unset "${UNKNOWNVAR}"
      done

      touch /var/run/postgresql/timescaledb.conf

      echo "*:*:*:postgres:${PATRONI_SUPERUSER_PASSWORD}" >> ${HOME}/.pgpass
      chmod 0600 ${HOME}/.pgpass

      export PATRONI_POSTGRESQL_PGPASS="${HOME}/.pgpass.patroni"

      exec patroni /etc/timescaledb/patroni.yaml

    State:          Running
      Started:      Fri, 19 Jun 2020 18:07:14 +1000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  2500Mi
    Requests:
      cpu:      1
      memory:   2000Mi
    Readiness:  exec [pg_isready -h /var/run/postgresql] delay=5s timeout=5s period=30s #success=1 #failure=6
    Environment Variables from:
      timescaledb-credentials  Secret  Optional: false
      timescaledb-pgbackrest   Secret  Optional: true
    Environment:
      PATRONI_admin_OPTIONS:               createrole,createdb
      PATRONI_REPLICATION_USERNAME:        standby
      PATRONI_KUBERNETES_POD_IP:            (v1:status.podIP)
      PATRONI_POSTGRESQL_CONNECT_ADDRESS:  $(PATRONI_KUBERNETES_POD_IP):5432
      PATRONI_RESTAPI_CONNECT_ADDRESS:     $(PATRONI_KUBERNETES_POD_IP):8008
      PATRONI_NAME:                        timescaledb-uptimesv-timescale-2 (v1:metadata.name)
      PATRONI_POSTGRESQL_DATA_DIR:         /var/lib/postgresql/data
      PATRONI_KUBERNETES_NAMESPACE:        databases
      PATRONI_KUBERNETES_LABELS:           {app: timescaledb-uptimesv-timescale, cluster-name: timescaledb, release: timescaledb}
      PATRONI_SCOPE:                       timescaledb
      PGBACKREST_CONFIG:                   /etc/pgbackrest/pgbackrest.conf
      PGDATA:                              $(PATRONI_POSTGRESQL_DATA_DIR)
      PGHOST:                              /var/run/postgresql
    Mounts:
      /etc/certificate from certificate (ro)
      /etc/pgbackrest from pgbackrest (ro)
      /etc/timescaledb/patroni.yaml from patroni-config (ro,path="patroni.yaml")
      /etc/timescaledb/scripts from timescaledb-scripts (ro)
      /var/lib/postgresql from storage-volume (rw)
      /var/lib/postgresql/wal from wal-volume (rw)
      /var/run/postgresql from socket-directory (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from uptimesv-timescaledb-service-account-token-4xsxw (ro)
  pgbackrest:
    Container ID:  docker://4949f03478bf05a0fcc91d4c8fca05a5e266be288e8ef7219c8ffed16843352e
    Image:         timescaledev/timescaledb-ha:pg11-ts1.7
    Image ID:      docker-pullable://timescaledev/timescaledb-ha@sha256:3934896b7c8410da7e127667c7742fd7776f4f9cda51b1532c5f8866c5821cda
    Port:          8081/TCP
    Host Port:     0/TCP
    Command:
      /etc/timescaledb/scripts/pgbackrest_bootstrap.sh
    State:          Running
      Started:      Fri, 19 Jun 2020 18:07:18 +1000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      timescaledb-credentials  Secret  Optional: false
      timescaledb-pgbackrest   Secret  Optional: false
    Environment:
      PGHOST:             /var/run/postgresql
      PGBACKREST_STANZA:  poddb
      PGBACKREST_CONFIG:  /etc/pgbackrest/pgbackrest.conf
    Mounts:
      /etc/pgbackrest from pgbackrest (ro)
      /etc/timescaledb/scripts from timescaledb-scripts (ro)
      /var/lib/postgresql from storage-volume (rw)
      /var/lib/postgresql/wal from wal-volume (rw)
      /var/run/postgresql from socket-directory (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from uptimesv-timescaledb-service-account-token-4xsxw (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  wal-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  wal-volume-timescaledb-uptimesv-timescale-2
    ReadOnly:   false
  storage-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  storage-volume-timescaledb-uptimesv-timescale-2
    ReadOnly:   false
  socket-directory:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  patroni-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      timescaledb-uptimesv-timescale-patroni
    Optional:  false
  timescaledb-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      timescaledb-uptimesv-timescale-scripts
    Optional:  false
  pgbackrest:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      timescaledb-uptimesv-timescale-pgbackrest
    Optional:  false
  certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  timescaledb-certificate
    Optional:    false
  uptimesv-timescaledb-service-account-token-4xsxw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  uptimesv-timescaledb-service-account-token-4xsxw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                From                           Message
  ----     ------                  ----               ----                           -------
  Normal   Scheduled               <unknown>          default-scheduler              Successfully assigned databases/timescaledb-uptimesv-timescale-2 to pool-tgrhprvvk-3fxch
  Normal   SuccessfulAttachVolume  89s                attachdetach-controller        AttachVolume.Attach succeeded for volume "pvc-b3730c83-0c80-424f-bd7d-1decb5ece58e"
  Warning  FailedMount             86s (x5 over 94s)  kubelet, pool-tgrhprvvk-3fxch  MountVolume.WaitForAttach failed for volume "pvc-7cdad8f3-ac00-41bf-ad3e-5e6c5ca0de66" : volume 08cadd5c-7215-11ea-9953-0a58ac14a251 has GET error for volume attachment csi-dc98f96a007c293322f68ee3a0136eba69ab659d436ae6248af77734a0cd87d9: volumeattachments.storage.k8s.io "csi-dc98f96a007c293322f68ee3a0136eba69ab659d436ae6248af77734a0cd87d9" is forbidden: User "system:node:pool-tgrhprvvk-3fxch" cannot get resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: no relationship found between node "pool-tgrhprvvk-3fxch" and this object
  Warning  FailedMount             78s                kubelet, pool-tgrhprvvk-3fxch  MountVolume.WaitForAttach failed for volume "pvc-7cdad8f3-ac00-41bf-ad3e-5e6c5ca0de66" : watch error:unknown (get volumeattachments.storage.k8s.io) for volume 08cadd5c-7215-11ea-9953-0a58ac14a251
  Normal   SuccessfulAttachVolume  67s                attachdetach-controller        AttachVolume.Attach succeeded for volume "pvc-7cdad8f3-ac00-41bf-ad3e-5e6c5ca0de66"
  Normal   Pulled                  50s                kubelet, pool-tgrhprvvk-3fxch  Container image "timescaledev/timescaledb-ha:pg11-ts1.7" already present on machine
  Normal   Created                 49s                kubelet, pool-tgrhprvvk-3fxch  Created container tstune
  Normal   Started                 49s                kubelet, pool-tgrhprvvk-3fxch  Started container tstune
  Normal   Pulling                 49s                kubelet, pool-tgrhprvvk-3fxch  Pulling image "timescaledev/timescaledb-ha:pg11-ts1.7"
  Normal   Pulled                  46s                kubelet, pool-tgrhprvvk-3fxch  Successfully pulled image "timescaledev/timescaledb-ha:pg11-ts1.7"
  Normal   Created                 46s                kubelet, pool-tgrhprvvk-3fxch  Created container timescaledb
  Normal   Started                 46s                kubelet, pool-tgrhprvvk-3fxch  Started container timescaledb
  Normal   Pulling                 46s                kubelet, pool-tgrhprvvk-3fxch  Pulling image "timescaledev/timescaledb-ha:pg11-ts1.7"
  Normal   Pulled                  43s                kubelet, pool-tgrhprvvk-3fxch  Successfully pulled image "timescaledev/timescaledb-ha:pg11-ts1.7"
  Normal   Created                 43s                kubelet, pool-tgrhprvvk-3fxch  Created container pgbackrest
  Normal   Started                 42s                kubelet, pool-tgrhprvvk-3fxch  Started container pgbackrest

Firaenix commented 4 years ago

I reckon I must have mucked up the script to the point where the secrets were copied across incorrectly, along with my pgBackRest settings. I'm not sure about my TLS certificates, but I guess it's safe to assume they were broken too.
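One way to check at least the certificate is the secret's type, assuming it uses the default name for this release (an assumption, not confirmed in the thread):

```sh
# A kubernetes.io/tls Secret must contain tls.crt and tls.key; the type field is a quick tell.
kubectl get secret timescaledb-certificate --namespace databases -o jsonpath='{.type}{"\n"}'
# expected output: kubernetes.io/tls
```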

Firaenix commented 4 years ago

Any updates or insights?

feikesteenbergen commented 4 years ago

What is the current state? Some (synchronous) troubleshooting may be useful; you can reach me on https://slack.timescale.com/, where I'm normally available during European weekdays.

Firaenix commented 4 years ago

So it's still not working; I'm just sitting on 0.5.4 at the moment. I need to identify which secrets I can delete so that I won't mess up my current configuration.
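A possible starting point for that inventory (the label selector below is an assumption, extrapolated from the labels on the pod shown earlier):

```sh
# List all secrets with their labels, then narrow to ones that look chart-managed,
# before deciding what is safe to delete.
kubectl get secrets --namespace databases --show-labels
kubectl get secrets --namespace databases -l app=timescaledb-uptimesv-timescale
```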

I will jump on Slack at some point, but I'm in the Sydney, Australia timezone, so it seems like it'll be asynchronous debugging for me. I could probably set aside some time after one or two of my workdays to debug with you.

Firaenix commented 4 years ago

We ended up needing to swap infrastructure providers anyway, so we provisioned the 0.6 version of the Helm chart on our new cluster - everything is running smoothly.

Although that doesn't fix this issue, I am no longer affected. I appreciate the help in any case.