reactive-tech / kubegres

Kubegres is a Kubernetes operator allowing to deploy one or many clusters of PostgreSql instances and manage databases replication, failover and backup.
https://www.kubegres.io
Apache License 2.0
1.32k stars 74 forks source link

Kubegres might start with primary on read-only database #87

Open antoinetran opened 2 years ago

antoinetran commented 2 years ago

Hi,

We use Kubegres and sometime we stop postgresql clusters. The persistence is local so we have 1 primary and 1 replica and each of them has 1 pv / pvc related to 1 host directory.

When we stop postgresql clusters, we remove the pods, the pv/pvc. Then when we restart, we reapply the Yaml. But sometime the primary database goes to the replica persistence, which is an issue because then the application using postgresql tries to do a CREATE TABLE and we have cannot execute CREATE TABLE in a read-only transaction

So I have 2 workarounds on this: 1) delete the primary pods, the related pv/pvc. Then the replica will be promoted to master. Lots of commands. 2) relabelling the primary pod to replica, and vice-versa.

# If the pod to test is a replica, it return "f" without newline.
# If it is a primary, it returns "t" without newline.
# If the pod is not found, it returns an non-zero exit code.
isReplica() {
  podToTest="$1"
  container="$(echo "${podToTest}" | sed 's,-0$,,g')"
  kubectl -n "${namespace}" exec "${podToTest}" -c "${container}" -- bash -c 'PGPASSWORD="${POSTGRES_PASSWORD}" psql -U postgres -c "SELECT pg_is_in_recovery();" --tuples-only' | tr -d "\r" | tr -d "\n" | tr -d " "
}

# Prerequisites: 1 primary and N (N>0) replica.
# The primary will be changed to a replica and we loop in all replicas to find the correct primary, to label it as primary.
fixPrimaryReplica() {
  podPrimary="$1"

  # Primary database will be labelled as replica first.
  kubectl -n "${namespace}" label pods "${podPrimary}" replicationRole=replica --overwrite

  # Searching for the replica that should be primary instead...
  for pod in $(kubectl -n "${namespace}" get pods -l replicationRole=replica -o jsonpath='{.items[*].metadata.name}') ; do
    printf "Searching if ${pod} should be a primary..."
    isReplicaResult="$(isReplica "${pod}")"
    if test "t" == "${isReplicaResult}" ; then
      printf " no\n"
    elif test "f" == "${isReplicaResult}" ; then
      printf " found! Labelling it as primary\n"
      doAndCheckCmd kubectl -n "${namespace}" label pods "${pod}" replicationRole=primary --overwrite
      break
    else
      echo "[ERROR] Unexpected value: ${isReplicaResult} for pod: ${pod}"
    fi
  done
}

doAndCheckCmd() {
  "$@"
  exitCode="$?"
  if test 0 != "${exitCode}" ; then
    echo "[ERROR] Command failed with exit code ${exitCode}:" "$@" >&2
    exit "${exitCode}"
  fi
}

checkAndFixPrimaryReplica() {
  podPrimary="$(kubectl -n "${namespace}" get pods -l replicationRole=primary  -o jsonpath='{.items[0].metadata.name}')"
  if test 0 != "$?" ; then
    echo "[ERROR] pod not found." >&2
    exit 11
  fi

  if test "t" == "$(isReplica "${podPrimary}")" ; then
    echo "Case primary postgresql using replica database found. Fixing it..."
    doAndCheckCmd fixPrimaryReplica "${podPrimary}"
  else
    echo "Case primary postgresql using primary database found. Nothing to do."
  fi
}

checkAndFixPrimaryReplica

Shouldn't kubegres check on the current state of postgresql persistence? The first time, it should set the database state accordingly to the postgresql primary election. No?

Thank you.