lwolf / stolon-chart

Kubernetes Helm chart to deploy HA Postgresql cluster based on Stolon
MIT License

Add support for kubernetes backend #18

Closed by Flowkap 6 years ago

Flowkap commented 6 years ago

It would be nice to also support the kubernetes backend, which uses configmaps. See https://github.com/sorintlab/stolon/commit/38ae6b13b5e161a5bfe0fbe01084ca060eaf2e76
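
Roughly, I'd imagine the chart exposing the backend choice through values along the lines of the sketch below (key names are just illustrative, not a final schema):

```yaml
# Illustrative values.yaml sketch for choosing the store backend.
# Key names are placeholders, not the chart's actual schema.
store:
  backend: kubernetes          # "etcd" (current behaviour) or "kubernetes"
  kubeResourceKind: configmap  # only relevant for the kubernetes backend
  endpoints: http://etcd:2379  # only relevant for the etcd backend
```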

I'm pretty new to charts and K8s in general, but I can give it a try if you'd like?

lwolf commented 6 years ago

That's a great idea; I somehow missed that feature release. Sure, please give it a try. Feel free to ask any implementation questions here and I'll try to help.

Flowkap commented 6 years ago

Yeah, cool. It might take a few days, as this is the first Helm chart I'm actively working on :)

Flowkap commented 6 years ago

I almost got it to work, but my keepers are not registering with the sentinel. See a demo commit here:

https://github.com/Flowkap/stolon-chart/commit/062ba7c6d430085f3c0383a7bb444ff194e8924a

As I'm new to stolon, I honestly don't know why they're not connecting. I also can't find much on that error, apart from mentions of an uninitialized backend (but that works, and the annotations seem just fine).

Keeper

2018-03-05T14:13:49.991Z    INFO    cmd/keeper.go:964   our keeper data is not available, waiting for it to appear

Sentinel

2018-03-05T14:19:11.031Z    INFO    cmd/sentinel.go:778 trying to find initial master
2018-03-05T14:19:11.031Z    ERROR   cmd/sentinel.go:1815    failed to update cluster data   {"error": "cannot choose initial master: no keepers registered"}

Proxy

2018-03-05T14:19:20.672Z    INFO    cmd/proxy.go:234    no db object available, closing connections to master   {"db": ""}

Update:

I also checked the actual configs of all resources side by side (the current kubernetes example and the ones created by the chart). Apart from the additional labels, I can't see a difference :/

Still, no keeper becomes active and hence the sentinel finds no keepers.

lwolf commented 6 years ago

I need to try running it myself, but the errors from your sentinel suggest that the cluster initialisation job did not complete or failed. Did you check it? https://github.com/lwolf/stolon-chart/blob/master/stolon/templates/cluster-create-job.yaml

Also, after a second look at your code, it seems that you did not change the init-container in that job, and it is still trying to check etcd availability.

Flowkap commented 6 years ago

I thought I had changed everything there according to the example. You're probably right. The created configmap looked good, though.

Thanks for the tip. I'll double-check the init again.

Flowkap commented 6 years ago

Did you have time to try it out? As far as I can see, I changed the init accordingly. I even get the same result if I do it manually, following the official example.

https://github.com/Flowkap/stolon-chart/blob/062ba7c6d430085f3c0383a7bb444ff194e8924a/stolon/templates/cluster-create-job.yaml

lwolf commented 6 years ago

Hi, I did not. I was waiting for you to come back and confirm whether I had pointed out the issue correctly or not. Sorry about that.

Correct me if I'm wrong, but your cluster-create-job is not working because of this:

 "command": ["/bin/sh", "-c", "while ! etcdctl --endpoints {{ .Values.store.endpoints }} cluster-health; do sleep 1 && echo -n .; done"],
{{- if eq .Values.store.backend "kubernetes" }}
 - --kube-resource-kind={{ .Values.store.kubeRessourceKind }}
{{- else }}
- --store-endpoints={{ .Values.store.endpoints }}
{{- end }}

corresponding links in the source file: https://github.com/Flowkap/stolon-chart/blob/062ba7c6d430085f3c0383a7bb444ff194e8924a/stolon/templates/cluster-create-job.yaml#L25

https://github.com/Flowkap/stolon-chart/blob/062ba7c6d430085f3c0383a7bb444ff194e8924a/stolon/templates/cluster-create-job.yaml#L41-L44

The point here is that you're trying to use .Values.store.endpoints in the init-container, but your kubernetes backend implementation does not require/provide this variable.
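
As a rough sketch (not your actual template, and leaving aside whether the init-container is declared via the spec field or the legacy annotation), the etcd readiness check itself could be wrapped in the same conditional, so the kubernetes backend skips it entirely:

```yaml
# Sketch only: skip the etcd availability check when the kubernetes
# backend is selected, since it needs no external store endpoints.
{{- if ne .Values.store.backend "kubernetes" }}
initContainers:
  - name: wait-for-store
    image: quay.io/coreos/etcd:v3.2   # illustrative image
    command:
      - "/bin/sh"
      - "-c"
      - "while ! etcdctl --endpoints {{ .Values.store.endpoints }} cluster-health; do sleep 1 && echo -n .; done"
{{- end }}
```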

Flowkap commented 6 years ago

Both parts you mentioned are conditional and only apply when the backend isn't kubernetes. The command executes and creates the configmap just fine, but the DB still doesn't kick into action. I compared the configmap created by my chart with the original one created by the stolon example, and there's no difference I can see.

I'd really appreciate it if you could try installing my chart and check.

Flowkap commented 6 years ago

Today I tried it again.

All deployments and pods start fine. The configmap is generated:

{
  "kind": "ConfigMap",
  "apiVersion": "v1",
  "metadata": {
    "name": "stolon-cluster-queenly-meerkat-stolon",
    "namespace": "default",
    "selfLink": "/api/v1/namespaces/default/configmaps/stolon-cluster-queenly-meerkat-stolon",
    "uid": "af754648-31b7-11e8-b19c-0800271047e3",
    "resourceVersion": "8070",
    "creationTimestamp": "2018-03-27T12:09:22Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"6783d587\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2018-03-27T12:09:22Z\",\"renewTime\":\"2018-03-27T12:29:05Z\",\"leaderTransitions\":0}",
      "stolon-clusterdata": "{\"FormatVersion\":1,\"Cluster\":{\"uid\":\"63373f45\",\"generation\":1,\"changeTime\":\"2018-03-27T12:09:27.279492341Z\",\"spec\":{\"additionalWalSenders\":null,\"additionalMasterReplicationSlots\":null,\"initMode\":\"new\",\"pgHBA\":null},\"status\":{\"phase\":\"initializing\"}},\"Keepers\":{},\"DBs\":{},\"Proxy\":{\"changeTime\":\"0001-01-01T00:00:00Z\",\"spec\":{},\"status\":{}}}"
    }
  }
}

This is the configmap created by the original example in https://github.com/sorintlab/stolon/tree/master/examples/kubernetes

{
  "kind": "ConfigMap",
  "apiVersion": "v1",
  "metadata": {
    "name": "stolon-cluster-kube-stolon",
    "namespace": "default",
    "selfLink": "/api/v1/namespaces/default/configmaps/stolon-cluster-kube-stolon",
    "uid": "1b30b4f4-31ba-11e8-b19c-0800271047e3",
    "resourceVersion": "7849",
    "creationTimestamp": "2018-03-27T12:26:42Z",
    "annotations": {
      "stolon-clusterdata": "{\"FormatVersion\":1,\"Cluster\":{\"uid\":\"0aa3338d\",\"generation\":1,\"changeTime\":\"2018-03-27T12:26:42.448649577Z\",\"spec\":{\"additionalWalSenders\":null,\"additionalMasterReplicationSlots\":null,\"initMode\":\"new\",\"pgHBA\":null},\"status\":{\"phase\":\"initializing\"}},\"Keepers\":{},\"DBs\":{},\"Proxy\":{\"changeTime\":\"0001-01-01T00:00:00Z\",\"spec\":{},\"status\":{}}}"
    }
  }
}

The services find all pods, but the keepers don't initialize and keep logging the following:

2018-03-27T12:28:47.402Z    INFO    cmd/keeper.go:964   our keeper data is not available, waiting for it to appear

And the proxies:

2018-03-27T12:29:58.279Z    INFO    cmd/proxy.go:234    no db object available, closing connections to master   {"db": ""}

And sentinels:

2018-03-27T12:31:59.022Z    ERROR   cmd/sentinel.go:1815    failed to update cluster data   {"error": "cannot choose initial master: no keepers registered"}
2018-03-27T12:32:04.044Z    INFO    cmd/sentinel.go:778 trying to find initial master

As if they don't recognize the configmap.

To be sure, I also deleted all etcd-related occurrences in the chart. These are definitely not active.

Also, the generated configs for the keepers look identical (these are from different runs, therefore the helm release name differs; within the same run they match, of course).

Generated keeper from the official kubernetes example:

{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "stolon-keeper-0",
    "generateName": "stolon-keeper-",
    "namespace": "default",
    "selfLink": "/api/v1/namespaces/default/pods/stolon-keeper-0",
    "uid": "4a42dfb5-31c1-11e8-b19c-0800271047e3",
    "resourceVersion": "13370",
    "creationTimestamp": "2018-03-27T13:18:07Z",
    "labels": {
      "app": "stolon-keeper",
      "controller-revision-hash": "stolon-keeper-6c86f6447c",
      "statefulset.kubernetes.io/pod-name": "stolon-keeper-0",
      "stolon-cluster": "kube-stolon"
    },
    "annotations": {
      "pod.alpha.kubernetes.io/initialized": "true",
      "stolon-status": "{\"infoUID\":\"2d6db0b8\",\"uid\":\"keeper0\",\"clusterUID\":\"579f5f63\",\"bootUUID\":\"f1e8b557-a33d-41ec-9337-58ae39832748\",\"postgresState\":{\"uid\":\"56767d5d\",\"generation\":3,\"listenAddress\":\"172.17.0.5\",\"port\":\"5432\",\"healthy\":true,\"systemID\":\"6537613245307641887\",\"timelineID\":1,\"xLogPos\":50331968,\"pgParameters\":{\"datestyle\":\"iso, mdy\",\"default_text_search_config\":\"pg_catalog.english\",\"dynamic_shared_memory_type\":\"posix\",\"lc_messages\":\"en_US.utf8\",\"lc_monetary\":\"en_US.utf8\",\"lc_numeric\":\"en_US.utf8\",\"lc_time\":\"en_US.utf8\",\"log_timezone\":\"UTC\",\"max_connections\":\"100\",\"shared_buffers\":\"128MB\",\"timezone\":\"UTC\",\"wal_level\":\"replica\"},\"synchronousStandbys\":[],\"olderWalFile\":\"000000010000000000000001\"}}"
    },
    "ownerReferences": [
      {
        "apiVersion": "apps/v1beta1",
        "kind": "StatefulSet",
        "name": "stolon-keeper",
        "uid": "4a3fe74a-31c1-11e8-b19c-0800271047e3",
        "controller": true,
        "blockOwnerDeletion": true
      }
    ]
  },
  "spec": {
    "volumes": [
      {
        "name": "data",
        "persistentVolumeClaim": {
          "claimName": "data-stolon-keeper-0"
        }
      },
      {
        "name": "stolon",
        "secret": {
          "secretName": "stolon",
          "defaultMode": 420
        }
      },
      {
        "name": "default-token-vc9hk",
        "secret": {
          "secretName": "default-token-vc9hk",
          "defaultMode": 420
        }
      }
    ],
    "containers": [
      {
        "name": "stolon-keeper",
        "image": "sorintlab/stolon:master-pg9.6",
        "command": [
          "/bin/bash",
          "-ec",
          "# Generate our keeper uid using the pod index\nIFS='-' read -ra ADDR <<< \"$(hostname)\"\nexport STKEEPER_UID=\"keeper${ADDR[-1]}\"\nexport POD_IP=$(hostname -i)\nexport STKEEPER_PG_LISTEN_ADDRESS=$POD_IP\nexport STOLON_DATA=/stolon-data\nchown stolon:stolon $STOLON_DATA\nexec gosu stolon stolon-keeper --data-dir $STOLON_DATA\n"
        ],
        "ports": [
          {
            "containerPort": 5432,
            "protocol": "TCP"
          }
        ],
        "env": [
          {
            "name": "POD_NAME",
            "valueFrom": {
              "fieldRef": {
                "apiVersion": "v1",
                "fieldPath": "metadata.name"
              }
            }
          },
          {
            "name": "STKEEPER_CLUSTER_NAME",
            "value": "kube-stolon"
          },
          {
            "name": "STKEEPER_STORE_BACKEND",
            "value": "kubernetes"
          },
          {
            "name": "STKEEPER_KUBE_RESOURCE_KIND",
            "value": "configmap"
          },
          {
            "name": "STKEEPER_PG_REPL_USERNAME",
            "value": "repluser"
          },
          {
            "name": "STKEEPER_PG_REPL_PASSWORD",
            "value": "replpassword"
          },
          {
            "name": "STKEEPER_PG_SU_USERNAME",
            "value": "stolon"
          },
          {
            "name": "STKEEPER_PG_SU_PASSWORDFILE",
            "value": "/etc/secrets/stolon/password"
          }
        ],
        "resources": {},
        "volumeMounts": [
          {
            "name": "data",
            "mountPath": "/stolon-data"
          },
          {
            "name": "stolon",
            "mountPath": "/etc/secrets/stolon"
          },
          {
            "name": "default-token-vc9hk",
            "readOnly": true,
            "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
          }
        ],
        "terminationMessagePath": "/dev/termination-log",
        "terminationMessagePolicy": "File",
        "imagePullPolicy": "IfNotPresent"
      }
    ],
    "restartPolicy": "Always",
    "terminationGracePeriodSeconds": 10,
    "dnsPolicy": "ClusterFirst",
    "serviceAccountName": "default",
    "serviceAccount": "default",
    "nodeName": "vagrant",
    "securityContext": {},
    "hostname": "stolon-keeper-0",
    "subdomain": "stolon-keeper",
    "schedulerName": "default-scheduler"
  },
  "status": {
    "phase": "Running",
    "conditions": [
      {
        "type": "Initialized",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2018-03-27T13:18:07Z"
      },
      {
        "type": "Ready",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2018-03-27T13:18:09Z"
      },
      {
        "type": "PodScheduled",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2018-03-27T13:18:07Z"
      }
    ],
    "hostIP": "10.0.2.15",
    "podIP": "172.17.0.5",
    "startTime": "2018-03-27T13:18:07Z",
    "containerStatuses": [
      {
        "name": "stolon-keeper",
        "state": {
          "running": {
            "startedAt": "2018-03-27T13:18:09Z"
          }
        },
        "lastState": {},
        "ready": true,
        "restartCount": 0,
        "image": "sorintlab/stolon:master-pg9.6",
        "imageID": "docker-pullable://sorintlab/stolon@sha256:beaf9a41baaa333564cdca7b6f10ca52f40ae84dea8f11aaf37af703b1d75dda",
        "containerID": "docker://9d093d63916f24219438571de5b0a6a5bf33f316494c851f300e58a050863a79"
      }
    ],
    "qosClass": "BestEffort"
  }
}

And the one generated by my chart proposal:

{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "test-stolon-keeper-0",
    "generateName": "test-stolon-keeper-",
    "namespace": "default",
    "selfLink": "/api/v1/namespaces/default/pods/test-stolon-keeper-0",
    "uid": "29eb303b-31c2-11e8-b19c-0800271047e3",
    "resourceVersion": "13367",
    "creationTimestamp": "2018-03-27T13:24:23Z",
    "labels": {
      "app": "test-stolon-keeper",
      "chart": "stolon-0.5.0",
      "component": "keeper",
      "controller-revision-hash": "test-stolon-keeper-6957ff9c9b",
      "heritage": "Tiller",
      "release": "test",
      "statefulset.kubernetes.io/pod-name": "test-stolon-keeper-0",
      "stolon-cluster": "test-stolon"
    },
    "annotations": {
      "pod.alpha.kubernetes.io/initialized": "true",
      "stolon-status": "{\"infoUID\":\"d72be3e5\",\"uid\":\"keeper0\",\"clusterUID\":\"383b5c96\",\"bootUUID\":\"7dde52c2-ae54-4d1b-8de5-2389f15a948d\",\"postgresState\":{\"listenAddress\":\"172.17.0.11\",\"port\":\"5432\",\"synchronousStandbys\":null}}"
    },
    "ownerReferences": [
      {
        "apiVersion": "apps/v1beta1",
        "kind": "StatefulSet",
        "name": "test-stolon-keeper",
        "uid": "29e44275-31c2-11e8-b19c-0800271047e3",
        "controller": true,
        "blockOwnerDeletion": true
      }
    ]
  },
  "spec": {
    "volumes": [
      {
        "name": "stolon-data",
        "persistentVolumeClaim": {
          "claimName": "stolon-data-test-stolon-keeper-0"
        }
      },
      {
        "name": "stolon-secrets",
        "secret": {
          "secretName": "test-stolon",
          "defaultMode": 420
        }
      },
      {
        "name": "default-token-vc9hk",
        "secret": {
          "secretName": "default-token-vc9hk",
          "defaultMode": 420
        }
      }
    ],
    "containers": [
      {
        "name": "test-stolon-keeper",
        "image": "sorintlab/stolon:master-pg9.6",
        "command": [
          "/bin/bash",
          "-ec",
          "# Generate our keeper uid using the pod index\nIFS='-' read -ra ADDR <<< \"$(hostname)\"\nexport STKEEPER_UID=\"keeper${ADDR[-1]}\"\nexport POD_IP=$(hostname -i)\nexport STKEEPER_PG_LISTEN_ADDRESS=$POD_IP\nexport STOLON_DATA=/stolon-data\nchown stolon:stolon $STOLON_DATA\nexec gosu stolon stolon-keeper --data-dir $STOLON_DATA\n"
        ],
        "ports": [
          {
            "containerPort": 5432,
            "protocol": "TCP"
          }
        ],
        "env": [
          {
            "name": "POD_NAME",
            "valueFrom": {
              "fieldRef": {
                "apiVersion": "v1",
                "fieldPath": "metadata.name"
              }
            }
          },
          {
            "name": "STKEEPER_CLUSTER_NAME",
            "value": "test-stolon"
          },
          {
            "name": "STKEEPER_STORE_BACKEND",
            "value": "kubernetes"
          },
          {
            "name": "STKEEPER_KUBE_RESOURCE_KIND",
            "value": "configmap"
          },
          {
            "name": "STKEEPER_PG_REPL_USERNAME",
            "value": "repluser"
          },
          {
            "name": "STKEEPER_PG_REPL_PASSWORDFILE",
            "value": "/etc/secrets/stolon/pg_repl_password"
          },
          {
            "name": "STKEEPER_PG_SU_USERNAME",
            "value": "stolon"
          },
          {
            "name": "STKEEPER_PG_SU_PASSWORDFILE",
            "value": "/etc/secrets/stolon/pg_su_password"
          },
          {
            "name": "STKEPPER_DEBUG",
            "value": "false"
          }
        ],
        "resources": {
          "requests": {
            "cpu": "100m",
            "memory": "512Mi"
          }
        },
        "volumeMounts": [
          {
            "name": "stolon-data",
            "mountPath": "/stolon-data"
          },
          {
            "name": "stolon-secrets",
            "mountPath": "/etc/secrets/stolon"
          },
          {
            "name": "default-token-vc9hk",
            "readOnly": true,
            "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
          }
        ],
        "terminationMessagePath": "/dev/termination-log",
        "terminationMessagePolicy": "File",
        "imagePullPolicy": "IfNotPresent"
      }
    ],
    "restartPolicy": "Always",
    "terminationGracePeriodSeconds": 10,
    "dnsPolicy": "ClusterFirst",
    "serviceAccountName": "default",
    "serviceAccount": "default",
    "nodeName": "vagrant",
    "securityContext": {},
    "hostname": "test-stolon-keeper-0",
    "subdomain": "test-stolon-keeper",
    "schedulerName": "default-scheduler"
  },
  "status": {
    "phase": "Running",
    "conditions": [
      {
        "type": "Initialized",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2018-03-27T13:24:23Z"
      },
      {
        "type": "Ready",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2018-03-27T13:24:25Z"
      },
      {
        "type": "PodScheduled",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2018-03-27T13:24:23Z"
      }
    ],
    "hostIP": "10.0.2.15",
    "podIP": "172.17.0.11",
    "startTime": "2018-03-27T13:24:23Z",
    "containerStatuses": [
      {
        "name": "test-stolon-keeper",
        "state": {
          "running": {
            "startedAt": "2018-03-27T13:24:25Z"
          }
        },
        "lastState": {},
        "ready": true,
        "restartCount": 0,
        "image": "sorintlab/stolon:master-pg9.6",
        "imageID": "docker-pullable://sorintlab/stolon@sha256:beaf9a41baaa333564cdca7b6f10ca52f40ae84dea8f11aaf37af703b1d75dda",
        "containerID": "docker://e1767b05a4d4b602071ffa2fc274a75008012118431e3a1d476d73c4ebd3ff73"
      }
    ],
    "qosClass": "Burstable"
  }
}
lwolf commented 6 years ago

Trying to run it now. The job keeps failing with a permissions issue. Did you create any RBAC roles to make it install?

$ kubectl logs -f stolon-test-stolon-pd42j
cannot get cluster data: failed to get latest version of configmap: configmaps "stolon-cluster-stolon-test-stolon" is forbidden: User "system:serviceaccount:default:default" cannot get configmaps in the namespace "default"
lwolf commented 6 years ago

Got it to work. I applied the RBAC manifests from the stolon repo with an additional permission for configmaps. We should add them to the chart.

I'm seeing the same behaviour as you are. Based on the logs from the proxy, it seems that initialisation does not create the db for some reason. I'll try to spend more time on it during the weekend.

Flowkap commented 6 years ago

I got it fixed. Finally...

Long story short: the app labels MUST be exactly "stolon-keeper", "stolon-proxy" and "stolon-sentinel" ... https://github.com/sorintlab/stolon/pull/433/commits/38ae6b13b5e161a5bfe0fbe01084ca060eaf2e76#diff-95cdd374e9440fde010ff35d65f8cf3fR54

I'll make a proper commit later today and let you know when it's finished. Multiple cluster deployments worked for me at the same time this way. As I'm not sure whether the other variants (using etcd) need the fullname in the app labels, I'll make these changes conditional as well. I also added stolon-cluster labels in some places to keep service discovery unique within the K8s cluster itself.
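
For the labels, I'm thinking of something along these lines (just a sketch; the helper name depends on the chart):

```yaml
# Sketch of conditional keeper labels; "stolon.fullname" is a placeholder
# for whatever naming helper the chart ends up using.
metadata:
  labels:
{{- if eq .Values.store.backend "kubernetes" }}
    app: stolon-keeper   # fixed name expected by the kubernetes store backend
{{- else }}
    app: {{ template "stolon.fullname" . }}-keeper
{{- end }}
    stolon-cluster: {{ template "stolon.fullname" . }}
```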

I'll come back to you when it's done :)

lwolf commented 6 years ago

Congratulations, great job!

Flowkap commented 6 years ago

> Got it to work. I applied the RBAC manifests from the stolon repo with an additional permission for configmaps. We should add them to the chart.

I've got the same problem as you with that. Can you give me a link to the solution? I'll apply it as well (the setup already works on a minikube cluster).

https://github.com/Flowkap/stolon-chart/commit/33f4ff3d69c6f5edf43a3437d6e5731514947910

It's probably easier if I do the PR and you add the configmap permission.

lwolf commented 6 years ago

For RBAC you need to add configurations similar to these: https://github.com/sorintlab/stolon/blob/master/examples/kubernetes/role-binding.yaml https://github.com/sorintlab/stolon/blob/master/examples/kubernetes/role.yaml

The only difference is that you need to add configmaps here: https://github.com/sorintlab/stolon/blob/master/examples/kubernetes/role.yaml#L19
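
Something along these lines should do (a sketch based on the upstream role.yaml; take the exact resource and verb lists from the linked example rather than from here):

```yaml
# Sketch: Role from the stolon kubernetes example with "configmaps" added.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: stolon
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods", "configmaps", "events"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
```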

But I can add it later, after the PR gets merged (I'll review and test it during the weekend).