That's a great idea, I somehow missed that feature release. Sure, please try, and feel free to ask any implementation questions here and I'll try to help.
Yeah, cool. It might take a few days as it's the first Helm chart I'm actively working on :)
I almost got it to work, but my keepers are not registering with the sentinel. See a demo commit here:
https://github.com/Flowkap/stolon-chart/commit/062ba7c6d430085f3c0383a7bb444ff194e8924a
As I'm new to the stolon stuff I honestly don't know why they're not connecting. I can't find much on that error either, except the uninitialized backend (but that works and the annotations seem just fine).
Keeper
2018-03-05T14:13:49.991Z INFO cmd/keeper.go:964 our keeper data is not available, waiting for it to appear
Sentinel
2018-03-05T14:19:11.031Z INFO cmd/sentinel.go:778 trying to find initial master
2018-03-05T14:19:11.031Z ERROR cmd/sentinel.go:1815 failed to update cluster data {"error": "cannot choose initial master: no keepers registered"}
Proxy
2018-03-05T14:19:20.672Z INFO cmd/proxy.go:234 no db object available, closing connections to master {"db": ""}
Update:
I also checked the actual configs of all resources side by side (the current example for kubernetes and the ones created by the chart). Besides the additional labels I can't see a difference :/
Still no keeper becomes active and hence the sentinel finds no keepers.
I need to try running it myself, but the errors from your sentinel look like the cluster initialisation job did not complete or failed. Did you check it? https://github.com/lwolf/stolon-chart/blob/master/stolon/templates/cluster-create-job.yaml
Also, after a second look at your code, it seems that you did not change the init-container in that job, and it is still trying to check etcd availability.
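You can check it with something like this (the job name is just a placeholder, it depends on your chart/release naming):
kubectl get jobs
kubectl logs job/<your-cluster-create-job>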
I thought I had changed everything there according to the example, but you're probably right. The created config map looked good though.
Thanks for the tip. I'll double-check the init again.
Did you have time to try it out? As far as I can see I changed the init accordingly. I even get the same result if I do it manually following the official example.
Hi, I did not. I was waiting for you to come back on whether I had pointed out the issue correctly or not. Sorry about that.
Correct me if I'm wrong, but your cluster-create-job is not working because of this:
"command": ["/bin/sh", "-c", "while ! etcdctl --endpoints {{ .Values.store.endpoints }} cluster-health; do sleep 1 && echo -n .; done"],
{{- if eq .Values.store.backend "kubernetes" }}
- --kube-resource-kind={{ .Values.store.kubeRessourceKind }}
{{- else }}
- --store-endpoints={{ .Values.store.endpoints }}
{{- end }}
Corresponding line in the source file: https://github.com/Flowkap/stolon-chart/blob/062ba7c6d430085f3c0383a7bb444ff194e8924a/stolon/templates/cluster-create-job.yaml#L25
The point here is that you're trying to use .Values.store.endpoints in the init-container, but your kubernetes backend implementation does not require/provide this variable.
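A rough sketch of what I mean: wrap the etcd wait in the same conditional you already use for the --store-endpoints flag, something like this (untested; I wrote it as a plain initContainers block, so adapt it if your job still uses the init-container annotation, and the image name is just a placeholder for something that ships etcdctl):
{{- if ne .Values.store.backend "kubernetes" }}
      # only wait for an external store when the backend is not the kubernetes API
      initContainers:
        - name: wait-for-store
          image: <image-with-etcdctl>
          command:
            - "/bin/sh"
            - "-c"
            - "while ! etcdctl --endpoints {{ .Values.store.endpoints }} cluster-health; do sleep 1 && echo -n .; done"
{{- end }}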
Both parts you mentioned are conditional for when the backend isn't kubernetes. The command executes and creates a config map just fine, but the DB doesn't kick into action anyway. I compared the config map created by my chart with the original one created by the stolon example and there's no difference I can see.
I'd really appreciate it if you could try to install my chart and check.
Today I tried it again.
All deployments and pods start fine. The configmap is generated:
{
"kind": "ConfigMap",
"apiVersion": "v1",
"metadata": {
"name": "stolon-cluster-queenly-meerkat-stolon",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/configmaps/stolon-cluster-queenly-meerkat-stolon",
"uid": "af754648-31b7-11e8-b19c-0800271047e3",
"resourceVersion": "8070",
"creationTimestamp": "2018-03-27T12:09:22Z",
"annotations": {
"control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"6783d587\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2018-03-27T12:09:22Z\",\"renewTime\":\"2018-03-27T12:29:05Z\",\"leaderTransitions\":0}",
"stolon-clusterdata": "{\"FormatVersion\":1,\"Cluster\":{\"uid\":\"63373f45\",\"generation\":1,\"changeTime\":\"2018-03-27T12:09:27.279492341Z\",\"spec\":{\"additionalWalSenders\":null,\"additionalMasterReplicationSlots\":null,\"initMode\":\"new\",\"pgHBA\":null},\"status\":{\"phase\":\"initializing\"}},\"Keepers\":{},\"DBs\":{},\"Proxy\":{\"changeTime\":\"0001-01-01T00:00:00Z\",\"spec\":{},\"status\":{}}}"
}
}
}
This is the configmap created by the original example in https://github.com/sorintlab/stolon/tree/master/examples/kubernetes
{
"kind": "ConfigMap",
"apiVersion": "v1",
"metadata": {
"name": "stolon-cluster-kube-stolon",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/configmaps/stolon-cluster-kube-stolon",
"uid": "1b30b4f4-31ba-11e8-b19c-0800271047e3",
"resourceVersion": "7849",
"creationTimestamp": "2018-03-27T12:26:42Z",
"annotations": {
"stolon-clusterdata": "{\"FormatVersion\":1,\"Cluster\":{\"uid\":\"0aa3338d\",\"generation\":1,\"changeTime\":\"2018-03-27T12:26:42.448649577Z\",\"spec\":{\"additionalWalSenders\":null,\"additionalMasterReplicationSlots\":null,\"initMode\":\"new\",\"pgHBA\":null},\"status\":{\"phase\":\"initializing\"}},\"Keepers\":{},\"DBs\":{},\"Proxy\":{\"changeTime\":\"0001-01-01T00:00:00Z\",\"spec\":{},\"status\":{}}}"
}
}
}
The services find all pods but the keepers don't initialize with the following recurring log:
2018-03-27T12:28:47.402Z INFO cmd/keeper.go:964 our keeper data is not available, waiting for it to appear
And the proxies:
2018-03-27T12:29:58.279Z INFO cmd/proxy.go:234 no db object available, closing connections to master {"db": ""}
And sentinels:
2018-03-27T12:31:59.022Z ERROR cmd/sentinel.go:1815 failed to update cluster data {"error": "cannot choose initial master: no keepers registered"}
2018-03-27T12:32:04.044Z INFO cmd/sentinel.go:778 trying to find initial master
As if they don't recognize the configmap.
To be sure I also deleted all occurrences of etcd in the chart; those parts are definitely not active.
Also, the generated configs for the keepers look identical (taken from a different run, therefore the Helm release name differs; within the same run they match, of course).
Generated keeper from the official kubernetes example:
{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "stolon-keeper-0",
"generateName": "stolon-keeper-",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/pods/stolon-keeper-0",
"uid": "4a42dfb5-31c1-11e8-b19c-0800271047e3",
"resourceVersion": "13370",
"creationTimestamp": "2018-03-27T13:18:07Z",
"labels": {
"app": "stolon-keeper",
"controller-revision-hash": "stolon-keeper-6c86f6447c",
"statefulset.kubernetes.io/pod-name": "stolon-keeper-0",
"stolon-cluster": "kube-stolon"
},
"annotations": {
"pod.alpha.kubernetes.io/initialized": "true",
"stolon-status": "{\"infoUID\":\"2d6db0b8\",\"uid\":\"keeper0\",\"clusterUID\":\"579f5f63\",\"bootUUID\":\"f1e8b557-a33d-41ec-9337-58ae39832748\",\"postgresState\":{\"uid\":\"56767d5d\",\"generation\":3,\"listenAddress\":\"172.17.0.5\",\"port\":\"5432\",\"healthy\":true,\"systemID\":\"6537613245307641887\",\"timelineID\":1,\"xLogPos\":50331968,\"pgParameters\":{\"datestyle\":\"iso, mdy\",\"default_text_search_config\":\"pg_catalog.english\",\"dynamic_shared_memory_type\":\"posix\",\"lc_messages\":\"en_US.utf8\",\"lc_monetary\":\"en_US.utf8\",\"lc_numeric\":\"en_US.utf8\",\"lc_time\":\"en_US.utf8\",\"log_timezone\":\"UTC\",\"max_connections\":\"100\",\"shared_buffers\":\"128MB\",\"timezone\":\"UTC\",\"wal_level\":\"replica\"},\"synchronousStandbys\":[],\"olderWalFile\":\"000000010000000000000001\"}}"
},
"ownerReferences": [
{
"apiVersion": "apps/v1beta1",
"kind": "StatefulSet",
"name": "stolon-keeper",
"uid": "4a3fe74a-31c1-11e8-b19c-0800271047e3",
"controller": true,
"blockOwnerDeletion": true
}
]
},
"spec": {
"volumes": [
{
"name": "data",
"persistentVolumeClaim": {
"claimName": "data-stolon-keeper-0"
}
},
{
"name": "stolon",
"secret": {
"secretName": "stolon",
"defaultMode": 420
}
},
{
"name": "default-token-vc9hk",
"secret": {
"secretName": "default-token-vc9hk",
"defaultMode": 420
}
}
],
"containers": [
{
"name": "stolon-keeper",
"image": "sorintlab/stolon:master-pg9.6",
"command": [
"/bin/bash",
"-ec",
"# Generate our keeper uid using the pod index\nIFS='-' read -ra ADDR <<< \"$(hostname)\"\nexport STKEEPER_UID=\"keeper${ADDR[-1]}\"\nexport POD_IP=$(hostname -i)\nexport STKEEPER_PG_LISTEN_ADDRESS=$POD_IP\nexport STOLON_DATA=/stolon-data\nchown stolon:stolon $STOLON_DATA\nexec gosu stolon stolon-keeper --data-dir $STOLON_DATA\n"
],
"ports": [
{
"containerPort": 5432,
"protocol": "TCP"
}
],
"env": [
{
"name": "POD_NAME",
"valueFrom": {
"fieldRef": {
"apiVersion": "v1",
"fieldPath": "metadata.name"
}
}
},
{
"name": "STKEEPER_CLUSTER_NAME",
"value": "kube-stolon"
},
{
"name": "STKEEPER_STORE_BACKEND",
"value": "kubernetes"
},
{
"name": "STKEEPER_KUBE_RESOURCE_KIND",
"value": "configmap"
},
{
"name": "STKEEPER_PG_REPL_USERNAME",
"value": "repluser"
},
{
"name": "STKEEPER_PG_REPL_PASSWORD",
"value": "replpassword"
},
{
"name": "STKEEPER_PG_SU_USERNAME",
"value": "stolon"
},
{
"name": "STKEEPER_PG_SU_PASSWORDFILE",
"value": "/etc/secrets/stolon/password"
}
],
"resources": {},
"volumeMounts": [
{
"name": "data",
"mountPath": "/stolon-data"
},
{
"name": "stolon",
"mountPath": "/etc/secrets/stolon"
},
{
"name": "default-token-vc9hk",
"readOnly": true,
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
}
],
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"imagePullPolicy": "IfNotPresent"
}
],
"restartPolicy": "Always",
"terminationGracePeriodSeconds": 10,
"dnsPolicy": "ClusterFirst",
"serviceAccountName": "default",
"serviceAccount": "default",
"nodeName": "vagrant",
"securityContext": {},
"hostname": "stolon-keeper-0",
"subdomain": "stolon-keeper",
"schedulerName": "default-scheduler"
},
"status": {
"phase": "Running",
"conditions": [
{
"type": "Initialized",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-03-27T13:18:07Z"
},
{
"type": "Ready",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-03-27T13:18:09Z"
},
{
"type": "PodScheduled",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-03-27T13:18:07Z"
}
],
"hostIP": "10.0.2.15",
"podIP": "172.17.0.5",
"startTime": "2018-03-27T13:18:07Z",
"containerStatuses": [
{
"name": "stolon-keeper",
"state": {
"running": {
"startedAt": "2018-03-27T13:18:09Z"
}
},
"lastState": {},
"ready": true,
"restartCount": 0,
"image": "sorintlab/stolon:master-pg9.6",
"imageID": "docker-pullable://sorintlab/stolon@sha256:beaf9a41baaa333564cdca7b6f10ca52f40ae84dea8f11aaf37af703b1d75dda",
"containerID": "docker://9d093d63916f24219438571de5b0a6a5bf33f316494c851f300e58a050863a79"
}
],
"qosClass": "BestEffort"
}
}
And the one generated by my chart proposal:
{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "test-stolon-keeper-0",
"generateName": "test-stolon-keeper-",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/pods/test-stolon-keeper-0",
"uid": "29eb303b-31c2-11e8-b19c-0800271047e3",
"resourceVersion": "13367",
"creationTimestamp": "2018-03-27T13:24:23Z",
"labels": {
"app": "test-stolon-keeper",
"chart": "stolon-0.5.0",
"component": "keeper",
"controller-revision-hash": "test-stolon-keeper-6957ff9c9b",
"heritage": "Tiller",
"release": "test",
"statefulset.kubernetes.io/pod-name": "test-stolon-keeper-0",
"stolon-cluster": "test-stolon"
},
"annotations": {
"pod.alpha.kubernetes.io/initialized": "true",
"stolon-status": "{\"infoUID\":\"d72be3e5\",\"uid\":\"keeper0\",\"clusterUID\":\"383b5c96\",\"bootUUID\":\"7dde52c2-ae54-4d1b-8de5-2389f15a948d\",\"postgresState\":{\"listenAddress\":\"172.17.0.11\",\"port\":\"5432\",\"synchronousStandbys\":null}}"
},
"ownerReferences": [
{
"apiVersion": "apps/v1beta1",
"kind": "StatefulSet",
"name": "test-stolon-keeper",
"uid": "29e44275-31c2-11e8-b19c-0800271047e3",
"controller": true,
"blockOwnerDeletion": true
}
]
},
"spec": {
"volumes": [
{
"name": "stolon-data",
"persistentVolumeClaim": {
"claimName": "stolon-data-test-stolon-keeper-0"
}
},
{
"name": "stolon-secrets",
"secret": {
"secretName": "test-stolon",
"defaultMode": 420
}
},
{
"name": "default-token-vc9hk",
"secret": {
"secretName": "default-token-vc9hk",
"defaultMode": 420
}
}
],
"containers": [
{
"name": "test-stolon-keeper",
"image": "sorintlab/stolon:master-pg9.6",
"command": [
"/bin/bash",
"-ec",
"# Generate our keeper uid using the pod index\nIFS='-' read -ra ADDR <<< \"$(hostname)\"\nexport STKEEPER_UID=\"keeper${ADDR[-1]}\"\nexport POD_IP=$(hostname -i)\nexport STKEEPER_PG_LISTEN_ADDRESS=$POD_IP\nexport STOLON_DATA=/stolon-data\nchown stolon:stolon $STOLON_DATA\nexec gosu stolon stolon-keeper --data-dir $STOLON_DATA\n"
],
"ports": [
{
"containerPort": 5432,
"protocol": "TCP"
}
],
"env": [
{
"name": "POD_NAME",
"valueFrom": {
"fieldRef": {
"apiVersion": "v1",
"fieldPath": "metadata.name"
}
}
},
{
"name": "STKEEPER_CLUSTER_NAME",
"value": "test-stolon"
},
{
"name": "STKEEPER_STORE_BACKEND",
"value": "kubernetes"
},
{
"name": "STKEEPER_KUBE_RESOURCE_KIND",
"value": "configmap"
},
{
"name": "STKEEPER_PG_REPL_USERNAME",
"value": "repluser"
},
{
"name": "STKEEPER_PG_REPL_PASSWORDFILE",
"value": "/etc/secrets/stolon/pg_repl_password"
},
{
"name": "STKEEPER_PG_SU_USERNAME",
"value": "stolon"
},
{
"name": "STKEEPER_PG_SU_PASSWORDFILE",
"value": "/etc/secrets/stolon/pg_su_password"
},
{
"name": "STKEPPER_DEBUG",
"value": "false"
}
],
"resources": {
"requests": {
"cpu": "100m",
"memory": "512Mi"
}
},
"volumeMounts": [
{
"name": "stolon-data",
"mountPath": "/stolon-data"
},
{
"name": "stolon-secrets",
"mountPath": "/etc/secrets/stolon"
},
{
"name": "default-token-vc9hk",
"readOnly": true,
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
}
],
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"imagePullPolicy": "IfNotPresent"
}
],
"restartPolicy": "Always",
"terminationGracePeriodSeconds": 10,
"dnsPolicy": "ClusterFirst",
"serviceAccountName": "default",
"serviceAccount": "default",
"nodeName": "vagrant",
"securityContext": {},
"hostname": "test-stolon-keeper-0",
"subdomain": "test-stolon-keeper",
"schedulerName": "default-scheduler"
},
"status": {
"phase": "Running",
"conditions": [
{
"type": "Initialized",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-03-27T13:24:23Z"
},
{
"type": "Ready",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-03-27T13:24:25Z"
},
{
"type": "PodScheduled",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-03-27T13:24:23Z"
}
],
"hostIP": "10.0.2.15",
"podIP": "172.17.0.11",
"startTime": "2018-03-27T13:24:23Z",
"containerStatuses": [
{
"name": "test-stolon-keeper",
"state": {
"running": {
"startedAt": "2018-03-27T13:24:25Z"
}
},
"lastState": {},
"ready": true,
"restartCount": 0,
"image": "sorintlab/stolon:master-pg9.6",
"imageID": "docker-pullable://sorintlab/stolon@sha256:beaf9a41baaa333564cdca7b6f10ca52f40ae84dea8f11aaf37af703b1d75dda",
"containerID": "docker://e1767b05a4d4b602071ffa2fc274a75008012118431e3a1d476d73c4ebd3ff73"
}
],
"qosClass": "Burstable"
}
}
Trying to run it now. The job keeps failing with a permissions issue. Did you create any RBAC roles to make it install?
$ kubectl logs -f stolon-test-stolon-pd42j
cannot get cluster data: failed to get latest version of configmap: configmaps "stolon-cluster-stolon-test-stolon" is forbidden: User "system:serviceaccount:default:default" cannot get configmaps in the namespace "default"
Got it to work. I applied the RBAC manifests from the stolon repo with an additional permission for configmaps. We should add it to the chart.
I'm seeing the same behaviour as you are. Based on the logs from the proxy it seems that initialisation does not create the db for some reason. I'll try to spend more time on it during the weekend.
I got it fixed. Finally...
Long story short: the app labels MUST be exactly "stolon-keeper", "stolon-proxy" and "stolon-sentinel" ... https://github.com/sorintlab/stolon/pull/433/commits/38ae6b13b5e161a5bfe0fbe01084ca060eaf2e76#diff-95cdd374e9440fde010ff35d65f8cf3fR54
I'll make a proper commit later today and let you know when it's finished. Deploying multiple clusters at the same time worked for me this way. As I'm not sure whether the other variants (using etcd) need the fullname on the app labels, I'll make these changes conditional as well (see the sketch below). I also added stolon-cluster labels in some places to maintain the uniqueness of service discovery within the K8s cluster itself.
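Roughly like this in the keeper statefulset (and the same pattern for proxy and sentinel); just a sketch, and "stolon.fullname" stands in for whatever naming helper the chart ends up using:
  labels:
    # with the kubernetes backend the components look each other up via these exact app labels
    {{- if eq .Values.store.backend "kubernetes" }}
    app: stolon-keeper
    {{- else }}
    app: {{ template "stolon.fullname" . }}-keeper
    {{- end }}
    stolon-cluster: {{ template "stolon.fullname" . }}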
I'll come back to you when it's done :)
Congratulations, great job!
Got it to work. I applied the RBAC manifests from the stolon repo with an additional permission for configmaps. We should add it to the chart.
I've got the same problem as you with that. Can you give me a link to the solution? I'll apply it as well (the setup already works on a minikube cluster).
https://github.com/Flowkap/stolon-chart/commit/33f4ff3d69c6f5edf43a3437d6e5731514947910
Probably it's easier if I do the PR and you add the configmap permission.
For RBAC you need to add configurations similar to these: https://github.com/sorintlab/stolon/blob/master/examples/kubernetes/role-binding.yaml https://github.com/sorintlab/stolon/blob/master/examples/kubernetes/role.yaml
The only difference is that you need to add configmaps here: https://github.com/sorintlab/stolon/blob/master/examples/kubernetes/role.yaml#L19
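Roughly something like this (I'm writing the rules from memory, so double-check them against the upstream role.yaml; the important part is having configmaps in the resources list, and binding to the default service account only mirrors the error above, a dedicated service account would be cleaner):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: stolon
  namespace: default
rules:
  # pods/events are what the upstream example uses; configmaps is the extra permission for the kubernetes backend
  - apiGroups: [""]
    resources: ["pods", "configmaps", "events"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: stolon
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: stolon
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default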
But I can add it later, after the PR gets merged (I'll review and test it during the weekend).
It would be nice to support the kubernetes backend using configmaps as well. See https://github.com/sorintlab/stolon/commit/38ae6b13b5e161a5bfe0fbe01084ca060eaf2e76
I'm pretty new to charts and K8s in general, but if you'd like I can try it out?