lwolf / stolon-chart

Kubernetes Helm chart to deploy an HA PostgreSQL cluster based on Stolon
MIT License

Data persistence is not working in Postgres with chart version #33

Closed: Amithpn closed this issue 5 years ago

Amithpn commented 5 years ago

Hi, we have a problem: when we install the chart for the first time and then delete it, all the data we added is lost.

We found that the cluster-create-job.yaml file is responsible for creating the cluster, and it doesn't need to run twice. Is there any way to check the cluster status and skip the create-cluster job on subsequent installs? Running it every time resets the cluster and deletes all the data.

Please let me know what the solution for this is.

I always set autoCreateCluster: true and autoUpdateClusterSpec: true. I guess this is what is causing the issue for me.

Below is the values.yaml I use:

image:
  repository: sorintlab/stolon
  tag: v0.13.0-pg10
  pullPolicy: IfNotPresent

# used by create-cluster-job when store.backend is etcd
etcdImage:
  repository: k8s.gcr.io/etcd-amd64
  tag: 2.3.7
  pullPolicy: IfNotPresent

debug: false

persistence:
  enabled: true
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
  ##   GKE, AWS & OpenStack)
  ##
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  size: 10Gi

rbac:
  create: true

serviceAccount:
  create: true
  # The name of the ServiceAccount to use. If not set and create is true, a name is generated using the fullname template
  name:

superuserSecret:
  name: ""
  usernameKey: pg_su_username
  passwordKey: pg_su_password

replicationSecret:
  name: ""
  usernameKey: pg_repl_username
  passwordKey: pg_repl_password

superuserUsername: "stolon"
## password for the superuser (REQUIRED if superuserSecret is not set)
superuserPassword:

replicationUsername: "repluser"
## password for the replication user (REQUIRED if replicationSecret is not set)
replicationPassword:

## backend could be one of the following: consul, etcdv2, etcdv3 or kubernetes
store:
  backend: kubernetes
#  endpoints: "http://stolon-consul:8500"
  kubeResourceKind: configmap

pgParameters: {}
  # maxConnections: 1000

ports:
  stolon:
    containerPort: 5432
  metrics:
    containerPort: 8080

job:
  autoCreateCluster: true
  autoUpdateClusterSpec: true

clusterSpec: {}
  # sleepInterval: 1s
  # maxStandbys: 5

keeper:
  replicaCount: 2
  annotations: {}
  resources: {}
  priorityClassName: ""
  service:
    type: ClusterIP
    annotations: {}
    ports:
      keeper:
        port: 5432
        targetPort: 5432
        protocol: TCP
  nodeSelector: {}
  affinity: {}
  tolerations: []
  volumes: []
  volumeMounts: []
  hooks:
    failKeeper:
      enabled: false
  podDisruptionBudget:
    # minAvailable: 1
    # maxUnavailable: 1

proxy:
  replicaCount: 2
  annotations: {}
  resources: {}
  priorityClassName: ""
  service:
    type: ClusterIP
#    loadBalancerIP: ""
    annotations: {}
    ports:
      proxy:
        port: 5432
        targetPort: 5432
        protocol: TCP
  nodeSelector: {}
  affinity: {}
  tolerations: []
  podDisruptionBudget:
    # minAvailable: 1
    # maxUnavailable: 1

sentinel:
  replicaCount: 2
  annotations: {}
  resources: {}
  priorityClassName: ""
  nodeSelector: {}
  affinity: {}
  tolerations: []
  podDisruptionBudget:
    # minAvailable: 1
    # maxUnavailable: 1
lwolf commented 5 years ago

Hi, I don't think it's because of those flags. The autoCreateCluster job is only triggered on the first install; see https://github.com/helm/charts/blob/master/stable/stolon/templates/hooks/create-cluster-job.yaml
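For context, the gating is done with Helm hook annotations on the job template (a post-install hook, going by the linked template), so Tiller only runs the job when the release is first installed. If you want to confirm this on your side, something along these lines should work on Helm 2 (the release name is a placeholder):

# List the hook manifests rendered for an installed release (Helm 2).
# The create-cluster job should show up here with a "helm.sh/hook":
# post-install annotation, which is why it does not run again on upgrade.
helm get hooks my-stolon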

It seems that you're not using the chart from this repo but the one from helm/charts. That chart was originally based on this one, but it has diverged over time.

So, please open the issue there.

Amithpn commented 5 years ago

Hi, is there any way I can check the stolonctl cluster status? I don't want to run the create-cluster-job every time. If stolonctl has already initialized the cluster, I don't want to run it again. I want to put that check in create-cluster-job.
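For reference, I imagine the check looking something like this, given the kubernetes/configmap store from my values.yaml above (the cluster name is a placeholder for whatever the release sets):

# Query the current cluster state from the store; an initialized cluster
# reports its keepers, sentinels and proxies here.
stolonctl status \
  --cluster-name my-stolon-cluster \
  --store-backend kubernetes \
  --kube-resource-kind configmap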

lwolf commented 5 years ago

But there is a check in the Helm chart: https://github.com/lwolf/stolon-chart/blob/master/stolon/templates/cluster-create-job.yaml#L1

create-cluster-job is only triggered on the first install. What version of Helm do you use? You need at least Helm 2.2.0 for this to work.

Amithpn commented 5 years ago

Hi @lwolf, I use the Helm version below. Is this something blocking me?

[root@server ~]# helm version
Client: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}

Every time, I do helm install. Is that not recommended?

I tried with your chart as well, and I see the create-cluster-job running again, with the logs below:

[root@server ~]# k logs -f postgres-ha-si-postgres-amith-kq4nn
WARNING: The current cluster data will be removed
WARNING: The databases managed by the keepers will be overwritten depending on the provided cluster spec.

lwolf commented 5 years ago

No, v2.13 should be fine. What do you mean by doing helm install every time? Helm will block the install if you try to install the same release multiple times.

The workflow is:

helm install ...   # to install
helm upgrade ...   # to upgrade
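For illustration, a minimal sketch with Helm 2 syntax (the release name and chart path are placeholders):

# First deployment only: creates the release and runs the create-cluster hook.
helm install --name my-stolon ./stolon -f values.yaml
# Every later rollout: updates the release in place, keeps the PVCs and the
# cluster data, and does not re-run the post-install hook.
helm upgrade my-stolon ./stolon -f values.yaml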

Amithpn commented 5 years ago

Hi @lwolf, the workflow I usually follow is:

helm install ...
helm delete --purge ...
helm install ...

The release name changes for every build, hence I always use helm install.

Can you please let me know how I can make sure that the create-cluster-job has run successfully and the cluster is ready? I want to check whether the cluster data is present before running it again; I'm thinking of something like the sketch below.
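A rough idea of the check (the job and cluster names are placeholders for whatever the chart generates for the release):

# Wait until the create-cluster job reports completion.
kubectl wait --for=condition=complete job/my-stolon-create-cluster --timeout=120s
# With the kubernetes store, the initialized cluster data lives in a
# ConfigMap (stolon-cluster-<cluster-name>, if I read the stolon docs right),
# so its presence indicates the cluster has already been created.
kubectl get configmap stolon-cluster-my-stolon-cluster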

lwolf commented 5 years ago

Why do you delete the release every time? That's not how Helm works. By doing helm delete --purge you delete all the data.
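Note that the keepers' PersistentVolumeClaims may well survive the delete, since Helm does not remove PVCs created from a StatefulSet's volumeClaimTemplates. But the next install re-runs the create-cluster job, which re-initializes the cluster and overwrites the keepers' databases, which is exactly what the WARNING lines in your log say. A quick way to see what is left behind (the namespace is a placeholder):

# After helm delete --purge, the release objects are gone, but PVCs from
# the keeper StatefulSet's volumeClaimTemplates typically remain:
kubectl get pvc --namespace default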

Amithpn commented 5 years ago

Hi @lwolf, we are facing an issue with queries taking a long time. Do you have any recommendations for improving query performance? I'm trying to insert 10,000 records at once, and with stolon-chart I see a time lag compared to a standalone Postgres deployment. Thanks in advance.

I have enabled the parameters below in my deploy.yaml:

clusterSpec: 
  sleepInterval: 1s
  failInterval: 1s

keeper:
  replicaCount: 3
  annotations: {}
  resources:
    limits:
      memory: "4096Mi"
      cpu: "500m"
    requests:
      memory: "1024Mi"
      cpu: "250m"
  priorityClassName: ""