samstride closed this issue 2 years ago.
Thank you for your message. There is a feature request, #7, to automate upgrading Postgres to a new major version. It should be available by the end of 2021.
You can do it manually as follows:
1) Pause the Kubegres controller by running:
kubectl scale --replicas=0 deployment.apps/kubegres-controller-manager -n kubegres-system
2) Connect to each Pod with:
kubectl exec -it <podName> -- bash
and run:
pg_upgrade
3) Once all Pods are upgraded and running (make sure to check each Pod's logs), resume the Kubegres controller by running:
kubectl scale --replicas=1 deployment.apps/kubegres-controller-manager -n kubegres-system
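The pause and resume steps above can be wrapped in two small shell functions. This is a minimal sketch that assumes the default Kubegres install (the kubegres-controller-manager deployment in the kubegres-system namespace) and that kubectl already points at the right cluster; the function names are hypothetical. The per-Pod pg_upgrade step is deliberately left manual, because its arguments depend on where the old and new binaries and data directories live.

```shell
#!/bin/sh
# Hypothetical helpers for the manual upgrade procedure described above.
# Assumes the default Kubegres install: deployment kubegres-controller-manager
# in namespace kubegres-system.

KUBEGRES_NS=kubegres-system
KUBEGRES_DEPLOY=deployment.apps/kubegres-controller-manager

# Step 1: stop the operator so it does not react while Pods are being upgraded.
pause_kubegres() {
  kubectl scale --replicas=0 "$KUBEGRES_DEPLOY" -n "$KUBEGRES_NS"
}

# Step 3: start the operator again once every Pod is upgraded and running.
resume_kubegres() {
  kubectl scale --replicas=1 "$KUBEGRES_DEPLOY" -n "$KUBEGRES_NS"
}
```

Typical use: run pause_kubegres, upgrade and check each Pod by hand, then run resume_kubegres.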
Please let me know if the above works for you in your dev environment.
Ok, I have noted these steps and will keep an eye out for availability of the automation.
Once again, thank you for maintaining this repo.
@alex-arica, sorry for reopening this issue, but I wanted to clarify something about upgrading between minor versions.
Are the steps provided above for both major and minor version upgrades?
I tried to upgrade from 14.0 -> 14.1.
These are the steps I followed:
1) Upgraded the Kubegres operator from 1.12 to 1.13:
kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/v1.13/kubegres.yaml
2) Changed the image to postgres:14.1 in my Kubegres resource:
```yaml
# my-postgres.yaml
apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: mypostgres
  namespace: default
spec:
  replicas: 3
  image: postgres:14.1
  database:
    size: 200Mi
  env:
    - name: POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mypostgres-secret
          key: superUserPassword
    - name: POSTGRES_REPLICATION_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mypostgres-secret
          key: replicationUserPassword
```
3) Applied the change:
kubectl apply -f my-postgres.yaml
Only 1 of the replicas got upgraded.
Did I miss something, or are the steps the same as for a major version upgrade?
Upgrading to a new minor version should work: Postgres supports upgrading between minor versions.
In this case, what may have happened is that the first Pod to be upgraded had an issue, so Kubegres did not continue the upgrade. For safety reasons, Kubegres upgrades a replica first; if that does not work, it stops the upgrade and logs that the failing Pod should be investigated manually.
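Since Kubegres stops after the first Pod when something goes wrong, one quick way to see how far an upgrade got is to compare each Pod's container image with the image in the Kubegres spec. A minimal sketch, assuming the Postgres container is the first container in each Pod and that the cluster's Pods share a name prefix (such as postgres- or mypostgres-); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Pods whose first container does not yet run
# the expected image.
# Usage: pods_not_on_image <namespace> <expected-image> <pod-name-prefix>
pods_not_on_image() {
  ns=$1 expected=$2 prefix=$3
  # One "name image" pair per line, using kubectl's JSONPath output,
  # then keep only Pods with the given prefix and a different image.
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[0].image}{"\n"}{end}' \
    | awk -v img="$expected" -v prefix="$prefix" \
        'index($1, prefix) == 1 && $2 != img { print $1 }'
}
```

For example, pods_not_on_image postgres postgres:14.1 postgres- would list the Pods of the "postgres" cluster that are still on the old image; an empty result means every Pod was moved to the new image.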
Do you have the logs of the Kubegres controller?
To read the logs, you can follow these steps:
kubectl get all -n kubegres-system
kubectl logs pod/kubegres-controller-manager-[to replace] -c manager -n kubegres-system -f
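Because the random suffix of the controller Pod name changes every time the deployment rolls out, it can be looked up instead of replaced by hand, and the output can be narrowed to error lines when the full log is too large to share. A minimal sketch, assuming the default install with a single controller Pod in kubegres-system; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for reading the Kubegres controller logs.
# Assumes the default install: one controller Pod in kubegres-system
# whose name starts with kubegres-controller-manager.

# Print the controller Pod as "pod/<name>" instead of hard-coding the suffix.
controller_pod() {
  kubectl get pods -n kubegres-system -o name \
    | grep 'kubegres-controller-manager' \
    | head -n 1
}

# Print only the ERROR lines of the "manager" container.
controller_errors() {
  pod=$(controller_pod)
  if [ -z "$pod" ]; then
    echo "Kubegres controller Pod not found" >&2
    return 1
  fi
  kubectl logs "$pod" -c manager -n kubegres-system | grep 'ERROR'
}
```

Running controller_errors prints the error lines once and exits, which is easier to paste into an issue than the output of a followed (-f) log stream.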
```
$ kubectl get all -n kubegres-system
NAME                                               READY   STATUS    RESTARTS   AGE
pod/kubegres-controller-manager-6887874b9d-f7c4m   2/2     Running   0          41h

NAME                                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/kubegres-controller-manager-metrics-service   ClusterIP   10.43.210.165   <none>        8443/TCP   34d

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kubegres-controller-manager   1/1     1            1           34d

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/kubegres-controller-manager-6887874b9d   1         1         1       41h
replicaset.apps/kubegres-controller-manager-75b6765589   0         0         0       34d
```
The logs are too large to paste when I run kubectl logs pod/kubegres-controller-manager-6887874b9d-f7c4m -c manager -n kubegres-system -f, so I am pasting only the error that I see many times in the logs:
ERROR controllers.Kubegres Last Spec enforcement attempt has timed-out for a StatefulSet. You must apply different spec changes to your Kubegres resource since the previous spec changes did not work. Until you apply it, most of the features of Kubegres are disabled for safety reason. {"StatefulSet's name": "postgres-3", "One or many of the following specs failed: ": "Resources: &ResourceRequirements{Limits:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Requests:ResourceList{cpu: {{5 -1} {<nil>} DecimalSI},memory: {{524288000 0} {<nil>} 500Mi BinarySI},},}", "error": "Spec enforcement timed-out"}
I made sure there is enough CPU and memory.
I also reduced the CPU request to 200m and re-applied, just to see what would happen:
DEBUG controller-runtime.manager.events Normal {"object": {"kind":"Kubegres","namespace":"postgres","name":"postgres","uid":"7af18b13-af77-4271-81e6-1e2b6c29dd9c","apiVersion":"kubegres.reactive-tech.io/v1","resourceVersion":"32256132"}, "reason": "StatefulSetOperation", "message": "The Spec is NOT up-to-date for a StatefulSet. 'StatefulSet name': postgres-3, 'SpecName': Resources, 'Expected': &ResourceRequirements{Limits:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Requests:ResourceList{cpu: {{2 -1} {<nil>} DecimalSI},memory: {{524288000 0} {<nil>} 500Mi BinarySI},},}, 'Current': &ResourceRequirements{Limits:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Requests:ResourceList{cpu: {{200 -3} {<nil>} 200m DecimalSI},memory: {{524288000 0} {<nil>} 500Mi BinarySI},},}"}
DEBUG controller-runtime.manager.events Warning {"object": {"kind":"Kubegres","namespace":"postgres","name":"postgres","uid":"7af18b13-af77-4271-81e6-1e2b6c29dd9c","apiVersion":"kubegres.reactive-tech.io/v1","resourceVersion":"32256132"}, "reason": "StatefulSetSpecEnforcementTimedOutErr", "message": "Last Spec enforcement attempt has timed-out for a StatefulSet. You must apply different spec changes to your Kubegres resource since the previous spec changes did not work. Until you apply it, most of the features of Kubegres are disabled for safety reason. 'StatefulSet's name': postgres-3, 'One or many of the following specs failed: ': Resources: &ResourceRequirements{Limits:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Requests:ResourceList{cpu: {{2 -1} {<nil>} DecimalSI},memory: {{524288000 0} {<nil>} 500Mi BinarySI},},} - Spec enforcement timed-out"}
DEBUG controller-runtime.manager.events Normal {"object": {"kind":"Kubegres","namespace":"postgres","name":"postgres","uid":"7af18b13-af77-4271-81e6-1e2b6c29dd9c","apiVersion":"kubegres.reactive-tech.io/v1","resourceVersion":"32256132"}, "reason": "StatefulSetOperation", "message": "The Spec is NOT up-to-date for a StatefulSet. 'StatefulSet name': postgres-3, 'SpecName': Resources, 'Expected': &ResourceRequirements{Limits:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Requests:ResourceList{cpu: {{2 -1} {<nil>} DecimalSI},memory: {{524288000 0} {<nil>} 500Mi BinarySI},},}, 'Current': &ResourceRequirements{Limits:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Requests:ResourceList{cpu: {{200 -3} {<nil>} 200m DecimalSI},memory: {{524288000 0} {<nil>} 500Mi BinarySI},},}"}
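Reading the quantities in these messages, under the assumption that they are Kubernetes resource.Quantity values printed with Go's default struct formatting (where a pair {value scale} stands for value × 10^scale), shows why the controller keeps reporting a difference:

$$
\begin{aligned}
\{5,\,-1\} &= 5 \times 10^{-1} = 0.5 \text{ CPU} = 500\text{m} && \text{(request in the first error)}\\
\{2,\,-1\} &= 2 \times 10^{-1} = 0.2 \text{ CPU} = 200\text{m} && \text{(Expected)}\\
\{200,\,-3\} &= 200 \times 10^{-3} = 0.2 \text{ CPU} = 200\text{m} && \text{(Current)}\\
\{524288000,\,0\} &= 524288000 \text{ bytes} = 500 \times 2^{20} \text{ bytes} = 500\text{Mi} && \text{(memory, both sides)}
\end{aligned}
$$

Expected and Current denote the same 0.2 CPU but are stored as different value/scale pairs, so a comparison of the raw structures would treat them as different even though the amounts are equal.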
Please let me know if you need any other info.
Thanks for helping out.
Thank you for those details.
Looking at the logs, it seems the issue is not caused by the Postgres image upgrade but by the contents of the "resources" field in the YAML of "kind: Kubegres".
Could you please share the contents of the YAML containing the configuration of the Postgres cluster?
Could you please share the logs of the Pod "postgres-3"?
I used the same resource values shown below when I first set it up.
```yaml
apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: postgres
  namespace: postgres
spec:
  replicas: 3
  image: postgres:14.1
  database:
    size: 20Gi
    storageClassName: postgres-nfs
  resources:
    limits:
      memory: "1Gi"
      cpu: "1"
    requests:
      memory: "500Mi"
      cpu: "0.5"
  failover:
    isDisabled: false
  backup:
    schedule: "45 */1 * * *"
    pvcName: postgres-backup
    volumeMount: /var/lib/backup
  env:
    - name: POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          name: postgres
          key: superUserPassword
    - name: POSTGRES_REPLICATION_PASSWORD
      valueFrom:
        secretKeyRef:
          name: postgres
          key: replicationUserPassword
```
Logs from postgres-3
kubectl logs -f postgres-3-0 -n postgres
PostgreSQL Database directory appears to contain a database; Skipping initialization
2021-11-16 20:57:11.240 GMT [1] LOG: starting PostgreSQL 14.1 (Debian 14.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2021-11-16 20:57:11.240 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-11-16 20:57:11.240 GMT [1] LOG: listening on IPv6 address "::", port 5432
2021-11-16 20:57:11.242 GMT [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-11-16 20:57:11.267 GMT [25] LOG: database system was shut down in recovery at 2021-11-16 20:57:06 GMT
2021-11-16 20:57:11.269 GMT [25] LOG: entering standby mode
2021-11-16 20:57:11.280 GMT [25] LOG: redo starts at 1/137D3828
2021-11-16 20:57:11.280 GMT [25] LOG: consistent recovery state reached at 1/137D3910
2021-11-16 20:57:11.280 GMT [25] LOG: invalid record length at 1/137D3910: wanted 24, got 0
2021-11-16 20:57:11.281 GMT [1] LOG: database system is ready to accept read-only connections
2021-11-16 20:57:11.296 GMT [29] LOG: started streaming WAL from primary at 1/13000000 on timeline 1
Thank you for those details.
From these logs, I am not able to find the root cause of the issue you are experiencing.
I will try reproducing this issue on my local environment by reusing the YAML that you shared. I will let you know once I have more info.
The steps I will follow are:
Please let me know if the steps above are the ones that you followed before experiencing this issue.
@alex-arica,
1) I installed the Kubegres operator 1.12 and Postgres 14.
2) I upgraded the operator to 1.13 and changed Postgres to 14.1.
3) After deploying 14.1, since I ran into the issue, I changed the CPU values from 0.5 to 0.2 and back to 0.5.
I deployed a cluster of 3 Postgres Pods with the following YAML:
```yaml
apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: mypostgres
  namespace: default
spec:
  replicas: 3
  image: postgres:14
  #port: 5432
  database:
    size: 200Mi
  env:
    - name: POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mypostgres-secret
          key: superUserPassword
    - name: POSTGRES_REPLICATION_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mypostgres-secret
          key: replicationUserPassword
```
Once the 3 Pods were running as expected, I updated the YAML above by setting image: postgres:14.1. Kubegres upgraded all Pods from version 14 to 14.1, and according to the logs, all Pods are running fine.
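Instead of editing the YAML file, the same image change can be applied in place with kubectl patch. A minimal sketch; it assumes the Kubegres CRD is addressable as "kubegres" in kubectl (kubectl api-resources shows the exact name), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: change only spec.image of an existing Kubegres resource.
# Usage: set_kubegres_image <name> <namespace> <image>
set_kubegres_image() {
  name=$1 ns=$2 image=$3
  # A JSON merge patch replaces spec.image and leaves the other fields untouched.
  kubectl patch kubegres "$name" -n "$ns" --type merge \
    -p "{\"spec\":{\"image\":\"$image\"}}"
}
```

For example, set_kubegres_image mypostgres default postgres:14.1. If the cluster is normally managed with kubectl apply -f, update the file as well so that the next apply does not revert the image.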
I could not reproduce the issue that you reported about version upgrade.
I suggest starting from a minimal configuration, such as the YAML above, and then adding options one step at a time until it fails. That way you can identify the specific configuration that causes the failure.
Please let me know if you need any help.
@alex-arica, OK, so the upgrade from 14.0 to 14.1 worked as soon as I removed the resources section from the YAML.
Hmm, it looks like a bug that causes the upgrade to fail when the resources section is present?
Thank you for reporting this. I spent a few hours understanding why a "resources" field containing a decimal point causes an issue when it is updated.
For example, when creating a resource of kind: Kubegres, if the cpu value contains a decimal point:
```yaml
...
requests:
  cpu: "0.5"
  memory: "500Mi"
```
then, when the resource of kind: Kubegres is created, Kubernetes reformats that decimal value to:
```yaml
...
requests:
  cpu: "500m"
  memory: "500Mi"
```
This is fine and works. However, if after the resource is created we edit that cpu value to another decimal value, such as:
```yaml
requests:
  cpu: "0.4"
  memory: "500Mi"
```
Kubernetes keeps the cpu value as 0.4 rather than 400m for the resource of kind: Kubegres. However, when the Kubegres operator sets that new cpu value on the StatefulSet, it gets formatted as follows:
```yaml
requests:
  cpu: "400m"
  memory: "500Mi"
```
So the equality comparison fails.
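The idea behind fixing such a comparison is to compare CPU amounts by value rather than by their text. The sketch below illustrates that idea in shell; it is not the change made in Kubegres (which lives in the operator's code and is not shown here), it handles only plain cores ("1", "0.5") and millicores ("500m"), and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: compare two CPU quantities by value instead of by text,
# so "0.4" and "400m" are treated as equal.
# Only plain cores and the "m" (millicore) suffix are handled.

# Convert a CPU quantity to an integer number of millicores.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;                                        # "400m" -> 400
    *)  awk -v v="$1" 'BEGIN { print int(v * 1000 + 0.5) }' ;;  # "0.4"  -> 400
  esac
}

# Succeeds (exit status 0) when both quantities denote the same amount of CPU.
cpu_equal() {
  [ "$(to_millicores "$1")" = "$(to_millicores "$2")" ]
}
```

With this, cpu_equal 0.4 400m succeeds and cpu_equal 0.5 200m fails, whereas a plain string comparison would treat "0.4" and "400m" as different.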
I changed the equality check, and the fix is available in Kubegres version 1.14.
Kubegres version 1.14 is available with the changes that we discussed in this issue.
Please see the release page: https://github.com/reactive-tech/kubegres/releases/tag/v1.14
Thank you @samstride for reporting this issue.
To install Kubegres 1.14, please run:
kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/v1.14/kubegres.yaml
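After applying the new manifest, it can help to confirm that the controller actually rolled out before upgrading any Postgres cluster. A minimal sketch, assuming the default deployment name and namespace; the function name and the 120s timeout are arbitrary choices, and the printed image tags are simply whatever the installed manifest specifies.

```shell
#!/bin/sh
# Hypothetical check after installing a new Kubegres version: wait for the
# controller Deployment to roll out, then print the images it runs.
check_kubegres_controller() {
  kubectl rollout status deployment/kubegres-controller-manager \
    -n kubegres-system --timeout=120s || return 1
  kubectl get deployment kubegres-controller-manager -n kubegres-system \
    -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'
}
```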
I am closing this issue.
Thank you for maintaining this repo.
I am looking for steps/recommendations for upgrading between minor versions and major versions.
I am guessing that upgrading between minor versions is as simple as changing the container image, i.e. postgres:13.2 -> postgres:13.4.
Now that the official image for Postgres 14 is available, are there any steps that need to be followed to go from postgres:13.2 -> postgres:14.0?
Cheers.
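Since PostgreSQL 10, the first number of the version is the major version: 13.2 and 13.4 are minor releases of 13, which share the same on-disk data format, so swapping the image is enough, while going from 13 to 14 is a major upgrade that needs pg_upgrade or a dump and restore. A small sketch that tells the two cases apart from image tags; it assumes tags such as postgres:13.2, postgres:14 or postgres:14.1-alpine (PostgreSQL 10 or later), and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether moving between two Postgres images is a
# minor upgrade (swap the image) or a major one (pg_upgrade or dump/restore).

# Print the major version from an image reference.
pg_major() {
  tag=${1##*:}       # text after the last ":", e.g. 14.1-alpine
  tag=${tag%%-*}     # drop a variant suffix,   e.g. 14.1
  echo "${tag%%.*}"  # keep the first number,   e.g. 14
}

# Succeeds when the two images have different major versions.
is_major_upgrade() {
  [ "$(pg_major "$1")" != "$(pg_major "$2")" ]
}
```

For example, is_major_upgrade postgres:13.2 postgres:13.4 fails (a minor upgrade), while is_major_upgrade postgres:13.2 postgres:14.0 succeeds (a major upgrade).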