Hi Alex, many thanks for your amazing work with Kubegres. I tried to test a crash of the secondary Postgres. I conducted 2 separate tests. In the 1st test I scaled down the STS of the secondary Postgres to zero. In the second test I stopped the k3d node on which that STS is running. Unfortunately, in both tests nothing happened. It would be great if Kubegres ran a new instance of the secondary Postgres in that case to achieve the desired state. Regards, Juliusz.
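The two tests can be sketched as commands (the k3d container name is illustrative, since k3d names its node containers like k3d-<cluster>-agent-0):
# Test 1: scale the STS running the secondary Postgres down to zero
kubectl scale sts my-instance-of-kubegres-2 --replicas 0
# Test 2: stop the docker container running the k3d node that hosts that STS
docker stop k3d-mycluster-agent-0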
Hi Juliusz,
Thank you for your kind feedback about Kubegres.
Regarding the 1st test, how did you scale down the STS of the secondary Postgres to zero? Did you delete the STS?
In the second test, which command did you use to stop a k3d node?
Once I have those details, I will try reproducing the issue.
Hi Juliusz, since I have not heard from you after 2 days, I am closing this issue. Please re-open it when you have more details to provide as per my previous message.
Hi Alex, in the 1st test I only scaled the STS down to 0; I did not delete it. In the 2nd test I issued a docker stop command to stop the specific container running the k3d node.
BTW, as I am not a repo collaborator, I cannot reopen my issue.
Thank you for those details.
What command exactly did you use to scale the STS down to 0? Did you edit the STS and set its replicas to 0?
And once you had done the above, what did the logs in the Kubegres controller show?
kubectl scale sts my-instance-of-kubegres-1 --replicas 0
Kubegres is an operator which connects to the Kubernetes API, and Kubernetes notifies Kubegres when actions are performed on STS, Pods and Services.
The reason I need the controller's logs is to make sure that when you run the command kubectl scale sts my-instance-of-kubegres-1 --replicas 0, Kubernetes notified Kubegres about that spec change.
The same applies to the docker stop command used to stop the specific container running the k3d node.
If Kubernetes does not notify Kubegres, there is nothing we can do.
Actually, the command kubectl scale sts my-instance-of-kubegres-1 --replicas 0 stops the STS which runs the primary DB, and Kubegres works great in that case: 1. the secondary DB is promoted, 2. a new secondary is created, 3. the STS with the old primary is deleted.
The problem is when I issue kubectl scale sts my-instance-of-kubegres-2 --replicas 0 to stop the STS which runs the secondary DB. I expected a new secondary to be created, but that did not happen. How can I gather the requested logs?
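A minimal sketch of how those controller logs can be gathered, assuming the default install in which the controller runs as the deployment kubegres-controller-manager (container manager) in the namespace kubegres-system; names may differ in your setup:
kubectl logs deployment/kubegres-controller-manager -c manager -n kubegres-system --tail=100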
Thanks to you, I have sufficient information to investigate this issue. I will have time to investigate it tomorrow.
We have a set of automated tests to simulate failover. Those tests simulate failure by deleting either a Pod or a StatefulSet. Perhaps we missed one use case. Let's see.
@JuliuszJ I released a "beta" version in the main branch which fixes the issue that you reported. To install and test it, please run:
kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/main/kubegres.yaml
Please let me know if it works for you. Once you confirm it, I will release a new version of Kubegres.
@JuliuszJ Do you think you could help by testing this change today? I am planning to release it on Wednesday.
All you have to do is to conduct the 2 tests that you mentioned in your initial message when you created this issue.
@alex-arica @ylck I ran both my tests and they finished successfully. Thank you very much! However, after the test with stopping the k3d node I noticed something strange. After executing the command:
kubectl exec -it my-kubegres-instance-6-0 -- /bin/bash
I got the message:
Defaulted container "my-kubegres-instance-6" out of: my-kubegres-instance-6, setup-replica-data-directory (init)
It seems that the init container setup-replica-data-directory is still running. That did not happen when I scaled the STS to 0.
Thank you for checking. What do you see in the logs of that pod?
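One way to confirm whether that init container is actually still running is to inspect its status on the pod, for example:
kubectl get pod my-kubegres-instance-6-0 -o jsonpath='{.status.initContainerStatuses[0].state}'
A terminated state with reason Completed means it finished normally; the "Defaulted container" message only indicates that the pod spec lists more than one container, not that the init container is still running.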
on new primary:
02/11/2021 10:03:11 - Attempting to promote a Replica PostgreSql to Primary...
02/11/2021 10:03:11 - Promoting by creating the promotion trigger file: '/data/pgdata/promote_replica_to_primary.log'
on new secondary:
02/11/2021 10:04:24 - Attempting to copy Primary DB to Replica DB...
ls: cannot access '/data/pgdata': No such file or directory
02/11/2021 10:04:24 - Copying Primary DB to Replica DB folder: /data/pgdata
02/11/2021 10:04:24 - Running: pg_basebackup -R -h rbd-citus-coord -D /data/pgdata -P -U replication;
waiting for checkpoint
0/27542 kB (0%), 0/1 tablespace
11683/27542 kB (42%), 0/1 tablespace
27552/27552 kB (100%), 0/1 tablespace
27552/27552 kB (100%), 1/1 tablespace
02/11/2021 10:04:24 - Copy completed
It seems the init containers finished their job.
Thank you. Yes, it managed to copy its data from the primary.
Is there anything else in the logs saying that the replica pod is streaming data from the primary pod?
Are you able to connect to that replica and run SQL queries?
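For example, one way to check the streaming state from the replica side (a sketch; the postgres user is an assumption):
kubectl exec -it my-kubegres-instance-6-0 -- psql -U postgres -c "SELECT status, sender_host FROM pg_stat_wal_receiver;"
A status of streaming would confirm that the replica is receiving WAL from the primary.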
Are you able to connect to that replica and run SQL queries?
yes
And is there anything else in the logs saying that the replica pod is streaming data from the primary pod?
I am away from my test environment right now, so I cannot check the logs. However, after my tests I verified the replication by creating a table on the primary DB, which was successfully replicated to the secondary DB.
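That check can be sketched as commands (the pod names follow the STS names used earlier in this thread, and the postgres user is an assumption):
# Create a table on the primary
kubectl exec -it my-instance-of-kubegres-1-0 -- psql -U postgres -c "CREATE TABLE replication_smoke_test (id int);"
# Confirm it was replicated to the secondary
kubectl exec -it my-instance-of-kubegres-2-0 -- psql -U postgres -c "\dt replication_smoke_test"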
Thank you for your kind help with testing. If the init container keeps hanging, we can open a new issue about it and I will investigate it.
I will release a new version of Kubegres this evening London time.
Thank you very much for the quick fix.
Kubegres version 1.13 is available with the changes that we discussed in this issue.
Please see the release page: https://github.com/reactive-tech/kubegres/releases/tag/v1.13
Thank you @JuliuszJ for your help!
To install Kubegres 1.13, please run:
kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/v1.13/kubegres.yaml
I am closing this issue.