【Stolon】PostgreSQL cloud native High Availability and more.

vieyahn2017 commented 5 years ago

https://github.com/sorintlab/stolon

go写的，可以作为学习go的重点关注项目

vieyahn2017 commented 5 years ago

如果你想知道更多关于stolon可以参考作者这篇介绍。 https://sgotti.dev/post/stolon-introduction/

vieyahn2017 commented 5 years ago

如何在Kubernetes中部署一个高可用的PostgreSQL集群环境

http://dockone.io/article/2143

【编者的话】本文主要介绍了如何在Kubernetes环境中用Stolon去部署高可用的PostgreSQL，本文从Stolon的结构组成开始，由浅入深介绍原理，从开始安装到最后对其进行failover测试，深入浅出，为以后部署高可用的PostgreSQL提供了一种的解决方案。

vieyahn2017 commented 5 years ago

Stolon 架构图摘抄自Stolon的介绍。 0-tDzE_Y0wg-xPKRfk.png

Stolon 是由3个部分组成的： keeper：他负责管理PostgreSQL的实例汇聚到由sentinel(s)提供的clusterview。 sentinel：it负责发现并且监控keeper，并且计算最理想的clusterview。 proxy：客户端的接入点。它强制连接到右边PostgreSQL的master并且强制关闭连接到由非选举产生的master。

Stolon 用etcd或者Consul作为主要的集群状态存储。

vieyahn2017 commented 5 years ago

Installation

$ git clone https://github.com/lwolf/stolon-chart
$ cd stolon-chart
$ helm install ./stolon
You can also install directly from my repository
helm repo add lwolf-charts http://charts.lwolf.org
helm install lwolf-charts/stolon

安装的过程将会做如下的动作：

首先，会用statefulset创建3个etcd节点。Stolon-proxy和stolon-sentinel也会被部署。Singe time job将集群的安装暂停直到etcd节点状态变成availabe。

chart还会创建两个服务： stolon-proxy——服务来源于官方的例子。他总是指向当前的因该被写入的master。 stolon-keeper——Stolon自己本身不提供任何读取操作的负载均衡。但是Kubernetes的service却可以做到这点。所以对于用户来说，stolon-keeper的读操作是在pod的层面做到负载均衡的。

vieyahn2017 commented 5 years ago

当所有的组件状态变为RUNNING时，我们可以试着连接它们。

我们可以用NodePort这种简单的连接方式部署service。用两个终端分别去连接master service和slave service。在post的过程中，我们假设stolon-proxy服务（RW）已经暴露了30543端口，stolon-keeper服务（RO）已经暴露了30544端口。

连接master并且建立test表

psql --host <IP> --port 30543 postgres -U stolon -W
postgres=# create table test (id int primary key not null,
value text not null);
CREATE TABLE
postgres=# insert into test values (1, 'value1');
INSERT 0 1
postgres=# select * from test;
id | value
---- --------
1 | value1
(1 row)

连接slave并且检查数据。你可以写一些信息以便确认请求已经被slave处理了。

psql --host <IP> --port 30544 postgres -U stolon -W
postgres=# select * from test;
id | value
---- --------
1 | value1
(1 row)

在测试通过后，我们去试试failover功能。

vieyahn2017 commented 5 years ago

测试failover

这个案例是官方代码库中statefullset的一个例子。简单的说，就是为模拟了master挂掉，我们先删除了master的statefulset又删除了master的pod。

kubectl delete statefulset stolon-keeper --cascade=false
kubectl delete pod stolon-keeper-0

然后，在sentinel的log中我们可以看到新的master被选举出来了。

no keeper info available db=cb96f42d keeper=keeper0
no keeper info available db=cb96f42d keeper=keeper0
master db is failed db=cb96f42d keeper=keeper0
trying to find a standby to replace failed master
electing db as the new master db=087ce88a keeper=keeper1

现在，在刚才的那两个终端中如果我们重复上一个命令，我们可以看到如下输出。

postgres=# select * from test;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset:
Succeeded.
postgres=# select * from test;
id | value
---- --------
1 | value1
(1 row)

Kubernetes的service把不可用的pod去掉，把请求转到可用的pod上。所以新的读取连接被路由到了健康的pod上。

最后，我们需要重新创建statefulset。最简单的方法就是更新部署了的helm chart。

helm ls
NAME               REVISION   UPDATED                    STATUS     CHART              NAMESPACE
factual-crocodile  1          Sat Feb 18 15:42:50 2017   DEPLOYED   stolon-0.1.0       default
helm upgrade factual-crocodile .

vieyahn2017 commented 5 years ago

2.用chaoskube模拟随机的pod挂掉

另一个测试集群弹性（resilience）的好方法是用chaoskube。Chaoskube是一个小的服务程序，它可以周期性的在集群里随机的kill掉一些的pod。它也可以用helm charts部署。

helm install --set labels="release=factualcrocodile,
component!=factual-crocodine-etcd" --set
interval=5m stable/chaoskube

这条命令会运行chaoskube，它会每5分钟删除一个pod。它会选择label中release=factual-crocodile的pod，但是会忽略etcd的pod。

在做了几个小时的测试之后，我的集群环境仍然是一致并且工作的很稳定。

vieyahn2017 commented 5 years ago

结论

我仍然在我的开发服务器上运行stolon。到目前为止我还是满意的。他真的很想一个本地的运环境。有很好的弹性和自动化的failover能力。

如果你对它感兴趣-可以查看我的官方repository或者和我的chart。

原文链接：How to deploy HA PostgreSQL cluster on Kubernetes（翻译：王晓轩） https://medium.com/@SergeyNuzhdin/how-to-deploy-ha-postgresql-cluster-on-kubernetes-3bf9ed60c64f#.kexr7gbu7

vieyahn2017 / repos