sorintlab / stolon

PostgreSQL cloud native High Availability and more.
https://talk.stolon.io
Apache License 2.0

ETCD v3 api support #300

Closed: aaronyoungkash closed this issue 6 years ago

aaronyoungkash commented 7 years ago

Stolon currently uses the etcd v2 client, and v2 keys apparently aren't persisted to the etcd-operator sidecar backup (https://github.com/coreos/etcd-operator/issues/1223). Is there a timeline for moving to v3, or for providing an option to use v3?

sgotti commented 7 years ago

@aaronyoungkash see #100. We'll add support for etcdv3 in the future, probably by creating a common interface and using the etcd3 client directly instead of libkv. This will also need a common way to do leader election for the sentinels. Of course, we'll be very happy to accept pull requests.
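To make the idea of a common interface concrete, here is a minimal sketch of what it might look like. Everything in it is hypothetical: `KVStore`, `Election`, `memStore`, and the key and sentinel names are illustrative only and are not stolon's actual API. An in-memory backend stands in for etcd so the example runs without a cluster.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// ErrKeyNotFound is returned when a key does not exist (hypothetical name).
var ErrKeyNotFound = errors.New("key not found")

// KVStore is a hypothetical backend-neutral key/value interface.
// Every call takes a Context so callers can set deadlines.
type KVStore interface {
	Put(ctx context.Context, key string, value []byte) error
	Get(ctx context.Context, key string) ([]byte, error)
}

// Election is a hypothetical backend-neutral leader election interface.
type Election interface {
	// Campaign reports whether candidateID is the leader after the call.
	Campaign(ctx context.Context, candidateID string) (bool, error)
}

// memStore is an in-memory stand-in for a real backend such as etcd v3.
type memStore struct {
	mu     sync.Mutex
	data   map[string][]byte
	leader string
}

func newMemStore() *memStore { return &memStore{data: map[string][]byte{}} }

func (m *memStore) Put(ctx context.Context, key string, value []byte) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = append([]byte(nil), value...) // store a copy
	return nil
}

func (m *memStore) Get(ctx context.Context, key string) ([]byte, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[key]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return append([]byte(nil), v...), nil
}

func (m *memStore) Campaign(ctx context.Context, candidateID string) (bool, error) {
	if err := ctx.Err(); err != nil {
		return false, err
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.leader == "" {
		m.leader = candidateID // first candidate wins
	}
	return m.leader == candidateID, nil
}

// demo exercises both interfaces through the same in-memory backend.
func demo() (string, bool, bool, error) {
	m := newMemStore()
	var store KVStore = m
	var election Election = m
	ctx := context.Background()

	if err := store.Put(ctx, "cluster/data", []byte("v1")); err != nil {
		return "", false, false, err
	}
	raw, err := store.Get(ctx, "cluster/data")
	if err != nil {
		return "", false, false, err
	}
	first, err := election.Campaign(ctx, "sentinel-a")
	if err != nil {
		return "", false, false, err
	}
	second, err := election.Campaign(ctx, "sentinel-b")
	if err != nil {
		return "", false, false, err
	}
	return string(raw), first, second, nil
}

func main() {
	value, first, second, err := demo()
	fmt.Println(value, first, second, err)
}
```

With interfaces along these lines, an etcd v3 backend and a libkv-based backend could be swapped without touching the callers; a real election would also need lease expiry and resignation, which this sketch leaves out.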

sergik commented 7 years ago

We are using exactly the same configuration: etcd-operator and stolon. I had to create a libkv store implementation for etcdv3. You can find it here: https://github.com/TargetProcess/stolon/blob/feature/etcd3/pkg/store/etcdv3-store.go I plan to create a pull request to libkv in the future.

theothermike commented 7 years ago

@sgotti If possible, we'd love for this to be expedited. We are very nervous about our current install, because losing quorum in etcd requires a re-init in Stolon. Google Container Engine seems to have a node failure at least once a month, and we're mitigating the limitation by running more nodes and many replicas of etcd.

I do love Stolon and what it provides for running Postgres in Kubernetes, but we had assumed that etcd-operator was backing everything up, and found out the hard way during a Kubernetes upgrade that wasn't the case.

I hope @sergik's patch can be used sooner rather than later

sgotti commented 7 years ago

@sergik What I don't really like about libkv is that it doesn't accept a Context. A Context would be really useful for returning an error on slow stores (for example, to make the proxy close the connection if no answer comes from the store before a deadline).
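As an illustration of why a Context matters here, the following sketch bounds a store read with `context.WithTimeout`. It assumes a hypothetical `slowStore` type that simulates a backend with a fixed response delay; none of these names come from stolon or libkv.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowStore is a hypothetical store that answers after a fixed delay.
type slowStore struct{ delay time.Duration }

// Get returns a value once the delay has elapsed, or the context error
// if the context is cancelled or its deadline passes first.
func (s slowStore) Get(ctx context.Context, key string) (string, error) {
	select {
	case <-time.After(s.delay):
		return "value-of-" + key, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// getWithDeadline bounds a single read with a timeout.
func getWithDeadline(s slowStore, key string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return s.Get(ctx, key)
}

// demo reads from a fast store and from a slow one with short deadlines.
func demo() (fastOK, slowTimedOut bool) {
	_, err := getWithDeadline(slowStore{delay: time.Millisecond}, "clusterdata", 500*time.Millisecond)
	fastOK = err == nil

	_, err = getWithDeadline(slowStore{delay: 2 * time.Second}, "clusterdata", 50*time.Millisecond)
	slowTimedOut = errors.Is(err, context.DeadlineExceeded)
	return fastOK, slowTimedOut
}

func main() {
	fastOK, slowTimedOut := demo()
	fmt.Println("fast store answered:", fastOK)
	fmt.Println("slow store timed out:", slowTimedOut)
}
```

A caller such as the proxy could treat `context.DeadlineExceeded` as the signal to close client connections rather than keep serving with stale cluster data.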

@theothermike I'm not sure what the real problem is, but it looks like bad behavior by etcd-operator. In a normal etcd cluster with persistent volumes you won't lose data.

guillelb commented 7 years ago

I am using Kubernetes cluster 1.6 and the new 1.7. Both of these versions use etcd3. I want to use the same etcd for stolon. Is there any way to use stolon with etcd3?

sgotti commented 7 years ago

@guillelb see https://github.com/sorintlab/stolon/issues/100#issuecomment-303666044 . This issue is for etcd v3 API support. Stolon already works with etcd3 but uses the v2 API.

guillelb commented 7 years ago

How can I use the v3 API? When I try to launch sentinels, it shows me an error like "etcd is misconfigured".

guillelb commented 7 years ago

Is there any other way? Do you have an estimated date for v3 support?

Thank you.

guillelb commented 7 years ago

In other words, how can I use the v2 API on etcd3? By default, the sentinel shows me the error about etcd being misconfigured.

Thank you.

sgotti commented 7 years ago

@guillelb please ask support questions in the gitter channel or on the mailing list, providing some more details (so as not to pollute this issue).

sgotti commented 7 years ago

@sergik Are you willing to open a PR on stolon for your libkv etcdv3 store and the related integration with the stolon components? It could be a starting point. After that, we can see whether libkv will accept your store upstream, and we can also think about how to handle the missing ctx in libkv.

sgotti commented 7 years ago

I added a note here about possible stolon cluster problems when restoring from an out-of-date backup: https://github.com/sorintlab/stolon/issues/246#issuecomment-315064106

sgotti commented 7 years ago

@aaronyoungkash @sergik @theothermike I think we should support the etcdv3 API because it's the new de facto standard API of etcd. Implementing it just because you can back up and restore etcdv3 data, or because etcd-operator uses that to recreate a failed etcd cluster, is not a good reason: restoring old stolon cluster data in etcd from a backup is definitely a bad idea. See #320 (now merged).

I also wrote a detailed post with an analysis on current solutions, their problems and how we are currently achieving a persistent etcd cluster inside k8s: https://sgotti.me/post/kubernetes-persistent-etcd/

@sergik do you have any news on your libkv etcdv3 store, and do you want to merge it into stolon?

sgotti commented 6 years ago

PR for etcd v3 api: #393

sgotti commented 6 years ago

fixed in #393