sorintlab / stolon

PostgreSQL cloud native High Availability and more.
https://talk.stolon.io
Apache License 2.0

Add more stability to the ETCDv3 client #821

Closed itsystem closed 3 years ago

itsystem commented 3 years ago

Add more stability to ETCDv3:

itsystem commented 3 years ago

I think the tests failed for reasons unrelated to the code:

Successfully tagged stolon:master-pg11
/stolon/examples/kubernetes /stolon
error: unable to recognize "role.yaml": Get https://localhost:8443/api?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused

deepdivenow commented 3 years ago

Hi, I have recorded two videos for you.

This video is shorter, and it shows how two sentinel versions behave in the same situation, when the master node disappears: https://youtu.be/4xto0y1EfHw

This longer video covers only v0.16.0. In it, all stolon components remain in a failed state until I restart them: https://youtu.be/HJZbeR2BGFs

You can reproduce this situation ONLY if the etcd leader node disappears without sending any packet over the network and replies to nothing afterwards, like a black hole.
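To illustrate why a silent peer is harder to handle than a refused connection, here is a minimal sketch in Python using only the standard library. It is not stolon or etcd client code: the `probe` helper and the 0.5 second timeout are hypothetical, and a local socket that listens but never answers is only an approximation of a node that has dropped off the network. The point is that the caller learns about the failure only when its own timeout expires.

```python
import socket
import time


def probe(addr, timeout_s):
    """Return "ok" if the peer answers, or "timeout" if it stays silent.

    Hypothetical helper for illustration; not part of stolon or etcd.
    """
    with socket.create_connection(addr, timeout=timeout_s) as conn:
        conn.sendall(b"ping")
        try:
            conn.recv(1)  # blocks until data arrives or the timeout expires
        except socket.timeout:
            return "timeout"
    return "ok"


# Stand-in for a black-holed peer: it listens, but never accepts or replies.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
silent.listen(1)

start = time.monotonic()
result = probe(silent.getsockname(), timeout_s=0.5)
elapsed = time.monotonic() - start
print(result, round(elapsed, 1))
silent.close()
```

Without the `timeout` argument the `recv` call would block indefinitely, which is roughly the failure mode described above.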

deepdivenow commented 3 years ago

Are you saying that etcd takes too much time to elect a new leader relative to the stolon proxy timeout, and that you want to avoid the timeout? If so, does the etcd leader election time depend on multiple environment conditions, or does it have a defined upper bound? We should understand whether increasing the timeout works only in your environment or everywhere (and probably make the proxy timeout configurable; there's already an open issue for this).

It would be better if this timeout were configurable, because the leader election time is configurable in an etcd cluster, and a fixed timeout can't satisfy every environment.
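As a rough sketch of that idea, assuming the operator knows the election timeout configured for their etcd cluster, a configurable timeout could be checked against it as below. The function name `pick_timeout` and the 15 second default are hypothetical and are not taken from stolon.

```python
def pick_timeout(configured_s, election_timeout_s, default_s=15.0):
    """Return the timeout in seconds, using default_s when none is configured.

    Hypothetical helper: rejects a timeout that cannot outlast one etcd
    leader election, since it would expire before a new leader exists.
    """
    timeout_s = default_s if configured_s is None else float(configured_s)
    if timeout_s <= election_timeout_s:
        raise ValueError(
            f"timeout {timeout_s}s does not exceed election timeout "
            f"{election_timeout_s}s"
        )
    return timeout_s


# No value configured: fall back to the (hypothetical) default.
print(pick_timeout(None, election_timeout_s=5.0))
# Operator-supplied value, for example parsed from a flag or a config file.
print(pick_timeout("30", election_timeout_s=5.0))
```

Failing loudly on a too-small value is one possible design choice; logging a warning and continuing would be another.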

sgotti commented 3 years ago

@deepdivenow I created two PRs in place of this one: #827 and #828

It would be better if this timeout were configurable, because the leader election time is configurable in an etcd cluster, and a fixed timeout can't satisfy every environment.

This was already implemented in #756

Thanks!