Closed. itsystem closed this 3 years ago.
I think the tests failed for reasons unrelated to the code:
Successfully tagged stolon:master-pg11
/stolon/examples/kubernetes /stolon
error: unable to recognize "role.yaml": Get https://localhost:8443/api?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused
Hi, I have recorded some videos for you.
This shorter video shows how the two sentinel versions behave in the same situation, when the master node disappears: https://youtu.be/4xto0y1EfHw
This longer video covers only v0.16.0. In it, all stolon components stay in a failed state until I restart them: https://youtu.be/HJZbeR2BGFs
You can reproduce this situation ONLY if the etcd leader node disappears from the network without sending any packet, and then replies to nothing at all, like a black hole.
Are you saying that etcd takes too long to elect a new leader relative to the stolon proxy timeout, and you want to avoid hitting that timeout? If so, does the etcd leader election time depend on multiple environmental conditions, or does it have a defined upper bound? We should understand whether increasing the timeout works only in your environment or everywhere (and probably make the proxy timeout configurable; there's already an open issue for this).
A configurable timeout would be better, because the leader election time is itself configurable in the etcd cluster, and a fixed timeout can't satisfy every environment.
@deepdivenow I created two PRs to replace this one: #827 and #828.
> A configurable timeout would be better, because the leader election time is itself configurable in the etcd cluster, and a fixed timeout can't satisfy every environment.
This was already implemented in #756
Thanks!
Add more ETCDv3 stability: