portworx / kvdb

Generic Key-Value interface
Apache License 2.0

Reduce the etcd watch session timeout to 50 seconds. #104

Closed adityadani closed 2 years ago

adityadani commented 2 years ago

What this PR does / why we need it: An etcd watch can fail in the following two ways when an etcd cluster is shut down:

  1. The etcd server cancels the watch itself by sending a "Cancel" watch response while it shuts down gracefully. This is not guaranteed to happen, but when the server does send the Cancel response, the watch fails instantaneously.

  2. The kvdb client creates an etcd session with a timeout of 2 minutes. If the etcd cluster is shut down and the client does not receive the Cancel response, the kvdb client waits for 2 minutes before closing the watch.

This change reduces the timeout to 50 seconds. The Portworx run-flat feature expects every node to conclude that the etcd cluster is down within 1 minute, so all etcd watches must error out within that time. With this change, a kvdb-etcd client watch is guaranteed to fail within 50 seconds instead of 2 minutes if etcd is unreachable.