shubhamranjan / dotnet-etcd

A C# .NET (dotnet) GRPC client for etcd v3 +
MIT License

Watch stops listening to changes after server restart #186

Closed Sirozha1337 closed 3 weeks ago

Sirozha1337 commented 1 year ago

Describe the bug

Restarting a server with running etcd breaks watch in services running on other servers.

To Reproduce

  1. Run Server 1 with etcd
  2. Run Server 2 with a service using this library. Example code from service:
    _client = new EtcdClient(_options.ConnectionString, _options.Port);
    try
    {
        _client.WatchRangeAsync(prefix, callback, EnsureAuthentication(), cancellationToken: cancellationToken);
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error in Watch!");
    }
  3. Stop Server 1
  4. Check that there's no exception in Server 2
  5. Start Server 1
  6. Make changes to keys in etcd
  7. Check that there're no exceptions in Server 2 and "callback" is not called
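
One thing worth checking in the snippet from step 2: as written, the task returned by WatchRangeAsync is not awaited, so the surrounding try/catch cannot see a failure that happens later on the stream. The sketch below illustrates this with plain .NET types only. `FailLaterAsync` is a hypothetical stand-in for a watch call that faults after the connection drops; it is not part of dotnet-etcd.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical stand-in for a watch call that faults some time after it starts.
    public static async Task FailLaterAsync()
    {
        await Task.Delay(50);
        throw new InvalidOperationException("connection lost");
    }

    // Not awaited: the exception is stored on the returned task, so catch never runs.
    public static bool CaughtWithoutAwait()
    {
        try
        {
            _ = FailLaterAsync();
            return false;
        }
        catch (Exception)
        {
            return true;
        }
    }

    // Awaited: the exception is rethrown at the await and reaches the catch block.
    public static async Task<bool> CaughtWithAwaitAsync()
    {
        try
        {
            await FailLaterAsync();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

If the real service code already awaits the call, this does not apply, and the missing exception comes down to the connection never being detected as dead.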

Expected behavior

WatchRange should throw an error, just like it does when the etcd service is restarted.

Additional context

It seems the problem is the difference between service shutdown and server shutdown:

shubhamranjan commented 1 year ago

Can you confirm the version of the library being used? We do have retry logic in place for connection failures (StatusCode.Unavailable).

Sirozha1337 commented 1 year ago

> Can you confirm the version of the library being used? We do have retry logic in place for connection failures (StatusCode.Unavailable)

The latest one - 6.2.0-beta

I've managed to fix this problem by providing a SocketsHttpHandler configured with keep-alive ping timeouts in configureChannelOptions:

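new EtcdClient(_options.ConnectionString, _options.Port, configureChannelOptions:
    channelOptions =>
    {
        var handler = new SocketsHttpHandler();
        // Send an HTTP/2 PING after 30 seconds without receiving any frames.
        handler.KeepAlivePingDelay = TimeSpan.FromSeconds(30);
        // Treat the connection as dead if the PING is not answered within 30 seconds.
        handler.KeepAlivePingTimeout = TimeSpan.FromSeconds(30);
        // KeepAlivePingPolicy takes an HttpKeepAlivePingPolicy value (Always or WithActiveRequests).
        handler.KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always;

        channelOptions.HttpHandler = handler;
        channelOptions.ThrowOperationCanceledOnCancellation = true;
    })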

The default handler doesn't ping the connection, so it doesn't know that etcd is down. With this configuration it will send ping packets every 30 seconds, and if they time out it will throw an exception.
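
Once the failure surfaces as an exception, the caller still has to start the watch again. A minimal sketch of such a restart loop follows, using only standard .NET types. `WatchSupervisor`, `RunAsync`, `startWatch` and `retryDelay` are hypothetical names introduced here for illustration; the delegate is assumed to wrap the real watch call, and it is an assumption that the returned task faults or completes when the stream dies.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WatchSupervisor
{
    // startWatch is a hypothetical delegate wrapping the real watch call, e.g.
    // ct => client.WatchRangeAsync(prefix, callback, EnsureAuthentication(), cancellationToken: ct).
    public static async Task RunAsync(
        Func<CancellationToken, Task> startWatch,
        TimeSpan retryDelay,
        CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            try
            {
                // Assumed to complete or throw when the stream ends,
                // for example after a keep-alive ping times out.
                await startWatch(cancellationToken);
            }
            catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
            {
                break; // Shutdown was requested by the caller.
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"Watch failed, restarting: {ex.Message}");
            }

            try
            {
                // Wait before reconnecting so a down server is not hammered.
                await Task.Delay(retryDelay, cancellationToken);
            }
            catch (OperationCanceledException)
            {
                break; // Cancelled while waiting to retry.
            }
        }
    }
}
```

A fixed delay keeps the sketch short; a production service would more likely use a capped exponential backoff and re-read the current key state after reconnecting, since changes made while the watch was down are not replayed by this loop.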

shubhamranjan commented 1 year ago

Thank you. That is a good recommendation; I will see if some ideal defaults fit in.

sergiiapostol commented 1 month ago

@shubhamranjan did you manage to incorporate this improvement in the recent version?