rabbitmq / rabbitmq-stream-dotnet-client

RabbitMQ client for the stream protocol
https://rabbitmq.github.io/rabbitmq-stream-dotnet-client/stable/htmlsingle/index.html
Other
122 stars 41 forks source link

Improve the super stream reconnection #344

Closed Gsantomaggio closed 10 months ago

Gsantomaggio commented 10 months ago

Fixes: https://github.com/rabbitmq/rabbitmq-stream-dotnet-client/discussions/333 Fixes: https://github.com/rabbitmq/rabbitmq-stream-dotnet-client/issues/335

Preface

The PR is focused on making the client more stable during the server restart and better handling the metadata update event.

Most of the changes are in the super stream producer and consumer. See the issue https://github.com/rabbitmq/rabbitmq-stream-dotnet-client/discussions/333

1- What's changed in the super stream part

The RawSuperStreamProducer and RawSuperStreamConsumer expose two callbacks:

ConnectionClosedHandler

The callback is raised each time a partition is closed for some reason. You can check the reason with:

1- public const string Normal = "TCP connection closed normal";
2- public const string Unexpected = "TCP connection closed unexpected";

1- The super stream partition is closed by the Close() method, so there are no problems. It can be helpful in terms of alerting or logs.

2- The super stream partition is closed in an unexpected way like kill the connection, network problems broker restart etc..

In that case, you can reconnect the partition with the ReconnectPartition method. Like:

var config = new RawSuperStreamProducerConfig("MySuperStream")
        {
            Routing = message1 => message1.Properties.MessageId.ToString(),
            ClientProvidedName = clientName,
        };
streamProducer = await system.CreateRawSuperStreamProducer(config);
        config.ConnectionClosedHandler = async (reason, stream) =>
        {
            // here, you can decide if it makes sense to wait a bit before reconnecting the 
            // partition
            var streamInfo = await system.StreamInfo(stream);
            await streamProducer.ReconnectPartition(streamInfo);
        };

MetadataHandler

It happens when there is a change in the stream topology. See the document for more details.

In this case, the server drops all the producers and consumers. So, you need to react to the Metadata Handler event. You can decide to reconnect the producer and consumer using the ReconnectPartition

Producer and Consumer classes

The Producer and Consumer classes handle automatically the ConnectionClosedHandler and the MetadataHandler events for stream and super stream.

2- Fail fast

The Producer and Consumer classes can restore the connection automatically when they are up and running. The classes will fail immediately if there are problems during the first connection. This behaviour is now like the Java Client's.

3- Survive a cluster restart

Please follow this presentation to understand what happens during a cluster restart

You can create your own way to handle the reconnection based on Raw Classes.

codecov[bot] commented 10 months ago

Codecov Report

Attention: 42 lines in your changes are missing coverage. Please review.

Comparison is base (7fa33dd) 91.84% compared to head (8e8dc04) 91.80%.

Files Patch % Lines
RabbitMQ.Stream.Client/Reliable/ReliableBase.cs 76.27% 11 Missing and 3 partials :warning:
RabbitMQ.Stream.Client/ClientExceptions.cs 0.00% 5 Missing and 1 partial :warning:
RabbitMQ.Stream.Client/Client.cs 0.00% 3 Missing and 2 partials :warning:
RabbitMQ.Stream.Client/ConnectionsPool.cs 20.00% 3 Missing and 1 partial :warning:
RabbitMQ.Stream.Client/Reliable/ConsumerFactory.cs 90.90% 3 Missing and 1 partial :warning:
RabbitMQ.Stream.Client/RawSuperStreamConsumer.cs 95.65% 0 Missing and 2 partials :warning:
RabbitMQ.Stream.Client/RawSuperStreamProducer.cs 97.18% 0 Missing and 2 partials :warning:
...bitMQ.Stream.Client/Reliable/IReconnectStrategy.cs 33.33% 1 Missing and 1 partial :warning:
RabbitMQ.Stream.Client/Reliable/ProducerFactory.cs 95.91% 1 Missing and 1 partial :warning:
RabbitMQ.Stream.Client/Reliable/Producer.cs 66.66% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #344 +/- ## ========================================== - Coverage 91.84% 91.80% -0.05% ========================================== Files 116 116 Lines 11065 11264 +199 Branches 915 895 -20 ========================================== + Hits 10163 10341 +178 - Misses 699 709 +10 - Partials 203 214 +11 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.