nats-io / stan.net

The official NATS .NET C# Streaming Client
Apache License 2.0

Messages redelivered even if acked #159

Open robertmircea opened 4 years ago

robertmircea commented 4 years ago

I am seeing abnormal behaviour in my application while consuming messages from a durable subscription. Occasionally my logs show redelivered messages even though the earlier ack operations on those messages did not throw. This happens only once in a while. I expected that a message acked without any exception would be considered successfully processed and not delivered again.

Is there any situation in which a message that was previously acked by the client can be redelivered by the nats-streaming-server? I am using v0.16.0. The server is configured in FILE store mode.

simplified code...

            var sOpts = StanSubscriptionOptions.GetDefaultOptions();
            sOpts.ManualAcks = true;   // the handler acks explicitly
            sOpts.AckWait = 30000;     // milliseconds
            sOpts.MaxInflight = 20;
            sOpts.DurableName = "sgw.charging";

            stanSubscription = stanConnection.Subscribe(NatsSubjects.CHARGING,
                "charging", sOpts, async (sender, args) =>
                {
                    try
                    {
                        await RealStanMessageHandler(sender, args);
                        // Ack only after processing has completed without throwing.
                        args.Message.Ack();
                    }
                    catch (Exception e)
                    {
                        // No ack on failure, so the server redelivers after AckWait.
                        logger.Error(e, "Exception while processing message");
                    }
                });

danielwertheim commented 4 years ago

@ColinSullivan1 I do believe this question is best suited for you, as my experience of NATS Streaming server is very limited.

@robertmircea are you seeing anything else around the same time in application and/or server logs? Like reconnecting or something? Can you confirm that AckWait time has not been exceeded so that the message is seen as "available" again for the server to redeliver?

ColinSullivan1 commented 4 years ago

@robertmircea, yes, this can happen. If the client sent the ack but the NATS streaming server shut down or crashed before processing it, the original message may be resent. Also, as @danielwertheim mentioned, if there's a backlog of acks in the NATS streaming server (e.g. the streaming server is under pressure), or there's a very short ack timeout (likely not this case), the client ack may have been sent to the server but the ack wait expired before the server processed it. Finally, because NATS streaming is built atop core NATS (at-most-once delivery), if something happened in the core NATS system the client ack might get lost.

You can check whether the StanMsg.Redelivered property is set; it gives you a hint that you've already received this message.

Regardless, because NATS Streaming is at-least-once delivery, applications should be idempotent. As an aside, even in exactly-once systems applications need to be idempotent to handle badly timed crashes on either end.

robertmircea commented 4 years ago

What is the most common way to handle idempotency that you've seen? I have multiple subscribers running in different processes across machines and I would like to have the following logic:

  1. detect if a message was re-delivered up to x times.
  2. if it was redelivered more than x times, ack it to mark it as processed. Otherwise, try to do the business logic.
  3. even after x times and successful ack (no comm exceptions), I still need to have a way to check if the message will be further redelivered after a while because there is no guarantee that the ack actually succeeded in NATS streaming server.
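
Steps 1 and 2 above could be sketched roughly as follows. This is a minimal, single-process illustration only: `RedeliveryTracker`, `DeliveryAction`, and `Decide` are hypothetical names that are not part of stan.net, and the sketch assumes the message's sequence number is available as a `ulong`. It does not solve step 3, and its dictionary grows without bound, so a real version would need eviction and, across processes, a shared store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of stan.net): decides what to do with a
// delivery based on how often the same sequence number has been seen.
public enum DeliveryAction { Process, AckAndSkip }

public sealed class RedeliveryTracker
{
    private readonly int maxRedeliveries;
    private readonly Dictionary<ulong, int> deliveries = new Dictionary<ulong, int>();
    private readonly object gate = new object();

    public RedeliveryTracker(int maxRedeliveries)
    {
        if (maxRedeliveries < 0)
            throw new ArgumentOutOfRangeException(nameof(maxRedeliveries));
        this.maxRedeliveries = maxRedeliveries;
    }

    // Records one delivery of the given sequence number and returns the action.
    public DeliveryAction Decide(ulong sequence)
    {
        lock (gate)
        {
            // count stays 0 when the sequence has not been seen before.
            deliveries.TryGetValue(sequence, out int count);
            count++;
            deliveries[sequence] = count;

            // The first delivery is not a redelivery.
            int redeliveries = count - 1;
            return redeliveries > maxRedeliveries
                ? DeliveryAction.AckAndSkip
                : DeliveryAction.Process;
        }
    }
}
```

In a handler, one would call `Decide` with the message's sequence number first, ack and return on `AckAndSkip`, and otherwise run the business logic before acking.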

For the moment I am thinking about keeping the sequence number in Redis so that it is shared by all subscribers, but I am not sure how much this will penalize performance. What other performant techniques for storing message IDs for idempotency are you aware of? Memcached? Any others?

ColinSullivan1 commented 4 years ago

@robertmircea , this is very specific to your use case and we'd have to learn more about it. What is your subscriber doing - does it save data someplace?

One performant technique is to use a sliding window of IDs (or sequence numbers) to check in your application. Upon startup your service could read from redis (or some data source if you are already saving data) and hold the last N ids of processed messages in memory. If you detect a duplicate id, then you can skip processing. Older ids are dropped as new ones are added in order to bound memory usage.
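
A sliding window like the one described above might look roughly like this. It is a sketch under assumptions, not a reference implementation: `SeenWindow` and `TryMarkSeen` are hypothetical names, IDs are assumed to be `ulong` sequence numbers, and preloading from Redis or another data source at startup is left to the caller (for example by calling `TryMarkSeen` for each stored ID).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounded "seen IDs" window: remembers the last N IDs and
// evicts the oldest one when a new ID pushes the window past capacity.
public sealed class SeenWindow
{
    private readonly int capacity;
    private readonly HashSet<ulong> seen = new HashSet<ulong>();
    private readonly Queue<ulong> order = new Queue<ulong>();
    private readonly object gate = new object();

    public SeenWindow(int capacity)
    {
        if (capacity < 1)
            throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    // Returns true if the ID is new (and records it), false if it is a
    // duplicate that is still inside the window.
    public bool TryMarkSeen(ulong id)
    {
        lock (gate)
        {
            if (!seen.Add(id))
                return false;

            order.Enqueue(id);
            if (order.Count > capacity)
                seen.Remove(order.Dequeue()); // drop the oldest ID

            return true;
        }
    }
}
```

A duplicate that arrives after its ID has been evicted is not detected, so the capacity has to be sized against how late a redelivery can plausibly arrive.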

robertmircea commented 4 years ago

Great suggestion, I didn't think about this... It will work reliably in a single-instance environment without the need to go to Redis for each message delivered on the subscription. How would this technique work in a multi-instance app environment? If each instance keeps a private, in-memory bounded dictionary of "seen" ids without synchronizing (e.g. via Redis) except at startup, what happens if the NATS server delivers the same message to a different app instance?

ColinSullivan1 commented 4 years ago

With load balanced queue subscribers, you'll need some way of coordinating the unique Ids which would mean sharing that information somehow (a data source) to do it perfectly - back to the lookups in Redis.

Can you detect duplicates when processing the data? e.g. check for a key in a DB? Depending on your performance requirements, that could be simplest, and would be easy to scale.
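
The "check for a key" idea can be sketched as an atomic claim on the message ID before doing the work. Everything named here is hypothetical: `IProcessedStore`, `TryClaim`, and `HandleOnce` are illustration-only names, and the in-memory implementation merely stands in for a shared store such as a database table with a unique key or a Redis `SET` with the `NX` option.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical abstraction over a shared store of processed message IDs.
public interface IProcessedStore
{
    // Returns true only for the first caller that claims the given ID.
    bool TryClaim(string messageId);
}

// In-memory stand-in for a real shared store; TryAdd is atomic, which mimics
// a unique-key insert that fails when the key already exists.
public sealed class InMemoryProcessedStore : IProcessedStore
{
    private readonly ConcurrentDictionary<string, byte> claimed =
        new ConcurrentDictionary<string, byte>();

    public bool TryClaim(string messageId) => claimed.TryAdd(messageId, 0);
}

public static class IdempotentHandler
{
    // Runs the work only if this message ID has not been claimed before.
    // Returns true if the work ran, false if the message was a duplicate.
    public static bool HandleOnce(IProcessedStore store, string messageId, Action work)
    {
        if (!store.TryClaim(messageId))
            return false;

        work();
        return true;
    }
}
```

One caveat of this sketch: if `work` throws after the claim succeeded, the ID stays claimed and a later redelivery would be skipped. Writing the key in the same transaction as the business data, or releasing the claim on failure, would avoid that.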

There is a StanMsg.Redelivered property that hints that a particular message is being redelivered. You can use this to optimize your check for dups. There's a caveat though - depending on your environment, if the NATS streaming server crashes at the wrong time it is theoretically possible you can get a redelivered message without the flag set.