nats-io / stan.go

NATS Streaming System
https://nats.io
Apache License 2.0

Duplicate Redelivery Of Messages Already Acknowledged #260

Closed Mr7Sir closed 5 years ago

Mr7Sir commented 5 years ago

Hi, I'm currently trying to set up deliver-once messaging using nats-streaming in manual acknowledge mode, but the same acknowledged messages sometimes get redelivered. I've attached a sample of my code and the corresponding output. Is there any way to ensure messages are delivered only once? nats streaming server version: v0.14.1, go version: v1.11.5

func subscribe(sc stan.Conn) {
    mcb := func(msg *stan.Msg) {
        msg.Ack()
        log.Println(msg)
    }
    if _, err := sc.QueueSubscribe("foo", "worker", mcb, stan.DeliverAllAvailable(), stan.DurableName(durableName), stan.SetManualAckMode()); err != nil {
        log.Fatal(err)
    }
}

[Attached image: NatsDuplicate]

ripienaar commented 5 years ago

NATS Streaming is "At-Least-Once Delivery", getting dupe ack'd messages is expected.

If you want to avoid that, you have to track what you saw and ignore things you've already seen; the streaming server on its own does not support an at-most-once delivery model.
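A minimal sketch of "track what you saw", assuming each message carries a unique sequence number (in stan.go this would be the `Sequence` field of `stan.Msg`). The `Deduper` type and the `handle` function are hypothetical helpers written for illustration and are not part of the library. The set lives in memory only, so it grows without bound and is lost when the process restarts.

```go
package main

import (
	"fmt"
	"sync"
)

// Deduper remembers which sequence numbers have already been processed.
// Hypothetical helper for illustration; not part of stan.go.
type Deduper struct {
	mu   sync.Mutex
	seen map[uint64]struct{}
}

// NewDeduper returns an empty Deduper.
func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[uint64]struct{})}
}

// FirstTime records seq and reports whether it had not been seen before.
func (d *Deduper) FirstTime(seq uint64) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, dup := d.seen[seq]; dup {
		return false
	}
	d.seen[seq] = struct{}{}
	return true
}

// handle stands in for a message callback. It runs process only for
// sequence numbers not seen before and reports whether it did so.
// A real callback would still ack a duplicate so the server stops resending it.
func handle(d *Deduper, seq uint64, process func(uint64)) bool {
	if !d.FirstTime(seq) {
		return false
	}
	process(seq)
	return true
}

func main() {
	d := NewDeduper()
	processed := 0
	// Sequences 2 and 1 arrive twice, as they might after a redelivery.
	for _, seq := range []uint64{1, 2, 2, 3, 1} {
		if handle(d, seq, func(uint64) { processed++ }) {
			fmt.Println("processed", seq)
		} else {
			fmt.Println("skipped duplicate", seq)
		}
	}
	fmt.Println("total processed:", processed)
}
```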

Mr7Sir commented 5 years ago

Thanks for the quick reply

How does manually acknowledging messages work if the messages get redelivered either way?

Also, is exactly-once-delivery in the road map for nats streaming? Or will nats ever have persistence and exactly-once-delivery in the near future?

ripienaar commented 5 years ago

The problem is basically that there are many buffers and many timeouts involved, and no guarantee that your ack gets to the streaming server in time. Networks are unreliable. You could build an at-most-once system, but it would be very complex and require lots of coordination.

So streaming server in the face of uncertainty will do its best to always get you the message and all it can do is to redeliver. It favours this model over adopting a ton of dependencies and overhead and complexity - but this means you have to design things so your message processing is effectively idempotent.

I doubt a change is on the road map; it's a pretty deep decision to support this model.

mmanna-sapfgl commented 5 years ago

To add to @ripienaar - exactly-once and at-least-once can also be fine-tuned by how your application chooses to receive messages. There is almost never going to be "Exactly Once" in any messaging system without having this also checked at your end. We are looking to migrate to NATS from Apache Kafka, and our system also has a duplicate check built in so that idempotency is maintained. I don't know if that helps you understand your next steps.

aricart commented 5 years ago

Just to make sure we are covering all the bases. I think your concern for exactly once is because your subscriber is seeing all messages again due to the subscription options you provided. On the snippet above, note that you are asking to deliver all messages available. This will include messages that have been seen.

You may want to check out the sections on durability, which allow the server to remember what the subscriber (or queue subscribers) have seen.

As @ripienaar hints, "Exactly Once Semantics" from the messaging system is really a heroic statement that falls short. Because of the nature of distributed systems, it is possible for the client to have sent the acknowledgement but for the server to never receive it, or for the logic within the receiver to fail or crash. Since the server cannot tell the difference, it has to retry sending the message. So really only the application knows whether it has already seen an input when presented with it.

kozlovic commented 5 years ago

Regardless of the exactly-once issue, I am concerned that you see the messages being redelivered. Something else is going on. I would like to understand better what exactly you are doing, since this should not be happening in normal situations. The excerpt of your code is not enough. Show us how you start/configure the server. Do you restart the server? Are the messages already published before starting the queue sub? Is your queue sub exiting quickly? With or without closing the connection? Have you run the server with -SDV to see if the acks are received from the queue subscriber?

Mr7Sir commented 5 years ago

@ripienaar The thing is, as you can see in the code, I send a manual acknowledgement before logging the message data, but in the output the same message is logged twice, which means the subscriber sent a manual acknowledgement back to the server and then logged the message. The server hasn't restarted at any point, and on resubscribing the server still redelivers old messages that it supposedly got an acknowledgement for.

@mmanna-sapfgl The problem with this is that I would require another persistent store at the subscriber end to deal with the sequence ID duplication (I'm planning to run everything in Docker containers, and this issue occurs on container restarts).

@aricart I got this issue while using queue subscribe with a durable name when I sporadically connected and disconnected the subscriber.

@kozlovic I start the server with almost all defaults for the "foo" channel that I'm using. The messages are continuously being published while the subscriber is on. I was making my subscriber client disconnect from the server and reconnect manually to check whether unacknowledged messages get redelivered without duplicating acknowledged ones, and I didn't run the server with "-sdv" logs. What I did, basically, was set up a loop that continuously published to the server while a subscriber was subscribed to the same channel; then I randomly disconnected from and reconnected to the server.

Thanks for all your help guys, I can attach my code here if it'll help further in figuring out a solution (although it's just basic Go code).

Mr7Sir commented 5 years ago

Thanks guys, I figured out a sort of workaround by setting max-in-flight to 1 for the subscriber. I assumed that after disconnecting, the subscriber was starting again before the server could accept and process the acknowledgement, and hence messages were being duplicated.

ripienaar commented 5 years ago

You will still get duplicates. Your code needs to handle duplicates if a duplicate would cause you issues.

Mr7Sir commented 5 years ago

You will still get duplicates. Your code needs to handle duplicates if a duplicate would cause you issues.

@ripienaar If I set max-in-flight to 1, why would I still get duplicates? The server will only send a message once it receives an acknowledgement for the previous one, and during subscriber reconnects it will only send messages it hasn't received an acknowledgement for. (I tested it with 1M messages, and it seemed to work.) Please tell me if there is something I'm missing here.

ripienaar commented 5 years ago

Yes as explained above. NATS Streaming does not support at most once delivery. You can read the README as well.

If simply setting this made it at most once, rest assured the documentation would say so.

A test under good conditions will look good, with no dupes, but under real-world conditions with networking issues, maintenance, congestion etc. (the basic realities of distributed computing) NATS Streaming cannot maintain at-most-once semantics.

Even if you don’t understand what we describe and why this happens I strongly urge you to just accept the word of authors, maintainers and community when everyone tells you it’s not at most once and dupes will happen :)

kozlovic commented 5 years ago

@Mr7Sir Glad you figured out that this was due to in-flight messages. But @ripienaar is right, you need to plan for duplicates. You ask how this can happen with max inflight set to 1? Picture this:

1- server sends 1 message
2- application gets it, processes it and then acks
3- while the ack is in flight, the server crashes and is restarted
4- server will redeliver the last message because it did not get an ack

In step 3 it could be that the ack is simply lost without a server crash.

So again, reducing max inflight to 1 will reduce the likelihood of duplicates, but not totally prevent them. Again, NATS Streaming is at-least-once.
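The lost-ack scenario can be illustrated with a toy simulation. This is not how the server is implemented; it is a minimal model, assuming a max in-flight of 1, in which the server keeps resending the current message until an ack for it arrives. The `run` function and its `ackLost` parameter are hypothetical names chosen for this sketch.

```go
package main

import "fmt"

// run models an at-least-once server with max in-flight of 1 delivering
// messages 1..n. ackLost decides whether the ack for a given delivery
// attempt is lost before the server sees it. The returned map counts how
// many times the subscriber received (and processed) each sequence number.
func run(n uint64, ackLost func(seq uint64, attempt int) bool) map[uint64]int {
	deliveries := make(map[uint64]int)
	for seq := uint64(1); seq <= n; seq++ {
		for attempt := 1; ; attempt++ {
			// The subscriber receives the message, processes it, and acks.
			deliveries[seq]++
			if !ackLost(seq, attempt) {
				break // the server recorded the ack and moves on
			}
			// The ack never arrived, so the server redelivers the same message.
		}
	}
	return deliveries
}

func main() {
	// Lose only the first ack for message 2.
	d := run(3, func(seq uint64, attempt int) bool {
		return seq == 2 && attempt == 1
	})
	for seq := uint64(1); seq <= 3; seq++ {
		fmt.Printf("seq %d delivered %d time(s)\n", seq, d[seq])
	}
}
```

Even with only one message in flight, the subscriber in this model processes message 2 twice, because from the server's side a lost ack looks the same as a message that was never handled.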