What would be the best strategy to limit the number of redeliveries of the same message to subscribers which are failing to ack?
For example: if the message is not acked after several delivery attempts to the client, I want the message to be redelivered up to x times and then dropped.
Derived question: is it possible to keep a counter on the server of how many times a message has been redelivered to a client?
ACK it... this is not sarcasm ;-). Really, if your application keeps getting a redelivered message and is unable to acknowledge it because, say, processing of it keeps failing, and the application decides it does not want to deal with it, simply ACK it. The server will stop the redelivery.
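A minimal sketch of that advice with the Go client (stan.go), added for illustration only: the cluster ID, client ID, subject, durable name and process function are placeholders, not something from this thread.

```go
package main

import (
	"log"

	stan "github.com/nats-io/stan.go"
)

// process stands in for whatever the application does with a message.
func process(data []byte) error { return nil }

func main() {
	sc, err := stan.Connect("test-cluster", "my-client")
	if err != nil {
		log.Fatal(err)
	}
	defer sc.Close()

	// Manual ack mode: the message stays pending (and is redelivered after
	// AckWait) until the handler explicitly acks it.
	_, err = sc.Subscribe("orders", func(m *stan.Msg) {
		if err := process(m.Data); err != nil {
			// Processing failed, but we do not want this message redelivered
			// forever, so we log and ack it anyway to stop the redelivery.
			log.Printf("giving up on seq %d: %v", m.Sequence, err)
		}
		m.Ack()
	}, stan.SetManualAckMode(), stan.DurableName("orders-durable"))
	if err != nil {
		log.Fatal(err)
	}
	select {} // keep the subscriber running
}
```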
Messages are not removed from the message log because they are ack'ed by consumers. Messages are always in the message log since they may have to be delivered to new consumers starting at any sequence in the log: https://github.com/nats-io/nats-streaming-server#message-log
The only advantage of having such configuration in the server would be when you try to limit the redelivery of a message to an application that has exited without calling Close() (so the server does not know). But with pings from server to client, the server will ultimately detect that the client is not responding and will suspend redelivery. The default heartbeat interval is 30 seconds. If that's too high, it can be lowered through configuration.
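For illustration, the heartbeat settings can be tuned through server flags like the ones below (flag names as documented in the server's usage output; the values are only examples, not recommendations):

```sh
# Lower the server-to-client heartbeat so unresponsive clients are
# detected (and redelivery to them suspended) sooner.
nats-streaming-server -hbi 10s -hbt 2s -hbf 3
```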
I understand what you are saying, but this means that the app needs to be aware of and keep track of messages for a period of time in order to detect repeated delivery attempts (even the failed ones) up to a maximum. My first thought was to keep the message id in Redis for a while for idempotency checks. This is not very elegant and it only scales up to a point. A better mechanism, in my opinion, would be to keep track on the server of the number of redeliveries and have either the server or the client decide whether more redeliveries are allowed. This would imply having an explicit Nack command, or allowing the client to instruct the server to deliver based on a specific policy. Probably, what I need here are the properties of a queuing system in the NATS server.
@robertmircea if you're getting the same message redelivered over and over, chances are, a single subscriber will see it multiple times - so you don't need to create a counter on Redis for every single message you receive, only the ones that you're seeing twice (meaning they're getting redelivered in a loop).
Keep an in-memory record of the messages you've seen with a cuckoo filter, and if you see a message twice, start counting its redeliveries in Redis. This should scale a lot better since it won't hit Redis every time, only when you suspect that a message is getting redelivered. Once you hit enough redelivery attempts, you ACK the message (to remove it from the server) and delete it from the cuckoo filter and from Redis.
It won't perfectly track the number of redeliveries if you have many subscribers (since each subscriber will have its own counter that needs to see a message twice), but it should be good enough and scale well.
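A rough Go sketch of that flow, for illustration only: to keep it self-contained, in-process maps stand in for the cuckoo filter and for the Redis counter, and the cluster ID, client ID, subject and redelivery limit are placeholders.

```go
package main

import (
	"log"
	"sync"

	stan "github.com/nats-io/stan.go"
)

const maxRedeliveries = 5 // placeholder limit

type redeliveryTracker struct {
	mu      sync.Mutex
	seen    map[uint64]bool // stands in for the cuckoo filter
	retries map[uint64]int  // stands in for the Redis counter
}

// giveUp reports whether this sequence has been redelivered often enough to stop.
func (t *redeliveryTracker) giveUp(seq uint64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if !t.seen[seq] {
		// First sighting: remember it cheaply, do not count yet.
		t.seen[seq] = true
		return false
	}
	// Second and later sightings: count the redeliveries.
	t.retries[seq]++
	if t.retries[seq] < maxRedeliveries {
		return false
	}
	delete(t.seen, seq)
	delete(t.retries, seq)
	return true
}

// process stands in for the real message handling.
func process(data []byte) error { return nil }

func main() {
	sc, err := stan.Connect("test-cluster", "redelivery-tracker")
	if err != nil {
		log.Fatal(err)
	}
	defer sc.Close()

	tracker := &redeliveryTracker{seen: map[uint64]bool{}, retries: map[uint64]int{}}

	_, err = sc.Subscribe("orders", func(m *stan.Msg) {
		if err := process(m.Data); err != nil {
			if tracker.giveUp(m.Sequence) {
				log.Printf("dropping seq %d after %d redeliveries", m.Sequence, maxRedeliveries)
				m.Ack() // stop further redelivery
			}
			return // otherwise leave it unacked so the server redelivers after AckWait
		}
		m.Ack()
	}, stan.SetManualAckMode())
	if err != nil {
		log.Fatal(err)
	}
	select {}
}
```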
If I have an invalid message, for example a payload created by a buggy service, consumers can't successfully process and ack this message. What is the preferred way to handle it?
Again, from a consumer perspective, if you want to stop the server from redelivering this message, simply ack it. When the message callback is invoked and, say, your application cannot process it, you can still call msg.Ack() on it. What you can't do is remove a "random" message from a message log.
My question is more from a devops perspective. For example, I see in Prometheus that some service has a large error counter in its subscribe handler. What can I do? I can't rewrite the service now, so what is a workaround to temporarily fix this issue while the service is rewritten and deployed?
Again, from a consumer perspective, if you want to stop the server from redelivering this message, simply ack it. When the message callback is invoked and, say, your application cannot process it, you can still call msg.Ack() on it. What you can't do is remove a "random" message from a message log.
I understand this, but this strategy means that on the first failed message processing I should remove (ack) the message no matter whether the error is transient or permanent. Usually, if it is a transient error, I should retry processing a number of times before giving up. NATS server could help in this situation by keeping the number of retries instead of a boolean flag like redelivery. It is very easy to deduce whether it is a redelivery by inspecting the retry count: if it is greater than zero, it was a redelivery. In my case, if I would like to retry up to n times, maybe it would be more efficient to have an explicit nack protocol command instead of waiting for the timeout. For example, if the handler throws an exception, it would be more effective to fail fast. Once the redelivery counter reached n, I would explicitly ack the message from the client so it is removed from the server.
Yes. Is it possible to count delivery attempts per durable group?
I could check the attempt count and take some client-side action, for example serialize the event to a store and ack the message.
Same need. Any updates on this?
@sujeshthekkepatt I commented on the other issue related to this: https://github.com/nats-io/nats-streaming-server/issues/789
NATS server could help in this situation by keeping the number of retries instead of a boolean flag like redelivery.
@kozlovic while I understand that limiting the number of redelivery attempts or implementing any kind of dead-letter queue is out of scope, would it be feasible for NATS Streaming to provide a counter of redeliveries as suggested here (possibly as a new field rather than replacing the existing Redelivered one, for compatibility with existing consumers)?
@bmcstdio that would be great. I have done a workaround using the same concept, where we manually add some metadata like the number of retries and use those fields for pushing to multi-stage poison queues and later to a dead-letter queue. But as you said, having this built in would be perfect. I think they are working on a new revised version of Stan.
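For illustration, a sketch of that kind of workaround might look as follows; the envelope format, subject names, retry limit and process function are all made up here, and nothing of this is provided by NATS Streaming itself.

```go
package main

import (
	"encoding/json"
	"log"

	stan "github.com/nats-io/stan.go"
)

const maxAttempts = 3 // placeholder limit

// envelope carries application-level retry metadata alongside the payload.
type envelope struct {
	Attempts int             `json:"attempts"`
	Payload  json.RawMessage `json:"payload"`
}

// process stands in for the real message handling.
func process(payload []byte) error { return nil }

func handle(sc stan.Conn, m *stan.Msg) {
	var env envelope
	if err := json.Unmarshal(m.Data, &env); err != nil {
		m.Ack() // malformed envelope, nothing more we can do
		return
	}
	if err := process(env.Payload); err != nil {
		env.Attempts++
		next := "orders.retry" // "poison" subject for another pass
		if env.Attempts >= maxAttempts {
			next = "orders.dlq" // final dead-letter subject
		}
		if data, merr := json.Marshal(env); merr == nil {
			if perr := sc.Publish(next, data); perr != nil {
				log.Printf("failed to publish to %s: %v", next, perr)
			}
		}
	}
	m.Ack() // always ack the original so it is not redelivered
}

func main() {
	sc, err := stan.Connect("test-cluster", "retry-worker")
	if err != nil {
		log.Fatal(err)
	}
	defer sc.Close()
	if _, err := sc.Subscribe("orders", func(m *stan.Msg) { handle(sc, m) },
		stan.SetManualAckMode()); err != nil {
		log.Fatal(err)
	}
	select {}
}
```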
@bmcstdio Possibly. The caveat is that this number would likely be "valid" only during the runtime of a server, meaning I don't think it would be feasible to persist the delivery count (but maybe?). Also, in clustering mode, when the leader changes, the new leader may not have the redelivery count (again, if that info is not stored/replicated). In this worst-case scenario (the count being valid only at runtime/until a leader election), would that still be valuable?
@kozlovic I think it would still be valuable, yes, but of course the absolute best would be to persist that information. I am not familiar at all with the NATS/NATS Streaming code base, but I'd be willing to contribute to this somehow (either implementing, reviewing or testing).
Redeliveries counter would be nice to have for us as well! Even if only during the runtime (not persistent). Thanks!
@bmcstdio @sujeshthekkepatt @kell18 Ok, I have decided to add RedeliveryCount (runtime only at the moment, and it won't survive a leader election if the new leader is a different node). Starting with the client's repo that holds the PubMsg protobuf: https://github.com/nats-io/stan.go/pull/295.
Server PR will be submitted shortly after we agree on the field name/type in the client repo. Thanks!
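For illustration, once this is available a subscriber could use the counter roughly like this; this sketch assumes the counter ends up exposed on the delivered message as RedeliveryCount, per the linked PR, and the subject, limit and process function are placeholders.

```go
package main

import (
	"log"

	stan "github.com/nats-io/stan.go"
)

const maxRedeliveries = 5 // placeholder limit

// process stands in for the real message handling.
func process(data []byte) error { return nil }

func main() {
	sc, err := stan.Connect("test-cluster", "redelivery-aware")
	if err != nil {
		log.Fatal(err)
	}
	defer sc.Close()

	_, err = sc.Subscribe("orders", func(m *stan.Msg) {
		if err := process(m.Data); err != nil {
			if m.RedeliveryCount >= maxRedeliveries {
				// Enough attempts: ack to stop the redelivery (i.e. drop it).
				log.Printf("dropping seq %d after %d redeliveries", m.Sequence, m.RedeliveryCount)
				m.Ack()
			}
			// Otherwise leave it unacked so the server redelivers after AckWait.
			return
		}
		m.Ack()
	}, stan.SetManualAckMode())
	if err != nil {
		log.Fatal(err)
	}
	select {}
}
```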
@bmcstdio @sujeshthekkepatt @kell18 Question for you guys.. the redelivery count is per subscription, but in the case of a queue group, would you expect the redelivery count to be for the group, or individual member?
@kozlovic I'd expect it to be for the group.
For group.
For group for sure!
Ok, I hear you, but in case you did not notice, if you use clustering, as of now, a message is already redelivered to the same member (subject to change in the future, but as of now that is the case), so that will not make much of a difference :-)
Guys... closing this one in favor of #996