nsqio / nsq

A realtime distributed messaging platform
https://nsq.io
MIT License

nsqd: inspecting deferred messages #848

Open therealbill opened 7 years ago

therealbill commented 7 years ago

I don't see any docs on the subject so I'm not sure if this is a docs or feature request. ;)

While knowing there are messages deferred is great, the ability to inspect deferred messages is even better. Think of, for example, the deferred queue in Postfix. You can get data about the messages deferred. For nsq the same would be very useful as it would allow you to do some digging if you get a spike of deferred messages as opposed to merely knowing they exist.

So does this exist already and I've just been unable to find it?

mreiferson commented 7 years ago

@therealbill no, there is no way to inspect any of the messages in the queue.

I'll leave this issue open for discussion.
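For context, what nsqd exposes today is per-channel counts, not the messages themselves. A minimal sketch of pulling the deferred count per channel out of a parsed stats payload follows. The field names (`topics`, `channels`, `topic_name`, `channel_name`, `deferred_count`) are how I recall the JSON output of nsqd's `/stats` endpoint, so check them against your version. The sample payload and the threshold of 100 are made up for illustration.

```python
def deferred_by_channel(stats):
    """Return {(topic, channel): deferred_count} from a parsed /stats payload."""
    # Assumption: some older nsqd versions wrap the payload in a "data" key.
    data = stats.get("data", stats)
    counts = {}
    for topic in data.get("topics", []):
        for channel in topic.get("channels", []):
            key = (topic["topic_name"], channel["channel_name"])
            counts[key] = channel.get("deferred_count", 0)
    return counts


# Hypothetical sample, shaped like the stats output described above.
sample = {
    "topics": [
        {
            "topic_name": "orders",
            "channels": [
                {"channel_name": "billing", "deferred_count": 250},
                {"channel_name": "audit", "deferred_count": 3},
            ],
        }
    ]
}

SPIKE_THRESHOLD = 100  # hypothetical alerting threshold
spikes = {k: v for k, v in deferred_by_channel(sample).items() if v > SPIKE_THRESHOLD}
print(spikes)
```

Against a live nsqd you would feed this the decoded response of something like `http://127.0.0.1:4151/stats?format=json`. It tells you that a spike exists and on which channel, which is exactly the limit this issue is about.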

therealbill commented 7 years ago

@mreiferson thanks for confirming I haven't merely been blind. ;)

There are some items in particular I'd like to see on these, so maybe we can work through what might make a good feature request.

I suppose a backhanded way of doing it out of band might be logging these events, and that may be a decent first step. However, exposing it in the existing interface/API would be much better, especially if integrated with metrics logging such as statsd/graphite (and eventually Prometheus). (On a related note, what actually happens to a message that has reached the maximum age and gets timed out?)

For a timed-out message, a means to "retrieve" or re-queue it would be quite useful.

The operational use cases are:

  1. For deferred messages: tracking the performance/actions of consumers
  2. For messages that have timed out, being able to identify them and/or optionally re-queue them

I'm looking at it from an operations perspective, for which IMO NSQ already has some very nice features, and from previous experience handling messages in a different context where you could have deferred (and timed-out) messages. The analogy with a mail queue breaks down in that the message's metadata is not as informative, since you don't have an envelope. Nonetheless, being able to see the age of messages in a deferred state can be useful: it allows an earlier response to a given channel/topic showing long deferrals, and lets an ops person take action prior to message timeout.

Further, having the message ID allows a producer to log the sending of that ID, and then being able to confirm that the ID was deferred or timed out is still useful for operations to get an idea of what may be breaking or under pressure - or when another business group asks why they haven't received an expected message. Further, a producer could, on their own, encode information into the message ID, making that process easier.
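As a rough illustration of the metadata in question: each NSQ message carries an ID, a nanosecond timestamp and an attempt count, so the age can be computed wherever the message is visible. The sketch below uses a stand-in class (`StubMessage` is hypothetical, not a client library type) so it runs without a broker. The attribute names mirror what I believe pynsq exposes on its message object, which is an assumption to verify.

```python
import time
from dataclasses import dataclass


@dataclass
class StubMessage:
    """Hypothetical stand-in carrying the fields from an NSQ message header."""
    id: str
    timestamp: int  # nanoseconds since the epoch
    attempts: int


def describe(message, now_ns=None):
    """Build a log-friendly record with the message's ID, attempts and age."""
    if now_ns is None:
        now_ns = time.time_ns()
    return {
        "id": message.id,
        "attempts": message.attempts,
        "age_seconds": (now_ns - message.timestamp) / 1e9,
    }


msg = StubMessage(id="0a1b2c3d4e5f6071", timestamp=1_000_000_000, attempts=3)
record = describe(msg, now_ns=61_000_000_000)
print(record)
```

Today this can only be done by a consumer that has the message in hand. The request in this issue is to get the same kind of record for messages still sitting in the deferred queue.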

ploxiln commented 7 years ago

When a consumer receives a message, it can determine how many times it has already been re-queued, and decide to throw it out by immediately calling .finish() on it. EDIT: for logic that simple, client libraries have a "max_attempts" option to do that.
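A minimal sketch of that consumer-side check, assuming a message object with an `attempts` counter and `finish()` / `requeue()` methods (as I understand pynsq's message to offer). `StubMessage`, `handle` and the threshold are hypothetical names used so the example runs on its own.

```python
MAX_ATTEMPTS = 5  # hypothetical threshold


class StubMessage:
    """Hypothetical stand-in that records what the handler did with it."""

    def __init__(self, body, attempts):
        self.body = body
        self.attempts = attempts
        self.outcome = None

    def finish(self):
        self.outcome = "finished"

    def requeue(self):
        self.outcome = "requeued"


def handle(message, process):
    """Drop messages that were retried too often, otherwise try to process."""
    if message.attempts > MAX_ATTEMPTS:
        message.finish()  # give up: tell nsqd we are done with it
        return "dropped"
    try:
        process(message.body)
    except Exception:
        message.requeue()  # failed: hand it back for a later retry
        return "requeued"
    message.finish()
    return "done"


def always_fails(body):
    raise RuntimeError("downstream unavailable")


print(handle(StubMessage(b"payload", attempts=1), always_fails))
print(handle(StubMessage(b"payload", attempts=6), always_fails))
```

Note that a message dropped this way is gone unless the handler records it somewhere first.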

For general discussion and questions you can use the Google group https://groups.google.com/forum/#!forum/nsq-users

sundarv85 commented 6 years ago

@ploxiln although you are correct that we can handle it in the consumer, it would be very helpful if that feature were available directly, so that the consumer need not handle this by storing failures separately in a file, etc.

The reason I ask is that in our case we have 15 consumers, so we would have to implement a way to identify all the failures in each of these consumers separately and then debug the issue. If NSQ itself had such a feature, it would be great, and we could continue with our debugging directly.
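Until something like that exists, one workaround is a small shared wrapper that every consumer applies to its handler, so failures at least land in one place in one format. This is only a sketch: `record_failures` and the list used as a sink are hypothetical, the message is a stand-in with `id` and `attempts` attributes, and re-raising assumes your client library treats an exception from the handler as a failed message.

```python
import functools
from types import SimpleNamespace


def record_failures(consumer_name, sink):
    """Wrap a handler so each failure is appended to a shared sink."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(message):
            try:
                return handler(message)
            except Exception as exc:
                sink.append({
                    "consumer": consumer_name,
                    "id": message.id,
                    "attempts": message.attempts,
                    "error": str(exc),
                })
                raise  # leave the usual failure handling untouched
        return wrapper
    return decorator


failures = []  # in practice a log file, a database, or another topic


@record_failures("billing", failures)
def billing_handler(message):
    raise ValueError("bad payload")


try:
    billing_handler(SimpleNamespace(id="abc123", attempts=2))
except ValueError:
    pass

print(failures)
```

This still has to be deployed into all 15 consumers, which is the overhead a built-in feature would remove.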