upstash / issues

Issue Tracker for Upstash
https://upstash.com

Support retrieval of undelivered messages by topic and related features #100

Open thebrianbug opened 2 months ago

thebrianbug commented 2 months ago

Background

We are using qStash to publish a large number of messages to various topics on a recurring basis. Each topic corresponds to a serverless function. Some of those functions are rate-limited, so we can only have so many running at the same time. To ensure we respect such limits, we need visibility into the messages per topic that are not yet delivered but are scheduled for some time in the future. To coordinate this, these features would be extremely helpful.

  1. A way to get undelivered messages by topic and view the next scheduled delivery time for each message.
    • This would satisfy our needs to make sure we are not scheduling new messages before the last ones are completed. Due to potential retries and network failures, we are unable to know 100% if a topic has undelivered messages or not at any given time unless qStash can tell users of its current state.
    • If we had this, we could also easily loop through all messages by topic and delete/cancel them if ever needed due to system errors, as explained below.
  2. Less sophisticated than the full list of undelivered messages by topic, it would also be useful to be able to get a simple count of the number of undelivered messages per topic. If this count were zero, we would know it was safe to schedule additional messages to that topic.
  3. A way to cancel all undelivered messages by topic. Currently there seems to be no good way to do this, even if we delete the topic. Sometimes things go wrong. A kill-all switch is essential for recovery from some errors. Currently, our workaround is to change the topic endpoint to some URL that we don't care about, but this is not a very satisfying solution and we still have to wait for delayed messages to clear before restoring the real endpoints.

Another potential solution could be to switch to Upstash Queues. However, we can't yet do that without the following feature being added to Queues:

  4. Add a delay feature to the Queue between batches to ensure that we comply with 3rd party rate limits. When combined with the Controlled Parallelism feature, this would be enough to satisfy our use case.

I hope at least some of these features could be added as they would make qStash a much more robust message delivery system!

Note: By "undelivered messages" I mean qStash messages that are scheduled to be delivered in the future, plus retries of old messages. I am not referring to messages in the Dead Letter Queue.

sancar commented 2 months ago

Hi @thebrianbug

For use cases around rate limiting, we are definitely focusing on Queues. The infrastructure around publish is not really suitable for this kind of job. Adding too many rate-limiting APIs to publish would be complicated, and the end result would probably use too many resources.

In case you missed it: Queues have a lag field that you can use as you described in item 2. See https://upstash.com/docs/qstash/api/queues/get

Regarding item 3: we are planning to add a detailed filtering option on events and also the ability to cancel multiple messages (probably with an array of message IDs). With the end result, you should be able to filter all messages going to the same queue/topic/schedule ID and cancel them. Filtering is actually already implemented (not documented), but while reading your message I noticed some usability problems. Right now you can do:

curl https://qstash.upstash.io/v2/events  -H "Authorization:....." -d '{"state" : "DELIVERED" , "topicName": "myTopic"}'

This does not let you get the message IDs of the undelivered ones. We will work on it a bit more.

We also have an idea for a rate (in messages/sec) setting on a queue, which sounds similar to the 4th item. When a rate is configured on a queue, it will not try to deliver more messages; instead, it will wait a bit to match the given rate.
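To make the rate idea concrete, here is a minimal sketch of what "wait a bit to match the given rate" could mean from the consumer's point of view. It is only an illustration of the pacing arithmetic, not how Queues are implemented; the function name `delivery_offsets` and its parameters are hypothetical.

```python
from typing import List


def delivery_offsets(message_count: int, rate_per_second: float) -> List[float]:
    """Return the earliest delivery offset (in seconds) for each message.

    Hypothetical helper: with a rate of r messages/sec, consecutive
    deliveries are spaced at least 1/r seconds apart.
    """
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    interval = 1.0 / rate_per_second
    return [index * interval for index in range(message_count)]
```

For example, four messages at 2 messages/sec would be spaced half a second apart, so a third-party limit of 2 calls per second would never be exceeded by this queue alone.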

To sum up, and to verify that this makes sense: if we add the rate feature to Queues, would that be enough for your use case without implementing "get messages by queue/topic"? If you would still need to introspect "undelivered messages", why would that be?

And even without this rate feature, can you achieve your use case by just using the lag field? I realize that it would be a bit more work on your side to implement some sort of backoff, but would you still need the "get undelivered messages by queue" API?
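As a rough idea of the client-side backoff mentioned above, the sketch below polls a queue's lag and only reports the queue as safe to publish to once the lag reaches zero. It assumes you supply `get_lag`, a hypothetical callable that returns the lag value read from the Queues "get" endpoint linked earlier; the names and default delays are illustrative, not part of any Upstash SDK.

```python
import time
from typing import Callable


def wait_until_drained(
    get_lag: Callable[[], int],
    max_attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the queue lag with exponential backoff.

    Returns True as soon as get_lag() reports 0 (nothing left to deliver),
    or False if the lag is still non-zero after max_attempts checks.
    """
    delay = base_delay
    for _ in range(max_attempts):
        if get_lag() == 0:
            return True
        sleep(delay)
        # Double the wait each time, capped at max_delay.
        delay = min(delay * 2, max_delay)
    return False
```

A scheduler could call `wait_until_drained(...)` before enqueueing the next batch and skip the batch (or alert) when it returns False. Passing `sleep` as a parameter keeps the function easy to test without real waiting.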