ruby-shoryuken / shoryuken

A super efficient Amazon SQS thread-based message processor for Ruby. This project is in MAINTENANCE MODE. Reach out to me via the Slack link in the description if you want to become a new maintainer.

Batches greater than 10 #738

Closed · sinecode closed this 1 year ago

sinecode commented 1 year ago

Hello!

First of all, thank you for providing this excellent gem!

This pull request introduces a feature that allows batch sizes greater than 10. To achieve this, I have added an additional worker parameter called batch_options. With batch_options, users can specify the desired maximum batch size and a batch timeout. The batch timeout ensures that we don't wait indefinitely to reach the desired batch size.

To handle the scenario where multiple calls are made to SQS in a single fetch, I have introduced a check on the queue visibility timeout. If the queue visibility timeout is lower than the batch timeout, a warning message is logged. This prevents the situation where fetched but unprocessed messages are returned to the queue.
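
As a rough sketch, the safeguard boils down to comparing the two timeouts and logging a warning when the batch timeout wins. This assumes a Shoryuken::Queue instance; the method name below is illustrative, not the exact code in this PR:

# Illustrative sketch, not the PR's actual code.
# Shoryuken::Queue#visibility_timeout reads the queue's
# VisibilityTimeout attribute from SQS.
def warn_if_batch_timeout_unsafe(queue, batch_options)
  batch_timeout = batch_options[:timeout]
  return if batch_timeout.nil? || queue.visibility_timeout >= batch_timeout

  Shoryuken.logger.warn do
    "Queue #{queue.name} has a visibility timeout of " \
    "#{queue.visibility_timeout}s, lower than the batch timeout " \
    "(#{batch_timeout}s); fetched but unprocessed messages may " \
    "return to the queue before the batch is full"
  end
end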

Here's an example of how to use the new feature:

shoryuken_options(
  queue: 'queue_name',
  batch: true,
  batch_options: {
    max_size: 15,
    timeout: 100 # in seconds
  }
)
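
With these options, the fetcher keeps polling SQS (at most 10 messages per call, the SQS limit) until it has accumulated 15 messages or 100 seconds have elapsed, whichever comes first, and then hands the whole batch to the worker.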

I am already running several workers that process batches greater than 10 with this algorithm, using a custom framework outside of Shoryuken. However, I believe including this feature in the gem would be beneficial, as I use Shoryuken for all my other workers that don't require batches greater than 10.

I would greatly appreciate your feedback on this proposal. Please let me know if there are any considerations or conflicts with existing Shoryuken features that I may have overlooked. If everything seems acceptable, I am more than willing to extensively test this feature in real-world code.

phstc commented 1 year ago

Hi @sinecode

Thanks for your contribution. The code looks solid.

Shoryuken tries to be as transparent as possible with SQS. I'm not sure if increasing the batch size is a fit for Shoryuken.

The visibility timeout worries me. I understand you've added some control around that, but it's still not something we have in SQS.

Could you give a more specific example of your use case? I understand that we may need to process more than 10 at a time. But I'm wondering if there are other solutions instead of increasing the batch limit.

sinecode commented 1 year ago

Hi @phstc, thanks a lot for your feedback.

To optimize performance and minimize the number of API calls, I usually need batches of more than 10 messages, ideally around 50-150. Working with smaller batches often results in excessive API calls or underutilization of batch-processing capacity (think of GPUs).
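
To illustrate the idea behind the algorithm, here is a minimal sketch built on plain aws-sdk-sqs; the queue URL, limits, and the process helper are placeholders, not part of Shoryuken or this PR:

require 'aws-sdk-sqs'

# Accumulate messages across several receive calls until the desired
# batch size is reached or the batch timeout expires.
client    = Aws::SQS::Client.new
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/queue_name'
max_size  = 50
deadline  = Time.now + 100 # batch timeout in seconds

batch = []
while batch.size < max_size && Time.now < deadline
  resp = client.receive_message(
    queue_url: queue_url,
    max_number_of_messages: [10, max_size - batch.size].min, # 10 is the SQS hard limit per call
    wait_time_seconds: 5
  )
  batch.concat(resp.messages)
end

process(batch) # placeholder: hand the accumulated batch to the worker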

Alternatively, one might consider buffering the messages in a secondary data structure such as Redis. However, this approach complicates the message lifecycle, since both the SQS queue and the additional data structure would have to be managed simultaneously. Therefore, I think it is simpler to rely on the SQS queue alone.

I totally get your point. If you believe this approach deviates too much from SQS, feel free to close this PR. I really appreciate your feedback either way.

github-actions[bot] commented 1 year ago

This PR is now marked as stale because it hasn't seen activity for a while. Add a comment or it will be closed soon.

github-actions[bot] commented 1 year ago

This PR was closed because it hasn't seen activity for a while.