google / go-cloud

The Go Cloud Development Kit (Go CDK): A library and tools for open cloud development in Go.
https://gocloud.dev/
Apache License 2.0

pubsub/awssnssqs: aws sqs expose receiver max batch #3412

Closed · pxp928 closed this 1 month ago

pxp928 commented 3 months ago

This PR exposes the receiver max batch size as a URL parameter for AWS SQS via receivermaxbatch.

For example: awssqs://sqs.us-east-2.amazonaws.com/99999/my-queue?receivermaxbatch=5

Based on the recvBatcherOpts (https://github.com/google/go-cloud/blob/be1b4aee38955e1b8cd1c46f8f47fb6f9d820a9b/pubsub/awssnssqs/awssnssqs.go#L118-L123) and the SQS limit of 10 messages per receive call, any value above 10 falls back to 10.
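
As a minimal sketch (assuming the parameter lands as described in this PR), opening a subscription with the new option would look like this; the queue URL is the example from above:

```go
package main

import (
	"context"
	"log"

	"gocloud.dev/pubsub"
	_ "gocloud.dev/pubsub/awssnssqs" // registers the awssqs:// URL scheme
)

func main() {
	ctx := context.Background()

	// receivermaxbatch is the query parameter added by this PR; values
	// above 10 are clamped to 10, since SQS allows at most 10 messages
	// per ReceiveMessage call.
	sub, err := pubsub.OpenSubscription(ctx,
		"awssqs://sqs.us-east-2.amazonaws.com/99999/my-queue?receivermaxbatch=5")
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Shutdown(ctx)
}
```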

vangent commented 3 months ago

Out of curiosity, why do you need to set this?

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 73.14%. Comparing base (be1b4ae) to head (af19c89).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #3412      +/-   ##
==========================================
+ Coverage   73.12%   73.14%   +0.01%
==========================================
  Files         113      113
  Lines       14864    14870       +6
==========================================
+ Hits        10870    10876       +6
  Misses       3219     3219
  Partials      775      775
```


pxp928 commented 3 months ago

Hey @vangent, the use case is flexibility to change the max batch size when needed. For example, a smaller batch size means fewer messages are "in-flight" (held under the SQS visibility timeout) at any one time, which lets us scale our processing service in or out based on the batch size each instance can handle.

vangent commented 3 months ago

Have you tried doing that without manually tuning it? The pubsub package will not always use the maximum batch size; it tunes the batch size to try to keep throughput balanced.

So, for example, if you only have 2 worker goroutines processing messages, it is unlikely to fetch 10 messages at a time and let them sit there for a long time (unless the processing time is very fast, in which case you do want more messages queued so that the workers aren't idle).

Basically, the package does a lot to make it so that you don't have to tune this manually; that's one of the benefits. If you are manually tuning it because what I've described isn't working the way you want for some reason, that's one thing, but I don't want you to add complexity manually tuning something that shouldn't need it.
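
To illustrate the idea with a loose sketch (the constants and function here are hypothetical, not the actual pubsub internals): the subscription observes how long handlers take per message and sizes the next fetch so the in-memory queue stays short, capped by the SQS limit.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// Hypothetical target: keep roughly two seconds' worth of messages
	// queued in memory so workers are never idle. The real algorithm
	// lives in pubsub/pubsub.go and differs in detail.
	targetQueueDuration = 2 * time.Second
	maxBatchSize        = 10 // SQS hard limit per ReceiveMessage call
)

// nextBatchSize suggests how many messages to prefetch, given the
// observed average time workers take to process a single message.
func nextBatchSize(avgProcessTime time.Duration) int {
	if avgProcessTime <= 0 {
		return 1
	}
	n := int(targetQueueDuration / avgProcessTime)
	if n < 1 {
		n = 1
	}
	if n > maxBatchSize {
		n = maxBatchSize
	}
	return n
}

func main() {
	// Slow handlers get small batches; fast handlers hit the SQS cap.
	fmt.Println(nextBatchSize(500 * time.Millisecond)) // 4
	fmt.Println(nextBatchSize(10 * time.Millisecond))  // 10 (clamped)
}
```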

pxp928 commented 3 months ago

Oh, interesting, I did not see that in https://gocloud.dev/howto/pubsub/subscribe/. Is there any documentation around this behavior? Is there a way to know what the current batch size is set to, and what it changes to, via logs? Thank you for pointing this out.

vangent commented 3 months ago

I don't think it's well-documented, as it is not really part of the public interface; it's an internal implementation detail.

You can see constants for the algorithm here: https://github.com/google/go-cloud/blob/master/pubsub/pubsub.go#L397

and the main code is here: https://github.com/google/go-cloud/blob/master/pubsub/pubsub.go#L462

No, the batch size isn't currently logged, but you can patch a local copy and add some logging.
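
If you go that route, a replace directive in your go.mod is one way to point the build at the patched checkout (the local path here is illustrative):

```
// go.mod of your application: build against a local go-cloud checkout
// where you've added a log line around the batch-size computation.
replace gocloud.dev => ../go-cloud
```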

pxp928 commented 1 month ago

Thanks @vangent for the information!