triggermesh / aws-event-sources

Knative event sources for AWS services
Apache License 2.0

Allow customization of SQSSource visibility timeout #329

Closed: noahw3 closed this issue 3 years ago

noahw3 commented 3 years ago

Currently, when reading messages from SQS, the visibility timeout is set to a static 30 seconds. However, depending on the type of sink configured, this may be insufficient.

I have a long-running service as the sink in my configuration. The goal is to maintain SQS as the source of truth for whether or not a message has been processed, not deleting the message from the queue until the service returns successfully. This works today except for the visibility timeout: because the sink takes more than 30 seconds to respond, the message becomes visible again and is redelivered, so it ends up in an infinite retry loop even though each individual request is successful.

Ideally the visibility timeout should be configurable from the spec.
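For context, this is roughly what the knob looks like at the SQS API level. A minimal sketch using aws-sdk-go, not the adapter's actual code; the queue URL and the 300s value are placeholders:

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sqs"
)

func main() {
	client := sqs.New(session.Must(session.NewSession()))

	// Today the adapter uses a static 30s; the ask is to let this value
	// come from the spec so it can exceed the sink's worst-case response time.
	out, err := client.ReceiveMessage(&sqs.ReceiveMessageInput{
		QueueUrl:            aws.String("https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"),
		MaxNumberOfMessages: aws.Int64(10),
		VisibilityTimeout:   aws.Int64(300), // seconds the message stays hidden from other consumers
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("received %d message(s)\n", len(out.Messages))
}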

antoineco commented 3 years ago

Hey @noahw3, that's a good suggestion :+1:

How would you feel about us adding the following nested attribute to the spec?:

spec:
  receiveOptions:
    visibilityTimeout: <duration string; e.g. "1m30s">

(Ultimately, we could add more configurable options under receiveOptions if folks need them.)
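For illustration, on the Go side that nested attribute could map to API types roughly like this (names are hypothetical, not actual definitions from this repo):

package v1alpha1

// SQSSourceSpec is an excerpt of the source's spec; only the proposed
// addition is shown here.
type SQSSourceSpec struct {
	// ReceiveOptions groups tunables applied to SQS ReceiveMessage calls.
	// +optional
	ReceiveOptions *SQSReceiveOptions `json:"receiveOptions,omitempty"`
}

// SQSReceiveOptions exposes selected ReceiveMessage parameters.
type SQSReceiveOptions struct {
	// VisibilityTimeout is a duration string, e.g. "1m30s".
	// +optional
	VisibilityTimeout *string `json:"visibilityTimeout,omitempty"`
}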

Comments/suggestions welcome!

noahw3 commented 3 years ago

That seems reasonable. I'm not sure what the convention is in this project, but visibilityTimeoutSeconds, with a number instead of the duration string, might also make sense (it also mirrors what the backing API looks like).

I can't think of any reason why someone would want to change the batch size or polling time. There could possibly be a use case for adjusting the receiveMsgPeriod?

One thing that I did notice is that this source hits SQS fairly hard. The official Knative SQS source records ~20 empty receives per minute and AWS Lambda configured with an SQS trigger does about 15, while this source clocks in at around 50 (and it's oddly spiky too). While not an immediate problem, I could see large organizations with many of these processors running into SQS limits and throttling much sooner than with the alternatives.

Another thing I noticed is that custom SQS message attributes aren't read or included in the downstream CloudEvent. I believe this is the MessageAttributeNames parameter? It looks like there's a way to just ask for all of them, similarly to AttributeNames. Might be worth including them by default? I could also see an argument for allowing both of these fields to be specified if a user only wanted a subset of them included.
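To make those last two points concrete, here's a rough sketch (aws-sdk-go, placeholder queue URL) of a receive call that uses long polling to cut down on empty receives and asks for every attribute via the "All" wildcard. This is illustrative only, not what the adapter currently does:

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sqs"
)

func main() {
	client := sqs.New(session.Must(session.NewSession()))

	out, err := client.ReceiveMessage(&sqs.ReceiveMessageInput{
		QueueUrl:            aws.String("https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"),
		MaxNumberOfMessages: aws.Int64(10),
		// Long polling: the call blocks for up to 20s waiting for messages,
		// which keeps empty receives to a few per minute per worker.
		WaitTimeSeconds: aws.Int64(20),
		// "All" requests every system attribute and custom message attribute.
		AttributeNames:        aws.StringSlice([]string{"All"}),
		MessageAttributeNames: aws.StringSlice([]string{"All"}),
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range out.Messages {
		// Custom attributes would then be mapped onto the outgoing CloudEvent.
		log.Printf("message %s carries %d custom attribute(s)",
			aws.StringValue(m.MessageId), len(m.MessageAttributes))
	}
}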

antoineco commented 3 years ago

visibilityTimeoutSeconds, with a number instead of the duration string, might also make sense (it also mirrors what the backing API looks like).

There is no strict convention, but we tend to adjust our APIs to whatever makes sense in the context, instead of just exposing raw AWS APIs as Kubernetes objects. That means converting a duration to seconds (and ensuring there is a sane lower bound) should be reasonable here. We did this in the CloudWatch source, among others.
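As a rough sketch of what that conversion could look like (the function name and bounds are illustrative; SQS caps the visibility timeout at 12 hours):

package main

import (
	"fmt"
	"time"
)

// visibilityTimeoutSeconds converts a duration string such as "1m30s" into
// the whole-second value SQS expects, clamped to the service's valid range.
func visibilityTimeoutSeconds(s string) (int64, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", s, err)
	}
	if d < 0 {
		d = 0
	}
	if max := 12 * time.Hour; d > max {
		d = max // SQS upper limit for the visibility timeout
	}
	return int64(d / time.Second), nil
}

func main() {
	secs, err := visibilityTimeoutSeconds("1m30s")
	if err != nil {
		panic(err)
	}
	fmt.Println(secs) // 90
}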

There could possibly be a use case for adjusting the receiveMsgPeriod?

Maybe, indeed.

One thing that I did notice is that this source hits SQS fairly hard. The official Knative SQS source records ~20 empty receives per minute and AWS Lambda configured with an SQS trigger does about 15, while this source clocks in at around 50 (and it's oddly spiky too). While not an immediate problem, I could see large organizations with many of these processors running into SQS limits and throttling much sooner than with the alternatives.

That is true. The number of workers depends on the number of CPUs available, and in most cases workers should be kept inactive until a spike occurs. This is a known issue we want to tackle in triggermesh/triggermesh#39.
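To illustrate what "depends on the number of CPUs" means in practice, the fan-out follows roughly this pattern (illustrative only, not the adapter's actual worker management):

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	numWorkers := runtime.NumCPU() // more CPUs => more concurrent pollers

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each worker would run its own ReceiveMessage loop here, which
			// is why the observed request rate scales with the CPU count.
			fmt.Printf("worker %d polling\n", id)
		}(i)
	}
	wg.Wait()
}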

Another thing I noticed is that custom SQS message attributes aren't read or included in the downstream CloudEvent. I believe this is the MessageAttributeNames parameter? It looks like there's a way to just ask for all of them, similarly to AttributeNames. Might be worth including them by default?

That sounds fair! An option to configure this would be useful 👍 I just created #330 to track it.