xmidt-org / ears

Event Async Receiver Service (EARS)
Apache License 2.0

Messages are not delivered intermittently in EARS functional test #153

Open holidaymike opened 3 years ago

holidaymike commented 3 years ago

When running the EARS functional test, we sometimes find that test messages time out in EARS and are not delivered, with the following error:

{"log.level":"error","op":"SQS.receiveWorker","workerNum":0,"time":1625675966,"message":"max retries reached for 6c2d4bc5-d6c1-418b-b023-0cd5a5bad423"}

Need to investigate

holidaymike commented 3 years ago

This happens when we run the same functional test in quick succession. There appears to be a bug in the SQS receiver: when we call StopReceiving on the receiver, it may take a while to stop because it is blocked on this line: https://github.com/xmidt-org/ears/blob/main/pkg/plugins/sqs/receiver.go#L160. In the meantime, it may still receive messages from the next test. When this happens, the approximateReceiveCount for a test message increases, so by the time the route from the new test gets the message, the count is already set to 1, causing the new receiver to drop the message with the error:

{"log.level":"error","op":"SQS.receiveWorker","workerNum":0,"time":1625675966,"message":"max retries reached for 6c2d4bc5-d6c1-418b-b023-0cd5a5bad423"}

Instead of receiving SQS messages with svc.ReceiveMessage(...), we should use svc.ReceiveMessageWithContext(...) so that we can break out of the call immediately when the SQS receiver needs to stop.