kaihendry closed 5 years ago
Just noticed a simpler way of putting the DLQ messages back on the SQS from alambda. Then we can make alambda consume the SQS messages, hopefully without any interface changes.
This has the advantage of being simpler to implement than the aforementioned required changes. However, it would still mean that MEFE will likely be overloaded.
In the dev environment, I've configured alambda_simple's DLQ to relay to a SQS queue, which should then relay back to alambda_simple via the configured Lambda trigger.
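As a rough illustration of consuming SQS messages "without any interface changes", the sketch below shows a handler that unwraps SQS-triggered events and otherwise passes a direct invocation straight through. It is written in Python for brevity and is not alambda's actual code: `handle_payload` and `handler` are hypothetical names, and it assumes the message body is the same JSON payload that a direct invocation would have received.

```python
import json


def handle_payload(payload):
    """Hypothetical stand-in for the existing business logic."""
    return {"processed": payload}


def is_sqs_event(event):
    """True when every record in the event was delivered by an SQS trigger."""
    records = event.get("Records") if isinstance(event, dict) else None
    return bool(records) and all(
        r.get("eventSource") == "aws:sqs" for r in records
    )


def handler(event, context=None):
    """Accept both direct invocations and SQS-triggered batches."""
    if is_sqs_event(event):
        # Each SQS record carries the original payload as a JSON string.
        return [handle_payload(json.loads(r["body"])) for r in event["Records"]]
    # Direct invocation: the event itself is the payload.
    return [handle_payload(event)]
```

If `handle_payload` raises, the invocation fails and the batch becomes visible on the queue again once its visibility timeout expires, so it gets retried later.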
Now this needs testing @franck-boullier
This change has rolled out to the dev environment. We need some more testing in dev. I was testing with old DLQ messages, but we should probably now move to "mass deassignment/reassignment via the UNTE interface". @franck-boullier
Furthermore, we need to create the queue configurations in the other environments.
dev/demo now have a queue as described in https://blog.deleu.dev/leveraging-aws-sqs-retry-mechanism-lambda/
Once prod is switched over, I'll close. I want to do some more testing over the weekend.
Currently our Microservice architecture looks like:
The problem is that if there is a bulk action from the Enterprise API, lambda2sns may receive 100s of payloads via the lambda interface, which it can by default process concurrently.
This overwhelms MEFE in two ways:

a. MEFE runs out of memory with such a high request count.
b. When MEFE triggers the lambda in step 4, the database can get blocked since there are so many incoming connections.
MEFE currently fails by returning a 5xx, which results in lambda2sns retrying the payload 3 times and then putting the message into a DLQ.
Introducing a SQS queue
A queue would absorb our spikes and allow us to specify longer gaps between retries.
Right now we only have lambda concurrency limits and the default async retry behaviour to fall back on. This is too naive to cover all the cases.
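To make "longer gaps between retries" concrete: with SQS, a failed message reappears after the queue's visibility timeout, and the redrive policy's `maxReceiveCount` decides how many receives are allowed before it moves to the DLQ. The sketch below only builds the attribute map; the function names, default values and ARN are illustrative assumptions, not our actual configuration. With boto3 the result could be passed as `Attributes` to `create_queue`.

```python
import json

# SQS accepts visibility timeouts from 0 seconds up to 12 hours.
MAX_VISIBILITY_TIMEOUT_S = 43200


def build_queue_attributes(dlq_arn, visibility_timeout_s=300, max_receive_count=5):
    """Build SQS queue attributes (hypothetical helper, illustrative defaults)."""
    if not 0 <= visibility_timeout_s <= MAX_VISIBILITY_TIMEOUT_S:
        raise ValueError("visibility timeout must be between 0 and 43200 seconds")
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return {
        # Gap before a failed message becomes visible (and retried) again.
        "VisibilityTimeout": str(visibility_timeout_s),
        # After max_receive_count receives, SQS moves the message to the DLQ.
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": str(max_receive_count),
            }
        ),
    }


def worst_case_seconds_before_dlq(visibility_timeout_s, max_receive_count):
    """Rough upper bound on how long a failing message keeps being retried."""
    return visibility_timeout_s * max_receive_count


# Placeholder ARN, for illustration only.
attrs = build_queue_attributes(
    "arn:aws:sqs:us-east-1:123456789012:example-dlq",
    visibility_timeout_s=120,
    max_receive_count=3,
)
```

The trade-off is that a longer visibility timeout gives MEFE more room to recover between attempts, but also delays the moment a genuinely broken payload lands in the DLQ.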