reapit / foundations

Foundations platform mono repo

Improve automatic alerting for webhook event handler dead letter queue #11069

Open plittlewood-rpt opened 2 months ago

plittlewood-rpt commented 2 months ago

Background context

On 17th April, an issue with the appointments API caused the webhook event handler to fail to fetch appointments for about an hour. These events correctly went to the dead letter queue (DLQ); however, the abnormal volume of events hitting this queue did not trip any alarms, so there was no automatic notification of the problem. We should improve this mechanism, and also consider the DLQ redrive policy so that failed events can be replayed automatically once any problems have been resolved. As part of this, we should also consider whether we should dead letter 404s at all.
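As a starting point for the discussion, here is a minimal sketch of the two decisions raised above: whether a failed fetch should be dead lettered at all, and when DLQ traffic should raise an alert. All names (`classifyFetchFailure`, `shouldAlert`, the threshold parameter) are hypothetical and do not come from the existing handler. Treating a 404 as non-replayable is an assumption that still needs to be agreed.

```typescript
// Hypothetical sketch: none of these names exist in the Foundations codebase.
type FailureAction = "dead-letter" | "discard";

// Decide what to do with an event whose entity fetch failed.
// Assumption: a 404 means the entity no longer exists, so a replay could
// never succeed and the event is not worth dead lettering.
function classifyFetchFailure(statusCode: number): FailureAction {
  return statusCode === 404 ? "discard" : "dead-letter";
}

// Return true when the number of events dead lettered within one
// evaluation window is strictly greater than the agreed threshold.
function shouldAlert(deadLetteredInWindow: number, threshold: number): boolean {
  return deadLetteredInWindow > threshold;
}
```

With a threshold of 0, any dead-lettered event in the window would alert. A higher threshold would trade noise against detection time, and the right value depends on normal DLQ volume, which is not stated here.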

It is likely we'll need to look at the structure of the dead lettered message, as we won't be able to just initiate a queue redrive.
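One possible shape for this, offered only as a sketch: wrap the original event in an envelope that keeps the payload unchanged alongside failure metadata, so that a replay step can recover exactly what the handler first received. The `DeadLetterEnvelope` type and both helper functions are hypothetical, and the current dead-lettered message format is not described in this issue.

```typescript
// Hypothetical envelope: the real dead-lettered message format may differ.
interface DeadLetterEnvelope<T> {
  originalPayload: T;        // the event exactly as the handler received it
  failureReason: string;     // human-readable description of the failure
  statusCode: number | null; // upstream HTTP status, if there was one
  failedAt: string;          // ISO 8601 timestamp of the failure
  attempts: number;          // how many times processing has been tried
}

// Build the envelope that would be sent to the dead letter queue.
function wrapForDeadLetter<T>(
  payload: T,
  failureReason: string,
  attempts: number,
  statusCode: number | null = null,
  now: Date = new Date()
): DeadLetterEnvelope<T> {
  return {
    originalPayload: payload,
    failureReason,
    statusCode,
    failedAt: now.toISOString(),
    attempts,
  };
}

// Recover the original event from a serialized envelope so it can be
// fed back to the handler during a replay.
function unwrapForReplay<T>(body: string): T {
  const envelope = JSON.parse(body) as DeadLetterEnvelope<T>;
  return envelope.originalPayload;
}
```

Keeping the payload untouched means a replay is just an unwrap followed by normal processing, while the metadata stays available for alerting and triage.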

Specification

plittlewood-rpt commented 2 months ago

We need to arrange an internal discussion to design the required changes in more detail.