This is a fully HTTP-based Pub/Sub broker with the goal of simplifying system architecture in SOA or microservice setups. It aims to solve the inter-service communication problem.
Auto Pruning of Webhook Broker Database

Problem Statement
While Webhook Broker has been running in production, one of the main issues affecting it is the volume of data in the DB. Since messages and jobs are never deleted, these 2 tables keep growing very large; at one stage the broker nearly ceased to operate as DB queries for failed jobs became extremely unresponsive.
Solution Approach
While we want to prune, we also want to retain the messages for archival purposes, just in case we need them. So the approach is as follows: there is a retention period for which data is kept in the database. By default it is set to 0 days, which means pruning is disabled; the recommendation is to set it to 3-7 days. Any message older than the retention period, all of whose jobs have been delivered - that is, with no job in the DLQ for that message - is selected for deletion. The message and its jobs are serialized as a single JSON line to 2 files - one in a local directory and one remote (the remote is optional, and for testing it can also be a file:/// URL). The files are also rotated once they reach the size limit, which defaults to 100MB. The configuration template discusses the [prune] section in more detail.
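As a rough illustration, here is a minimal sketch of that selection rule and the JSON-line archiving; all type, field, and status names below are assumptions, not webhook-broker's actual schema:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type Job struct {
	ID     string `json:"id"`
	Status string `json:"status"` // e.g. "DELIVERED" or "DEAD" (in the DLQ)
}

type Message struct {
	ID         string    `json:"id"`
	ReceivedAt time.Time `json:"receivedAt"`
	Jobs       []Job     `json:"jobs"`
}

// prunable reports whether a message is older than the retention period
// and has no undelivered job; a retention of 0 days disables pruning.
func prunable(m Message, retentionDays int, now time.Time) bool {
	if retentionDays == 0 {
		return false
	}
	if !m.ReceivedAt.Before(now.AddDate(0, 0, -retentionDays)) {
		return false
	}
	for _, j := range m.Jobs {
		if j.Status != "DELIVERED" {
			return false // a job in the DLQ blocks deletion
		}
	}
	return true
}

func main() {
	m := Message{
		ID:         "msg-1",
		ReceivedAt: time.Now().AddDate(0, 0, -10),
		Jobs:       []Job{{ID: "job-1", Status: "DELIVERED"}},
	}
	if prunable(m, 7, time.Now()) {
		line, _ := json.Marshal(m) // one JSON line per message, jobs embedded
		fmt.Println(string(line))  // appended to the local and remote dump files
	}
}
```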
To make sure pruning and the broker do not run at the same time, we have introduced sub-commands. The default command runs the broker, to keep it backward compatible; to run in prune mode, just add -command prune.
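For illustration, a minimal sketch of what such a flag-based dispatch can look like; the actual wiring in webhook-broker may differ:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// The default is the broker, so existing invocations keep working.
	command := flag.String("command", "broker", "sub-command: broker or prune")
	flag.Parse()

	switch *command {
	case "broker":
		fmt.Println("starting broker") // long-running server mode
	case "prune":
		fmt.Println("running prune pass") // one-off job; exits when done
	default:
		fmt.Printf("unknown command %q\n", *command)
	}
}
```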
Some implementation quirks (intentional)
Deletion of a message and its jobs is not batched; this is intentional, to keep the unit write volume low
Remote URL and path prefix are broken into 2 separate variables, since gocloud.dev/blob accepts connection parameters via the URL, which might interfere with the path prefix or with how the blob name is created (see the sketch after this list)
The dump file name is obnoxiously long
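As a rough illustration of the URL/prefix split using gocloud.dev/blob (the bucket URL, prefix, and file names here are made up for the example):

```go
package main

import (
	"context"
	"log"
	"path"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/fileblob" // enables file:/// URLs, handy for tests
)

func main() {
	ctx := context.Background()

	// The URL may carry driver-specific connection parameters in its
	// query string (e.g. "s3://my-bucket?region=us-east-1"), so folding
	// a path prefix into it would be ambiguous.
	bucketURL := "file:///tmp"
	pathPrefix := "webhook-broker-archive"

	bucket, err := blob.OpenBucket(ctx, bucketURL)
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	// The prefix participates only in the blob name, never in the URL.
	w, err := bucket.NewWriter(ctx, path.Join(pathPrefix, "dump.jsonl"), nil)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := w.Write([]byte(`{"id":"msg-1"}` + "\n")); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```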
Other changes made
Go version is upgraded to 1.23
Some changes were made so that at least go build passes on Windows, making development somewhat feasible; test execution is still a nightmare
Dependency versions upgraded
Some minor, inconsequential refactorings
Next changes
Work on integration tests to incorporate pruning
Helm chart needs to be updated to include Job creation for pruning