sanger / General-Backlog-Items

Broad bucket to collate backlog items that have no obvious repository

Y24-355 - As a team we would like to minimise the dependencies of Monit.d so that it is less likely to be impacted by service outages. #454

Open TWJW-SANGER opened 1 month ago

TWJW-SANGER commented 1 month ago

User story: As a team we would like to minimise the dependencies of Monit.d so that it is less likely to be impacted by service outages.

Who are the primary contacts for this story? TW, PSD

Who is the nominated tester for UAT? This is to be tested by the PSD team.

Acceptance criteria: To be considered successful the solution must allow:

Additional context: When the MySQL databases were taken offline for patching, we think we discovered that Monit depends on a MySQL database provided by the DBA team. Ideally our monitoring solution would have no dependencies in common with the applications being monitored; otherwise an outage of one of those services could impact both the applications AND the system that alerts us to problems. In practice, Monit will still need to depend on OpenStack, its instances, images and networking (which are monitored by other teams and, if they go down, have a large impact well beyond PSD).
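As a rough illustration of the "no dependencies in common" idea, the check below lists what the monitor shares with each monitored app, ignoring dependencies we accept as unavoidable. It is only a sketch: the app names (`app-a`, `app-b`), the dependency labels and the `shared_dependencies` helper are all hypothetical, not an inventory of our real services.

```python
def shared_dependencies(monitor_deps, app_deps, accepted=frozenset()):
    """Return {app: [deps]} for dependencies the monitor shares with each app.

    Dependencies listed in `accepted` (e.g. the underlying platform) are
    ignored, because we knowingly share them.
    """
    shared = {}
    for app, deps in app_deps.items():
        overlap = (set(monitor_deps) & set(deps)) - set(accepted)
        if overlap:
            shared[app] = sorted(overlap)
    return shared


# Hypothetical inventory, for illustration only.
MONITOR_DEPS = {"openstack", "dba-mysql"}
APP_DEPS = {
    "app-a": {"openstack", "dba-mysql"},
    "app-b": {"openstack"},
}

if __name__ == "__main__":
    # OpenStack is accepted as a shared dependency; anything else is flagged.
    print(shared_dependencies(MONITOR_DEPS, APP_DEPS, accepted={"openstack"}))
```

An empty result would mean the monitor shares nothing with the monitored apps beyond the accepted platform dependencies.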

BenTopping commented 1 week ago

I think we could have avoided the MySQL outage affecting M/Monit as much as it did if we had turned off the other instances earlier and had a clear understanding of which processes needed turning off. It's also worth noting that we purposefully moved off SQLite to MySQL during the eta -> theta OpenStack migration 4 years ago. The reasoning given was 'it will eventually break'. M/Monit themselves have a guide on migrating off SQLite and suggest:

"M/Monit comes bundled and configured with SQLite as its database system. No extra setup is required. If you plan to use M/Monit to monitor more than, say 40-50 hosts, you may want to use MySQL or PostgreSQL instead as these database systems are faster and scale much better. If in doubt, start with SQLite."

I think there are a couple of options here:

Feels like a symptom of the way we turn off instances

BenTopping commented 1 day ago

Team meeting 15/11/24

Two separate use cases for M/Monit:

  1. Monitoring
    • The monitoring solution should not be coupled to any services that the apps being monitored are also coupled to. In this case, that service is MySQL.
  2. Shutdown/startup process
    • M/Monit is not the only way to control shutdown and startup of instances, but it is a useful tool.
    • M/Monit is currently the main method for PSD, so it would be beneficial not to have it coupled to the same dependencies as the apps being shut down.

The team agreed that the preferred solution is a self-hosted MySQL instance.

  1. We don't need to worry too much about updates / management because it is an isolated instance and so not a security concern.
  2. The current production MySQL db is 56 MB, so it is easy to back up and dump.
  3. It gives us a more performant / scalable db than SQLite, and existing data is easier to transfer across.
  4. Useful because it completely removes M/Monit's dependency on the shared MySQL service.
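
To show how small the backup job could be for a database of this size, here is a minimal sketch that wraps `mysqldump`. The database name `mmonit`, the user, the host and the backup directory are assumptions for illustration, not our actual configuration, and credentials are assumed to come from a MySQL option file (e.g. `~/.my.cnf`) rather than the command line.

```python
import subprocess
from datetime import date
from pathlib import Path


def build_dump_command(database, out_file, host="localhost", user="mmonit"):
    """Build the mysqldump command line as an argument list (no shell)."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={out_file}",
        database,
    ]


def dump_database(database, backup_dir, host="localhost", user="mmonit"):
    """Dump `database` to a dated .sql file in `backup_dir` and return its path."""
    out_file = Path(backup_dir) / f"{database}-{date.today().isoformat()}.sql"
    # check=True raises CalledProcessError if mysqldump exits non-zero.
    subprocess.run(build_dump_command(database, out_file, host, user), check=True)
    return out_file


if __name__ == "__main__":
    # Hypothetical database name and backup location.
    print(dump_database("mmonit", "/var/backups/mmonit"))
```

Something like this could run from cron on the self-hosted instance; restoring would be a matter of feeding the dump file back into `mysql`.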

Next steps: