Investigate mechanisms for auto-deploying services on our VMs

PeterJCLaw commented 3 years ago

From https://github.com/srobo/infrastructure-team-minutes/blob/master/2021/2021-11-17.md#specific

RealOrangeOne commented 3 years ago

I think the best course of action is for each project to build its own container image and push it to GitHub's container registry.

I see two options for deployment. In either case, the deployment configuration itself (i.e. docker-compose.yml) would likely live separately, and changes to it would need to be coordinated so that they're deployed at the same time (manually if needed).

1. Using some tool, repositories would send a webhook to a server, which would trigger a container pull and restart of the application. This could either be a tool like webhook or something more custom. The downside is that this would have to be configured per repo, and would only install updates when the repo changed (or when a cron action is run).
2. Using cron, we can pull updates to all images on a schedule. This way updates will always be applied, without any authentication required on the repository. The downside of this is that updates may take time to apply (even though updating manually is also very simple). Example update script.
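As a rough illustration of the authentication the webhook option would need: GitHub can sign each webhook delivery with a shared secret and send the result in an `X-Hub-Signature-256` header. The sketch below shows how a receiver might check that header before triggering a pull. The function name and the idea of a hand-written receiver are assumptions made for illustration, not something settled in this thread.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Check a GitHub-style X-Hub-Signature-256 header against the raw request body.

    `secret` is the shared secret configured on the webhook (hypothetical here).
    """
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak how much matched.
    return hmac.compare_digest(expected.encode(), header.encode())
```

A receiver would call this on the raw request bytes (before any JSON parsing) and only go on to pull and restart the container when it returns `True`.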

I suspect for our simple needs, option 2 would be better, so long as we get the schedule right. If something needs deploying urgently, chances are someone will be around anyway. This way would also ensure containers we don't maintain (eg databases) would be updated at the same time. It also requires the smallest amount of configuration per repo, meaning provisioning new applications is done entirely within config management.
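For a sense of scale, a scheduled update along the lines of option 2 could be as small as the sketch below. It assumes the services are described by a Compose file (the path shown is hypothetical) and that the script is invoked from cron. It is not the update script linked above.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], object]


def update_commands(compose_file: str) -> list:
    """Return the docker compose invocations for one update pass."""
    base = ["docker", "compose", "-f", compose_file]
    return [
        base + ["pull"],      # fetch newer images for every service
        base + ["up", "-d"],  # recreate only containers whose image or config changed
    ]


def run_update(compose_file: str, runner: Optional[Runner] = None) -> None:
    """Run each update command in order, stopping at the first failure."""
    if runner is None:
        runner = lambda argv: subprocess.run(argv, check=True)
    for argv in update_commands(compose_file):
        runner(argv)


if __name__ == "__main__":
    # Hypothetical path; the real location would come from config management.
    run_update("/srv/example/docker-compose.yml")
```

Passing the runner in as a parameter keeps the command list testable without Docker installed.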

PeterJCLaw commented 3 years ago

> Using cron, we can pull updates to all images on a schedule.

Presumably there's a way to ensure that we don't restart a service which hasn't actually updated?

> If something needs deploying urgently, chances are someone will be around anyway.

I don't like the idea of building our system relying on this.

> This way would also ensure containers we don't maintain (eg databases) would be updated at the same time.

I'm not sure I like the idea of a system which auto-updates itself implicitly at a time when people may not be around. We did have something like this with the puppet setup for a while (with it auto-updating services at 7:30am each day) and it kinda worked. We were very careful that the services were either pinned revisions or well maintained repos to try to avoid accidental breakages, but still did have some cases where changes went live that broke the service.

> It also requires the smallest amount of configuration per repo, meaning provisioning new applications is done entirely within config management.

Since the repo which is building the image still needs to ensure that it has a rebuild cron to pick up updates it's not clear to me that this saves much?

PeterJCLaw commented 3 years ago

> Using some tool, repositories would send a webhook to a server, which would trigger a container pull and restart of the application. The downside is that this would have to be configured per repo, and would only install updates when the repo changed (or when a cron action is run).

How else would updates be installed? As I understand it, the way to get updates into a docker image is to rebuild the image. Since it's the repo that's responsible for building it, the default state is that the image doesn't change.

Have you considered what a custom GitHub Action for handling the updates cron & webhook might look like? As I understand it we can package pretty much arbitrary stuff into an Action, so could make this pretty simple -- the repo would just include a boilerplate workflow config which ran both on a (perhaps daily) cron for updates and on changes to the main branch.

PeterJCLaw commented 3 years ago

For each scenario, could you expand a bit on how volunteers would know which version of their service is actually live, view its logs, etc? (In general: how do they know that a new version has been deployed, how do they recover from a startup failure or crash?)

RealOrangeOne commented 2 years ago

> Presumably there's a way to ensure that we don't restart a service which hasn't actually updated?

Yes absolutely.
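One way this could work, sketched under assumptions: record the local image ID, pull, and compare. The helper names are hypothetical, and the sketch assumes the image is already present locally (otherwise the first `docker image inspect` fails).

```python
import subprocess
from typing import Callable, List


def _docker(argv: List[str]) -> str:
    """Run a docker subcommand and return its trimmed stdout."""
    result = subprocess.run(
        ["docker", *argv], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def pull_and_check(image: str, docker: Callable[[List[str]], str] = _docker) -> bool:
    """Pull `image` and report whether the local image ID changed."""
    inspect = ["image", "inspect", "--format", "{{.Id}}", image]
    before = docker(inspect)
    docker(["pull", image])
    after = docker(inspect)
    # Only a changed ID means there is something new to restart onto.
    return before != after
```

The caller would restart the service only when this returns `True`.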

> Since the repo which is building the image still needs to ensure that it has a rebuild cron to pick up updates it's not clear to me that this saves much?

It doesn't need it. OS updates will happen with updates to the code, besides that, they won't. That's probably fine.

> How else would updates be installed?

We'll likely be running containers not developed by us, eg a database. In this case it would mean the database wouldn't see an update until the application did.

> For each scenario, could you expand a bit on how volunteers would know which version of their service is actually live

If it's in the default branch, it's either live or deploying.

> view its logs, etc

In the current setup, they couldn't. Access to the logs requires shell access. This includes the current container, and the new container in the event of a deploy.

This is why tools like Heroku are so popular. It's difficult. Webhooks to a server to trigger a deploy, and some UI like Portainer to give non-privileged access to the server to see logs may be as close as we can reasonably get without writing something ourselves.

PeterJCLaw commented 2 years ago

> > Since the repo which is building the image still needs to ensure that it has a rebuild cron to pick up updates it's not clear to me that this saves much?
>
> It doesn't need it. OS updates will happen with updates to the code, besides that, they won't. That's probably fine.

Given how infrequently many of our services actually change (currently, for the foreseeable future, and likely in the ideal state too), I think it's quite important that we not rely on this.

> > How else would updates be installed?
>
> We'll likely be running containers not developed by us, eg a database. In this case it would mean the database wouldn't see an update until the application did.

Yeees, though in my understanding it's typical of image distributors to ship image updates without shipping application updates precisely because of this issue.

> > For each scenario, could you expand a bit on how volunteers would know which version of their service is actually live
>
> If it's in the default branch, it's either live or deploying.

Or the deployment errored in some manner, which is where it gets interesting. Presumably volunteers will all need to see the deployment state and the related logs to know what failed and why?
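To make "which version is live" checkable rather than assumed, one possibility is to read the revision from the running container's image labels. The sketch below parses the JSON printed by `docker inspect <container>` and looks for the standard OCI `org.opencontainers.image.revision` label. It assumes the image build sets that label, which nothing in this thread confirms, and the function names are hypothetical.

```python
import json
from typing import Optional

REVISION_LABEL = "org.opencontainers.image.revision"


def deployed_revision(inspect_output: str) -> Optional[str]:
    """Extract the revision label from `docker inspect` JSON, if present."""
    containers = json.loads(inspect_output)
    if not containers:
        return None
    # Labels can be null in the JSON when the image sets none.
    labels = containers[0].get("Config", {}).get("Labels") or {}
    return labels.get(REVISION_LABEL)


def is_live(inspect_output: str, head_sha: str) -> bool:
    """True when the running container was built from `head_sha`."""
    return deployed_revision(inspect_output) == head_sha
```

A failed deploy would then show up as a mismatch between the default branch's head commit and the revision reported by the running container.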

> > view its logs, etc
>
> In the current setup, they couldn't. Access to the logs requires shell access. This includes the current container, and the new container in the event of a deploy.

Being able to see the logs of the service is pretty important. I think we should consider this a first class concern.

PeterJCLaw commented 2 years ago

Closing as this hasn't proved to be needed. If doing things manually becomes too much faff (or someone especially wants to work on this) we can reopen this task.

Investigate mechanisms for auto-deploying services on our VMs #10