opensafely-core / backend-server

Infrastructure code for managing partner hosted OpenSAFELY backend servers.

Backend /etc/hosts reset on reboot #232

Closed: madwort closed this 1 month ago

madwort commented 6 months ago

The TPP server was rebooted in https://github.com/opensafely-core/sysadmin/issues/168.

After the reboot, the contents of /etc/hosts on the Ubuntu VM were identical to the current contents of /etc/cloud/templates/hosts.debian.tmpl. The file therefore contained an entry for github-proxy.opensafely.org but not the expected "## opensafely core hosts" block, which we believe was there before the reboot. As a result, Airlock returned an HTTP 500 error on attempted login, because it could not communicate first with collector.opensafely.org and then with jobs.opensafely.org.

Running "just install" in /srv/backend-server fixed the issue, but we assume it will happen again the next time the system is rebooted.
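A quick post-reboot check could catch this before Airlock users do. Below is a minimal Python sketch. It assumes that the install script writes a block headed by the "## opensafely core hosts" marker, and that collector.opensafely.org and jobs.opensafely.org are the entries Airlock needs. The function name hosts_problems is hypothetical and is not part of backend-server.

```python
from pathlib import Path

# Assumption: marker text and required hostnames are taken from this issue;
# the exact layout of the block written by the install script is not checked.
MARKER = "## opensafely core hosts"
REQUIRED_HOSTS = ("collector.opensafely.org", "jobs.opensafely.org")


def hosts_problems(hosts_text: str) -> list[str]:
    """Return a list of problems found in the contents of an /etc/hosts file."""
    problems = []
    if MARKER not in hosts_text:
        problems.append(f"marker {MARKER!r} not found")
    # Collect every hostname that appears on a non-comment line.
    names = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # the first field is the IP address
    for host in REQUIRED_HOSTS:
        if host not in names:
            problems.append(f"no entry for {host}")
    return problems


def check_hosts_file(path: Path = Path("/etc/hosts")) -> list[str]:
    """Hypothetical convenience wrapper that reads the live file."""
    return hosts_problems(path.read_text())
```

For example, a non-empty result from check_hosts_file() after a reboot would suggest re-running "just install".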

See also the previous ticket and the current state of the install script.

In particular, this line suggests that /etc/hosts should not be overwritten (and not overwriting may even be the default behaviour!?), so we should look into what is going wrong here.
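The link is not preserved here, but if that line is cloud-init's manage_etc_hosts setting (an assumption), a useful first step is to list every place the setting is defined. The Python sketch below does this for /etc/cloud/cloud.cfg and the /etc/cloud/cloud.cfg.d/*.cfg fragments. It is a crude, line-based check rather than a YAML parse, and it ignores user-data and vendor-data, which can also set the option. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Crude check: matches a top-level "manage_etc_hosts: <value>" line only.
SETTING = re.compile(r"^manage_etc_hosts\s*:\s*(\S+)", re.MULTILINE)


def manage_etc_hosts_settings(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (filename, value) pairs for each manage_etc_hosts line found.

    Files are visited in sorted order. cloud-init reads the cloud.cfg.d
    fragments alphabetically, so a later entry generally overrides an
    earlier one.
    """
    found = []
    for name in sorted(files):
        for match in SETTING.finditer(files[name]):
            found.append((name, match.group(1)))
    return found


def read_cloud_configs(root: Path = Path("/etc/cloud")) -> dict[str, str]:
    """Hypothetical helper: read cloud.cfg plus any cloud.cfg.d/*.cfg files."""
    paths = [root / "cloud.cfg", *sorted((root / "cloud.cfg.d").glob("*.cfg"))]
    return {str(p): p.read_text() for p in paths if p.exists()}
```

Calling manage_etc_hosts_settings(read_cloud_configs()) on the VM would show whether a later fragment re-enables the setting after the one our install script relies on.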

madwort commented 6 months ago

Looking at /var/log/cloud-init.log, there's an entry at 2024-04-09 12:49:09 that says "Writing to /etc/hosts".
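To see whether this overwrite happens on every boot, the log can be scanned for all such entries. Below is a minimal Python sketch. It assumes only that the relevant lines contain the text "Writing to /etc/hosts" and begin with a YYYY-MM-DD HH:MM:SS timestamp. hosts_writes is a hypothetical helper, and the sample log lines in the test are illustrative, not copied from the server.

```python
import re
from pathlib import Path

# Leading "YYYY-MM-DD HH:MM:SS" timestamp; nothing else about the log
# line format is relied upon.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def hosts_writes(log_text: str) -> list[str]:
    """Return the timestamp (or the whole line) of each /etc/hosts write."""
    hits = []
    for line in log_text.splitlines():
        if "Writing to /etc/hosts" in line:
            match = TIMESTAMP.match(line)
            hits.append(match.group(1) if match else line)
    return hits


def log_hosts_writes(path: Path = Path("/var/log/cloud-init.log")) -> list[str]:
    """Hypothetical wrapper that reads the live log, tolerating bad bytes."""
    return hosts_writes(path.read_text(errors="replace"))
```

One timestamp per boot would suggest that cloud-init rewrites the file every time, not only on first boot.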

I found two other similar-sounding reports (the config variable says not to update /etc/hosts, but it is updated anyway), though neither has a solution: 1 2

madwort commented 6 months ago

We could try editing /etc/cloud/cloud.cfg to remove the hosts updater from the module list, or we could keep the template file up to date in our install script. Maybe one to discuss with @bloodearnest when he's back from holiday.
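The second option could look roughly like the Python sketch below. It idempotently replaces a marked block in the template, so that whenever cloud-init regenerates /etc/hosts the core hosts are already included. The end marker "## end opensafely core hosts" and both function names are hypothetical. Because cloud-init renders this file as a Jinja template, the entries should avoid Jinja syntax such as {{ or {%. The first option would instead mean deleting the update_etc_hosts entry from the cloud_init_modules list in /etc/cloud/cloud.cfg.

```python
from pathlib import Path

# Hypothetical markers: the issue only mentions the "## opensafely core hosts"
# heading, so the end marker is an assumption made for this sketch.
BEGIN = "## opensafely core hosts"
END = "## end opensafely core hosts"


def with_hosts_block(template: str, entries: list[str]) -> str:
    """Return the template text with one up-to-date BEGIN..END block.

    An existing BEGIN..END block is replaced in place, so re-running is
    idempotent; otherwise a new block is appended at the end.
    """
    block = [BEGIN, *entries, END]
    lines = template.splitlines()
    if BEGIN in lines and END in lines[lines.index(BEGIN):]:
        start = lines.index(BEGIN)
        end = lines.index(END, start)
        lines[start:end + 1] = block
        return "\n".join(lines) + "\n"
    return template.rstrip("\n") + "\n" + "\n".join(block) + "\n"


def sync_template(
    entries: list[str],
    path: Path = Path("/etc/cloud/templates/hosts.debian.tmpl"),
) -> None:
    """Hypothetical install-script step: rewrite the template in place."""
    path.write_text(with_hosts_block(path.read_text(), entries))
```

Replacing the block rather than appending blindly means that repeated runs of the install script do not accumulate duplicate entries.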

Given that I think we have only rebooted this server twice in 18 months, I don't think it's hugely urgent.

madwort commented 6 months ago

Side quest: this was mildly annoying when debugging the issue.