planetary-social / ansible-scripts

Ansible automation scripts used at Planetary
MIT License

Add admin password to sentry inventory #53

Closed mplorentz closed 8 months ago

mplorentz commented 9 months ago

The sentry.nos.social SSL cert didn't renew properly, and I tried my hand at fixing it this morning. I think the problem was that I ran the scripts without CLOUDFLARE_API_TOKEN set in my environment (or is it supposed to be in the vault now?).

I set that and tried to rerun the scripts, but ran into this error:

TASK [common : Perform a dist-upgrade.] *****************************************************************************************************************************************************************************
fatal: [sentry.nos.social]: FAILED! => {"msg": "Missing sudo password"}

I realized that I set this droplet up before new-do-droplet existed, so it didn't get quite the same setup that common and other roles expect. I tried to mimic the new-do-droplet role by creating an authorized_keys file for admin with our keys, adding admin to the sudoers file, setting a password for the admin user, and adding that password to the ansible vault and to inventories/sentry/inventory.yml. However, running the playbook still gives me the error, so rather than spinning my wheels further I will wait for @cooldracula to tell me what is wrong.
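For reference, "Missing sudo password" is what Ansible reports when a task uses become and no become password reaches the host. A minimal sketch of how an inventory entry usually supplies one is below. It is not the repository's actual inventory: the layout is assumed, and vault_sentry_admin_password is a hypothetical name for a vault-encrypted variable.

```yaml
# inventories/sentry/inventory.yml (sketch only; the real file may be laid out differently)
all:
  hosts:
    sentry.nos.social:
      # Log in as the admin user that was created by hand on this droplet.
      ansible_user: admin
      # Password Ansible hands to sudo whenever a task sets become.
      # vault_sentry_admin_password is a hypothetical vault variable name.
      ansible_become_password: "{{ vault_sentry_admin_password }}"
```

If the variable is defined but the error persists, two things worth checking are whether the vault is actually being decrypted for the run (for example with --ask-vault-pass) and whether the variable is defined at a level that applies to this host. Passing --ask-become-pass on the command line is a quick way to rule out the inventory wiring.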

Related to https://github.com/planetary-social/infrastructure/issues/73

cooldracula commented 9 months ago

I looked into the certificate and there were a couple of things interfering with the renewal.
1.) Cloudflare was not accepting the API key. Perhaps the key was rolled over after it was added? I updated the Cloudflare config with the key I use in our other deployments, and the cert renewed successfully.
2.) The renewal was missing a hook that restarts nginx so that the updated cert comes into effect. This hook is part of our certbot role, but may have been added after sentry was deployed. I added the hook to this deployment, so it should renew automatically and successfully in the future.
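The nginx restart hook mentioned above could be expressed as an Ansible task along these lines. This is a sketch under assumptions, not the certbot role's actual task: the script name restart-nginx.sh is hypothetical, and it assumes nginx is managed by systemd. It relies on certbot running any executable in /etc/letsencrypt/renewal-hooks/deploy/ after a certificate is successfully renewed.

```yaml
- name: Install a certbot deploy hook that restarts nginx after a renewal
  ansible.builtin.copy:
    # Hypothetical file name; certbot runs executables found in this directory
    # once a certificate has been successfully renewed.
    dest: /etc/letsencrypt/renewal-hooks/deploy/restart-nginx.sh
    owner: root
    group: root
    mode: "0755"
    content: |
      #!/bin/sh
      # Assumes nginx is a systemd service on this host.
      systemctl restart nginx
  become: true
```

Running certbot renew --dry-run on the server is a low-risk way to confirm the renewal path works, though hooks in that directory may not fire during a dry run.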

Sentry is a tough one in relation to our ansible scripts. It is similar to posthog in that it has its own deployment strategy, and it is safer to just cede to it. However, it's additionally hard because the deployment takes so long. We have a sentry role, but I found it's not practical in real cases because of how long the deployment step takes. It runs longer than a standard ssh connection, so our ansible role (even with a number of async polling steps added) would be flaky and fail, wrongly thinking the installation on the server had stopped. I find with sentry it is easier to go to the server and run their steps, though it makes the server more of a snowflake. I am not certain of the best option yet.
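For context, the async polling pattern referred to above generally has the shape below: start the long job in the background, then poll for completion over short-lived connections. This is only a sketch of the general technique, not the sentry role's actual tasks. The install path is hypothetical, and it assumes the install script can run without interactive prompts.

```yaml
- name: Start the Sentry install in the background
  ansible.builtin.command:
    # Hypothetical location of the self-hosted checkout on the server.
    cmd: /opt/sentry/self-hosted/install.sh
    chdir: /opt/sentry/self-hosted
  async: 7200   # let the job run for up to two hours
  poll: 0       # return immediately instead of holding the ssh connection open
  register: sentry_install
  become: true

- name: Wait for the Sentry install to finish
  ansible.builtin.async_status:
    jid: "{{ sentry_install.ansible_job_id }}"
  register: sentry_job
  until: sentry_job.finished
  retries: 240  # 240 checks, 30 seconds apart, matches the two-hour limit
  delay: 30
  become: true
```

As the comment notes, this approach still proved flaky for Sentry in practice, so the sketch illustrates the pattern rather than a recommended fix.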

For the immediate term, the certificate is renewed and should renew successfully in the future. As for the ansible script, the sentry deployment is an edge case without a great experience in ansible, and I wonder if its role/playbook should be deprecated?

mplorentz commented 8 months ago

I would be fine with removing the role/playbook, or maybe leaving the playbook but turning it into documentation of how to run the script manually rather than a real playbook.