jquery / infrastructure-puppet

Puppet configuration for jQuery Infrastructure servers.
MIT License

Upgrade from Debian 7 Wheezy (Puppet 3) to Debian 11 Bullseye (Puppet 7) #8

Closed Krinkle closed 5 months ago

Krinkle commented 3 years ago

List of hosts

https://github.com/jquery/infrastructure/blob/puppet-stage/manifests/site.pp

Dedicated tickets:

Overview

To get away from the very outdated Debian versions, we also need to move to a newer Puppet version.

We are currently using numerous Puppet 2 features that were deprecated in Puppet 3 and removed in Puppet 4. The main change that I think affects us is the change from "environment configs" to "environment directories".

Some relevant links:

Status quo: Puppet 3

The puppet server runs at puppet.ops.jquery.net (in legacy docs: puppet-master). The config for the server is at /etc/puppet/puppet.conf. There are two Git clones that we care about on this server:

In /etc/puppet/puppet.conf (the only place the Puppet server actually looks at) we have the following stuff:

[main]
# …
templatedir=$confdir/templates
manifest=/etc/puppet/manifests/site.pp

[stage]
manifest=/etc/puppet-stage/manifests/site.pp
modulepath=/etc/puppet-stage/modules
# …

[master]
# …

By default, when one of our droplets that runs a puppet agent asks for provisioning, it gets provisioned by the main config, which simply points at the subdirectories within /etc/puppet. On staging hosts, the agent's own /etc/puppet/puppet.conf file may contain environment = stage, which the agent passes on to the Puppet server, and so the Puppet server will use the stage manifest and modulepath instead (in addition to compiling the catalog with $::environment = "stage").
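As a rough illustration of that selection, here is a toy model in shell. It is not Puppet's actual implementation: resolve_environment is a made-up helper name, the stage paths are the ones from the config excerpt above, and the default modulepath of /etc/puppet/modules is an assumption, since the excerpt does not show one for [main].

```shell
# Toy model of Puppet 3 config-file environments (not real Puppet code).
# resolve_environment is a hypothetical helper for illustration only.
resolve_environment() {
    case "$1" in
        stage)
            # The [stage] section overrides both settings.
            echo "manifest=/etc/puppet-stage/manifests/site.pp"
            echo "modulepath=/etc/puppet-stage/modules"
            ;;
        *)
            # Any other environment falls back to [main].
            echo "manifest=/etc/puppet/manifests/site.pp"
            # Assumed default: the excerpt shows no modulepath for [main].
            echo "modulepath=/etc/puppet/modules"
            ;;
    esac
}

resolve_environment stage
resolve_environment production
```

The point of the sketch is only that the agent sends an environment name and the server picks the paths; the agent never chooses paths itself.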

Beyond this, the only other thing worth knowing is that we use jquery::postreceive instances (similar to those for the content sites) to automatically update these git checkouts after commits to them. The actual applying of changes however is passive, based on puppet agents checking in with the server every 30 minutes (default Puppet agent behaviour).

Puppet 4

Under Puppet 4, things are a little bit different. There is no longer support for the templatedir, manifest, and modulepath parameters, and there is no longer support for per-environment configuration section overrides.

Instead, modules are read from a directory like /etc/puppet/code/environments/:environment/modules and manifests are read from a directory like /etc/puppet/code/environments/:environment/manifests. For example: /etc/puppet/code/environments/production/modules.
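A minimal sketch of that layout, built under a scratch directory so it can be tried anywhere. CODE_ROOT and make_environment are hypothetical names for this example; CODE_ROOT stands in for /etc/puppet/code, and "stage" is assumed as the second environment name.

```shell
# Sketch only: create the directory-environment layout in a temp dir.
# CODE_ROOT is a stand-in for /etc/puppet/code (hypothetical variable).
CODE_ROOT="$(mktemp -d)"

make_environment() {
    # $1 = environment name, e.g. "production" or "stage"
    mkdir -p "$CODE_ROOT/environments/$1/modules" \
             "$CODE_ROOT/environments/$1/manifests"
}

make_environment production
make_environment stage

# Show the resulting tree, relative to the scratch root.
(cd "$CODE_ROOT" && find environments -type d | sort)
```

Each environment directory is self-contained, which is why one Git clone per environment maps onto it directly.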

I think global templates are no longer supported, or at least not varying by environment. But that's okay, we only have one file in /templates and that'll either just not support staging or maybe we can even get rid of it (do we still use Zabbix?).

The new directory layout seems feasible: we just create two more clones and keep both for a little while.

Transition

I noticed just now that, apart from a few minor tweaks being needed for deprecated features, more generally it is not supported to connect Puppet 4 clients to a Puppet 3 server. However, the other way around is supported. So, the puppet master will have to go first, and that means a master switch, and setting up a new one of those first as well.

The good news is, a Puppet server is relatively easy to configure and gradually switch to...

Krinkle commented 3 years ago

Things I think we are not using, and I will omit initially in the Puppet 4 branch:

I'll mention this during this Friday's infra meeting (tomorrow) in case we know of any of these definitely still being used.

/cc @mgol @brianwarner

mgol commented 3 years ago

I don't know much about our usage of any of these services with the exception of the fact that running:

sudo puppet agent --test

on jenkins-01 now results in the following output:

Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for jenkins-01.ops.jquery.net
Info: Applying configuration version '1616105347'
Notice: /Stage[main]/Main/Node[default]/Apt::Source[dotdeb]/Apt::Key[Add key: 3D624A3B from Apt::Source dotdeb]/Apt_key[Add key: 3D624A3B from Apt::Source dotdeb]/ensure: created
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install libasound2 ' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libasound2
Error: /Stage[main]/Main/Node[jenkins]/Package[libasound2 ]/ensure: change from purged to present failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install libasound2 ' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libasound2
Notice: /Stage[main]/Jquery::Newrelic/Exec[install newrelic license]/returns: executed successfully
Error: Could not start Service[newrelic-sysmond]: Execution of '/usr/sbin/service newrelic-sysmond start' returned 1: Job for newrelic-sysmond.service failed. See 'systemctl status newrelic-sysmond.service' and 'journalctl -xn' for details.
Wrapped exception:
Execution of '/usr/sbin/service newrelic-sysmond start' returned 1: Job for newrelic-sysmond.service failed. See 'systemctl status newrelic-sysmond.service' and 'journalctl -xn' for details.
Error: /Stage[main]/Jquery::Newrelic/Service[newrelic-sysmond]/ensure: change from stopped to running failed: Could not start Service[newrelic-sysmond]: Execution of '/usr/sbin/service newrelic-sysmond start' returned 1: Job for newrelic-sysmond.service failed. See 'systemctl status newrelic-sysmond.service' and 'journalctl -xn' for details.
Notice: /Stage[main]/Main/Node[jenkins-01.ops.jquery.net]/Jquery::Ssh::Host[jenkins]/Exec[chmod 0600 /etc/ssh/ssh_host*]/returns: executed successfully
Notice: /Stage[main]/Main/Node[jenkins-01.ops.jquery.net]/Jquery::Ssh::Host[jenkins]/Exec[chmod 0644 /etc/ssh/*.pub]/returns: executed successfully
Notice: Finished catalog run in 8.30 seconds

so it looks like New Relic is somehow interfering with Puppet runs. I'm not sure if those errors are blocking anything.

Krinkle commented 3 years ago

Aye, that looks familiar. I'll continue the jenkins-01 issue at https://github.com/jquery/infrastructure/issues/433.

Krinkle commented 3 years ago

This cleanup was useful and benefitted our current droplets as well.

But I'm closing the issue as originally written for now, per https://github.com/jquery/infrastructure/issues/482#issuecomment-907890935. Might pick it up again depending on if/when we're going to have droplets running newer Debian versions.

Krinkle commented 3 years ago

@atdt I've created the puppet-02.ops.jquery.net instance (IP: 104.131.63.112, DNS not yet assigned), with a Debian 11 image, a small 2-CPU / 4GB RAM plan, and both of our SSH keys attached for initial bootstrapping.

Empty repo for Puppet manifests: https://github.com/jquery/infrastructure-puppet. This is a new repo rather than a branch, so that we can manage most server configuration in public going forward. I suppose we can keep issue tracking and wiki pages in this repo for now. To be discussed at the infra meeting.

brianwarner commented 3 years ago

This is in place as well, also with proxying enabled until you tell me not to.

Krinkle commented 3 years ago

@brianwarner Aye, yeah, this one should be without proxying as it's for internal use such as shell access and receiving webhooks.

atdt commented 2 years ago

What version of Puppet should we target?

Puppet <= 5 has already reached EOL, and Puppet 6 is projected to reach EOL in less than a year. (See: Puppet platform lifecycle.)

OTOH, Puppet 7 has not yet been packaged for Debian 11 Bullseye. Puppet Labs estimates packages for Debian 11 will be available within the next month.

Krinkle commented 2 years ago

@atdt I see. I suppose we could wait another month.

Alternatively, we could go with Puppet 7 now if we use Debian 10 Buster, I think? https://puppet.com/docs/puppet/7/server/install_from_packages.html

I don't see Puppet 6 for Debian 11 Bullseye, but I'm not sure if I'm looking in the right place. Is there one? I can't tell from the raw index at https://apt.puppet.com/. Either way, I imagine going from one major Debian or Puppet version to the next should be relatively simple, perhaps with a few inline conditionals.

atdt commented 2 years ago

@Krinkle Puppet intends to release puppetserver 7.6.0 next week, with packages for Bullseye. Since it's so close, let's just wait.

atdt commented 2 years ago

OK, I installed puppetserver 7.6.0 on puppet-02. Here's what I ran:

#!/usr/bin/env bash
set -eux

# Enable the Puppet platform repository
# https://puppet.com/docs/puppet/7/install_puppet.html#enable_the_puppet_platform_repository
wget https://apt.puppet.com/puppet7-release-bullseye.deb
sudo dpkg -i puppet7-release-bullseye.deb

# Install Puppet server
apt install -y puppetserver
systemctl start puppetserver

# Install Puppet agent
apt install -y puppet-agent

# Start the Puppet service
/opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true

echo 'source /etc/profile.d/puppet-agent.sh' >> ~/.bashrc

/opt/puppetlabs/bin/puppet config set server puppet-02.ops.jquery.net --section main
/opt/puppetlabs/bin/puppet ssl bootstrap

Krinkle commented 2 years ago

I've created, following wiki: DNS and wiki: Provisioning:

Both nyc3, 1 vCPU and 2 GB RAM, with Debian 11, and ori-2021 and krinkle-2020 for initial root. Also named as such in DNS via Cloudflare.

(Prod ones later expected as 2 vCPU / 4 GB RAM.)

Krinkle commented 2 years ago

@atdt I've followed your steps on both of the droplets (puppetserver only for puppet-02, puppet agent on both), with one minor change. The apt install -y puppetserver failed.

root@puppet-02:/tmp/provision# apt install -y puppetserver
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package puppetserver

root@puppet-02:/tmp/provision# apt install -y puppet<tab>
puppet                                      puppet-module-icann-quagga                  puppet-module-puppetlabs-mount-core
puppet-beaker                               puppet-module-icann-tea                     puppet-module-puppetlabs-mysql
puppet-lint                                 puppet-module-ironic                        puppet-module-puppetlabs-ntp
puppet-master                               puppet-module-joshuabaird-ipaclient         puppet-module-puppetlabs-postgresql
...
puppet-module-heat                          puppet-module-puppetlabs-host-core          puppet-strings
puppet-module-heini-wait-for                puppet-module-puppetlabs-inifile            puppet7-release
puppet-module-horizon                       puppet-module-puppetlabs-mongodb            

After an apt-get update it worked fine however, and indeed among its output is:

...
Get:13 http://apt.puppetlabs.com bullseye/puppet7 amd64 puppet-agent amd64 7.16.0-1bullseye [20.1 MB]
Get:14 http://deb.debian.org/debian bullseye/main amd64 fontconfig-config all 2.13.1-4.2 [281 kB]
...
Get:22 http://apt.puppetlabs.com bullseye/puppet7 amd64 puppetserver all 7.7.0-1bullseye [78.3 MB]
...
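For the future wiki page, the corrected order could be sketched roughly as below. This is a dry-run wrapper, not something that was run on the droplets: the run helper, the DRY_RUN variable, and the install_puppetserver function are made up for this example. By default it only prints the commands.

```shell
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 (as root, on a Debian 11 host) to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_puppetserver() {
    run wget https://apt.puppet.com/puppet7-release-bullseye.deb
    run dpkg -i puppet7-release-bullseye.deb
    # Refresh the package index so apt sees the newly enabled repository.
    run apt-get update
    run apt-get install -y puppetserver
}

install_puppetserver
```

The only difference from the original steps is the apt-get update between enabling the repository and installing the package.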

But then the service didn't want to start, because:

java.lang.Error: Not enough available RAM (1,982MB) to safely accommodate the configured JVM heap size of 1,979MB. Puppet Server requires at least 2,177MB of available RAM given this heap size,

So I've re-created it with 4GB instead of 2GB.
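The numbers in that error are consistent with a requirement of about 1.1 times the configured heap (1,979 MB × 1.1 ≈ 2,177 MB). A small sketch of that arithmetic follows; the 1.1 factor is inferred from the message rather than taken from documentation, and both function names are made up for this example.

```shell
# Sketch: check whether a host has enough RAM for a given JVM heap size.
# The 1.1 factor is an assumption inferred from the error message above.
required_ram_mb() {
    # ceil(heap_mb * 1.1) using integer arithmetic only
    echo $(( ($1 * 11 + 9) / 10 ))
}

has_enough_ram() {
    # $1 = available RAM in MB, $2 = configured heap in MB
    [ "$1" -ge "$(required_ram_mb "$2")" ]
}

required_ram_mb 1979                          # prints 2177
has_enough_ram 1982 1979 || echo "too small"  # prints "too small"
```

Under this assumption a nominal 2 GB droplet cannot satisfy the default heap, so it needs either a larger plan or a smaller heap.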

In addition, the codeorigin-01 droplet needed one extra step: on regular droplets that are not the puppetserver, puppet ssl bootstrap pauses until the certificate is signed on the puppetserver:

Couldn't fetch certificate from CA server; you might still need to sign this agent's certificate (codeorigin-01.stage.ops.jquery.net).
Info: Will try again in 120 seconds.
...
...
...
Info: csr_attributes file loading from /etc/puppetlabs/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for codeorigin-01.stage.ops.jquery.net
Info: Certificate Request fingerprint (SHA256):  ....
Info: Downloaded certificate for codeorigin-01.stage.ops.jquery.net from https://puppet-02.stage.ops.jquery.net:8140/puppet-ca/v1
Notice: Completed SSL initialization

So I ran puppetserver ca list and then puppetserver ca sign --all. Mentioning this here for a future wiki page.

Krinkle commented 2 years ago

I've set up a basic skeleton at https://github.com/jquery/infrastructure-puppet for the puppet server, and provisioned as follows:

ssh root@puppet-02.stage.ops.jquery.net
$ cd /etc/puppetlabs/code/environments
$ rm -rf production/

$ apt-get install git
$ git clone https://github.com/jquery/infrastructure-puppet.git production/

This does not require a deployment SSH key; it can be an unauthenticated clone over HTTPS since this is the public puppet repository.

@atdt I originally wanted to set it up such that the public repo reflects /etc/puppetlabs/code rather than /etc/puppetlabs/code/environments/production/, but I couldn't get use of the modules directory to work. I had the following on the puppet server at /etc/puppetlabs/puppet/puppet.conf

[main]
disable_per_environment_manifest = true
default_manifest = /etc/puppetlabs/code/manifests
basemodulepath = /etc/puppetlabs/code/modules

But alas, it wasn't applying anything. There was no error though; it ran cleanly on codeorigin-01, but just didn't apply any roles. So the modulepath may have worked, and it was site.pp that was being ignored. In the end, in https://github.com/jquery/infrastructure-puppet/commit/ec2631b5b97a0554e35f8d1ad6e7c6b55bb700d6 I moved it all down a level and that's working now.

root@codeorigin-01:~# puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for codeorigin-01.stage.ops.jquery.net
Info: Applying configuration version '(491cd5b) Timo Tijhof - get_config: Follows-up ec2631b5b9'
Notice: /Stage[main]/Role::Codeorigin/Package[nginx]/ensure: created
Notice: Applied catalog in 9.08 seconds

Let me know if anything is off here, or could be better. Otherwise, next steps:

I think it'd be neat if the public repo is standalone and self-sufficient for staging and local use, e.g. not hard-require a third repo with pseudo secrets to be integrated to substitute for the real secret repo. Instead, we might be able to get away with only having private data come from Hieradata YAML files, which have a straightforward inheritance chain that we can configure in production to include one extra layer from a checkout of the private repo.

In particular, the following non-trivial things seem best to provision via puppet instead of statically once:

atdt commented 2 years ago

> I think it'd be neat if the public repo is standalone and self-sufficient for staging and local use, e.g. not hard-require a third repo with pseudo secrets to be integrated to substitute for the real secret repo. Instead, we might be able to get away with only having private data come from Hieradata YAML files, which have a straightforward inheritance chain that we can configure in production to include one extra layer from a checkout of the private repo.

What would happen if a secret were to get accidentally deleted from the private repo? IIUC, with the setup that you're suggesting, we wouldn't get an error; the secret will simply quietly get the dummy value in production. I don't think we want that; we'd want Puppet to fail loudly in that case, which we'd get if we had real and fake Puppet private repos.

Krinkle commented 2 years ago

@atdt Thanks, I hadn't thought of that!

I'd still like to try once more if we can avoid a third repo for fake-secrets, however. Rather than place all the dummy values in a "common" file in the puppet repo as I described before, what if we instead placed (most) of that in a "dummy" file that is still in the same puppet repo but indeed only optionally included, similarly to how we'd optionally include the private files. I believe that would effectively achieve the same, but with the file present in the repo rather than being brought in or symlinked from a separate repo, right?
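To make the difference concrete, here is a toy model of the layering in shell. It is not Hiera itself: lookup is a made-up function that searches key=value files in order, and the file names (dummy, private, common) and keys are placeholders for this example.

```shell
# Toy model of a layered lookup (not Hiera): first layer with the key wins.
lookup() {
    key="$1"; shift
    for layer in "$@"; do
        [ -f "$layer" ] || continue   # optional layers may be absent
        value="$(sed -n "s/^$key=//p" "$layer" | head -n 1)"
        if [ -n "$value" ]; then
            echo "$value"
            return 0
        fi
    done
    echo "lookup: no value for '$key'" >&2
    return 1
}

DATA="$(mktemp -d)"
echo "api_token=dummy-token"   > "$DATA/dummy"
echo "api_token=real-token"    > "$DATA/private"
echo "ntp_server=pool.ntp.org" > "$DATA/common"

# Staging: the dummy layer is included, the private layer is not.
lookup api_token "$DATA/dummy" "$DATA/common"
# Production: the private layer is included, the dummy layer is not.
lookup api_token "$DATA/private" "$DATA/common"
# Production after the secret is accidentally deleted: the lookup fails.
: > "$DATA/private"
lookup api_token "$DATA/private" "$DATA/common" || echo "catalog would fail"
```

Because the dummy values never sit in the shared "common" layer, production has nothing to fall back to silently, which is the loud failure asked for above.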

atdt commented 2 years ago

@Krinkle I think that works, yes. Let's give it a shot.

Krinkle commented 2 years ago

While running puppet agent -tv works on clients and uses the correct server (puppet-02), the systemd service that was started originally kept failing as seen in syslog and via systemctl status puppet. Running systemctl restart puppet fixed that.

I'm gonna assume this is expected and simply because we ran puppet config set ... during the provisioning without restarting after that. Noting this here to be documented later as part of the provisioning steps.

Krinkle commented 2 years ago

I've provisioned myself and Ori on the new system, and also resolved https://github.com/jquery/infrastructure/issues/560 at the same time (Automatically remove unpuppetized root keys).

Krinkle commented 2 years ago

Notes from meeting with @atdt and myself:

Krinkle commented 1 year ago

Last remaining work:

The first one is blocked on https://github.com/jquery/infrastructure-puppet/issues/29

Krinkle commented 6 months ago

I've deleted the tarsnap backups of wp-01 using the command at https://github.com/jquery/infrastructure-puppet/issues/19#issuecomment-1699644372, and turned off the droplet. I'll delete it next week if nothing comes up by then.

[Screenshot, 2024-04-24 at 20:14:38]

Just shy of its 10 year anniversary. Pretty good uptime!

timmywil commented 5 months ago

Decommissioning jenkins can be tracked specifically here: https://github.com/jquery/infrastructure-puppet/issues/47

After that, we can decommission puppet.ops.jquery.net and close this issue.

Krinkle commented 5 months ago

Timmy deleted the jenkins-01 droplet. I've now also deleted its DNS definition, and done the same for puppet.ops.jquery.net, which ticks all the boxes on this task!

puppet.ops.jquery.net. Debian 8.3. Created Feb 2016.