Closed Krinkle closed 5 months ago
Things I think we are not using, and I will omit initially in the Puppet 4 branch:
I'll mention this during this Friday's infra meeting (tomorrow) in case we know of any of these definitely still being used.
/cc @mgol @brianwarner
I don't know much about our usage of any of these services with the exception of the fact that running:
sudo puppet agent --test
on jenkins-01
now results in the following output:
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for jenkins-01.ops.jquery.net
Info: Applying configuration version '1616105347'
Notice: /Stage[main]/Main/Node[default]/Apt::Source[dotdeb]/Apt::Key[Add key: 3D624A3B from Apt::Source dotdeb]/Apt_key[Add key: 3D624A3B from Apt::Source dotdeb]/ensure: created
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install libasound2 ' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libasound2
Error: /Stage[main]/Main/Node[jenkins]/Package[libasound2 ]/ensure: change from purged to present failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install libasound2 ' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libasound2
Notice: /Stage[main]/Jquery::Newrelic/Exec[install newrelic license]/returns: executed successfully
Error: Could not start Service[newrelic-sysmond]: Execution of '/usr/sbin/service newrelic-sysmond start' returned 1: Job for newrelic-sysmond.service failed. See 'systemctl status newrelic-sysmond.service' and 'journalctl -xn' for details.
Wrapped exception:
Execution of '/usr/sbin/service newrelic-sysmond start' returned 1: Job for newrelic-sysmond.service failed. See 'systemctl status newrelic-sysmond.service' and 'journalctl -xn' for details.
Error: /Stage[main]/Jquery::Newrelic/Service[newrelic-sysmond]/ensure: change from stopped to running failed: Could not start Service[newrelic-sysmond]: Execution of '/usr/sbin/service newrelic-sysmond start' returned 1: Job for newrelic-sysmond.service failed. See 'systemctl status newrelic-sysmond.service' and 'journalctl -xn' for details.
Notice: /Stage[main]/Main/Node[jenkins-01.ops.jquery.net]/Jquery::Ssh::Host[jenkins]/Exec[chmod 0600 /etc/ssh/ssh_host*]/returns: executed successfully
Notice: /Stage[main]/Main/Node[jenkins-01.ops.jquery.net]/Jquery::Ssh::Host[jenkins]/Exec[chmod 0644 /etc/ssh/*.pub]/returns: executed successfully
Notice: Finished catalog run in 8.30 seconds
so it looks like New Relic is somehow interfering with Puppet runs. I'm not sure if those errors are blocking anything.
Aye, that looks familiar. I'll continue the jenkins-01 issue at https://github.com/jquery/infrastructure/issues/433.
This cleanup was useful and benefitted our current droplets as well.
But I'm closing the issue as originally written for now, per https://github.com/jquery/infrastructure/issues/482#issuecomment-907890935. Might pick it up again depending on if/when we're going to have droplets running newer Debian versions.
@atdt I've created the puppet-02.ops.jquery.net instance (IP: 104.131.63.112, DNS not yet assigned), with a Debian 11 image, a small 2-CPU / 4GB RAM plan, and both of our SSH keys attached for initial bootstrapping.
Empty repo for Puppet manifests: https://github.com/jquery/infrastructure-puppet. This is a new repo rather than a branch, so that we can manage most server configuration in public going forward. I suppose we can keep issue tracking and wiki pages in this repo for now. To be discussed at the infra meeting.
This is in place as well, also with proxying enabled until you tell me not to.
@brianwarner Aye, yeah, this one should be without proxying as it's for internal use such as shell access and receiving webhooks.
What version of Puppet should we target?
Puppet <= 5 has already reached EOL, and Puppet 6 is projected to reach EOL in less than a year. (See: Puppet platform lifecycle.)
OTOH, Puppet 7 has not yet been packaged for Debian 11 Bullseye. Puppet Labs estimates packages for Debian 11 will be available within the next month.
@atdt I see. I suppose we could wait another month.
Alternatively, we could go with Puppet 7 now if we use Debian 10 Buster, I think? https://puppet.com/docs/puppet/7/server/install_from_packages.html
I don't see Puppet 6 for Debian 11 Bullseye, but I'm not sure if I'm looking in the right place. Is there one? I can't tell from the raw index at https://apt.puppet.com/. Either way, I imagine moving from one major Debian or Puppet version to the next should be relatively simple, perhaps with a few inline conditionals.
@Krinkle Puppet intends to release puppetserver 7.6.0 next week, with packages for Bullseye. Since it's so close, let's just wait.
OK, I installed puppetserver 7.6.0 on puppet-02. Here's what I ran:
#!/usr/bin/env bash
set -eux
# Enable the Puppet platform repository
# https://puppet.com/docs/puppet/7/install_puppet.html#enable_the_puppet_platform_repository
wget https://apt.puppet.com/puppet7-release-bullseye.deb
sudo dpkg -i puppet7-release-bullseye.deb
# Install Puppet server
apt install -y puppetserver
systemctl start puppetserver
# Install Puppet agent
apt install -y puppet-agent
# Start the Puppet service
/opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true
echo 'source /etc/profile.d/puppet-agent.sh' >> ~/.bashrc
/opt/puppetlabs/bin/puppet config set server puppet-02.ops.jquery.net --section main
/opt/puppetlabs/bin/puppet ssl bootstrap
I've created, following wiki: DNS and wiki: Provisioning:
puppet-02.stage.ops.jquery.net
codeorigin-01.stage.ops.jquery.net
Both nyc3, 1 vCPU and 2 GB RAM, with Debian 11, and ori-2021 and krinkle-2020 for initial root. Also named as such in DNS via Cloudflare.
(Prod ones later expected as 2 vCPU / 4 GB RAM.)
@atdt I've followed your steps on both of the droplets (puppetserver only for puppet-02, puppet agent on both), with one minor change. The apt install -y puppetserver
failed.
root@puppet-02:/tmp/provision# apt install -y puppetserver
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package puppetserver
root@puppet-02:/tmp/provision# apt install -y puppet<tab>
puppet puppet-module-icann-quagga puppet-module-puppetlabs-mount-core
puppet-beaker puppet-module-icann-tea puppet-module-puppetlabs-mysql
puppet-lint puppet-module-ironic puppet-module-puppetlabs-ntp
puppet-master puppet-module-joshuabaird-ipaclient puppet-module-puppetlabs-postgresql
...
puppet-module-heat puppet-module-puppetlabs-host-core puppet-strings
puppet-module-heini-wait-for puppet-module-puppetlabs-inifile puppet7-release
puppet-module-horizon puppet-module-puppetlabs-mongodb
After an apt-get update
it worked fine however, and indeed among its output is:
...
Get:13 http://apt.puppetlabs.com bullseye/puppet7 amd64 puppet-agent amd64 7.16.0-1bullseye [20.1 MB]
Get:14 http://deb.debian.org/debian bullseye/main amd64 fontconfig-config all 2.13.1-4.2 [281 kB]
...
Get:22 http://apt.puppetlabs.com bullseye/puppet7 amd64 puppetserver all 7.7.0-1bullseye [78.3 MB]
...
But then the service didn't want to start, because:
java.lang.Error: Not enough available RAM (1,982MB) to safely accommodate the configured JVM heap size of 1,979MB. Puppet Server requires at least 2,177MB of available RAM given this heap size,
So I've re-created it with 4GB instead of 2GB.
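The two numbers in that error (a 1,979 MB heap needing 2,177 MB of RAM) suggest Puppet Server wants roughly 10% of headroom on top of the configured JVM heap. A minimal pre-flight sketch under that assumption follows. The 1.1 factor is inferred from the log line above rather than taken from Puppet documentation, and all function names are hypothetical.

```shell
# Pre-flight RAM check sketch.
# ASSUMPTION: the 1.1x factor is inferred from the error above
# (1,979 MB heap -> 2,177 MB required), not from Puppet Server docs.

# Print the RAM (in MB) needed for a given heap size (in MB), rounded up.
required_ram_mb() {
  local heap_mb=$1
  echo $(( (heap_mb * 11 + 9) / 10 ))
}

# Succeed if available_mb is enough for heap_mb, fail otherwise.
ram_is_sufficient() {
  local available_mb=$1
  local heap_mb=$2
  [ "$available_mb" -ge "$(required_ram_mb "$heap_mb")" ]
}

# Total RAM of this machine in MB, read from /proc/meminfo (Linux only).
total_ram_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}
```

With the values from the log, `ram_is_sufficient 1982 1979` fails, which matches what the 2 GB droplet ran into. On a real host it could be called as `ram_is_sufficient "$(total_ram_mb)" 1979` before starting the service.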
In addition, the codeorigin-01 droplet needs one extra step, as puppet ssl bootstrap will pause on regular droplets (those that are not the puppetserver) until the certificate request is signed on the puppetserver:
Couldn't fetch certificate from CA server; you might still need to sign this agent's certificate (codeorigin-01.stage.ops.jquery.net).
Info: Will try again in 120 seconds.
...
...
...
Info: csr_attributes file loading from /etc/puppetlabs/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for codeorigin-01.stage.ops.jquery.net
Info: Certificate Request fingerprint (SHA256): ....
Info: Downloaded certificate for codeorigin-01.stage.ops.jquery.net from https://puppet-02.stage.ops.jquery.net:8140/puppet-ca/v1
Notice: Completed SSL initialization
So I ran puppetserver ca list and then puppetserver ca sign --all. Mentioning this here for a future wiki page.
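For the future wiki page, signing one named certificate may be a safer habit than --all once more than one request can be pending. The sketch below only prints the commands to run on the Puppet server rather than running them. The helper name and the hostname sanity check are illustrative assumptions, not existing tooling.

```shell
# Dry-run sketch: prints the CA commands instead of running them.
# ASSUMPTION: helper name and hostname check are hypothetical.

# Print the commands to run on the Puppet server for one agent certname.
print_sign_commands() {
  local certname=$1
  # Refuse anything that does not look like a plain lowercase hostname.
  case $certname in
    ''|*[!a-z0-9.-]*)
      echo "refusing suspicious certname: $certname" >&2
      return 1
      ;;
  esac
  echo "puppetserver ca list"
  echo "puppetserver ca sign --certname $certname"
}
```

For example, `print_sign_commands codeorigin-01.stage.ops.jquery.net` prints the list command followed by a sign command for just that host, so the pending request can be compared with the fingerprint shown by the agent before signing.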
I've set up a basic skeleton at https://github.com/jquery/infrastructure-puppet for the puppet server, and provisioned as follows:
ssh root@puppet-02.stage.ops.jquery.net
$ cd /etc/puppetlabs/code/environments
$ rm -rf production/
$ apt-get install git
$ git clone https://github.com/jquery/infrastructure-puppet.git production/
This does not require a deployment SSH key; it can be an unauthenticated clone over HTTPS, since this is the public Puppet repository.
@atdt I originally wanted to set it up such that the public repo reflects /etc/puppetlabs/code rather than /etc/puppetlabs/code/environments/production/, but I couldn't get use of the modules directory to work. I had the following on the puppet server at /etc/puppetlabs/puppet/puppet.conf:
[main]
disable_per_environment_manifest = true
default_manifest = /etc/puppetlabs/code/manifests
basemodulepath = /etc/puppetlabs/code/modules
But alas, it wasn't applying anything. There was no error; it ran cleanly on codeorigin-01, but just didn't apply any roles. So the modulepath may have worked, and it may have been site.pp that was being ignored. In the end, in https://github.com/jquery/infrastructure-puppet/commit/ec2631b5b97a0554e35f8d1ad6e7c6b55bb700d6 I moved it all down a level and that's working now.
root@codeorigin-01:~# puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for codeorigin-01.stage.ops.jquery.net
Info: Applying configuration version '(491cd5b) Timo Tijhof - get_config: Follows-up ec2631b5b9'
Notice: /Stage[main]/Role::Codeorigin/Package[nginx]/ensure: created
Notice: Applied catalog in 9.08 seconds
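Since the silent "nothing applied" case above was hard to spot, a small layout check after cloning might help. This is a minimal sketch, assuming the environment is expected to contain manifests/site.pp and a modules/ directory (the two pieces mentioned above). The function name is hypothetical.

```shell
# Sketch: check that a directory looks like a Puppet directory environment.
# ASSUMPTION: "valid" here only means manifests/site.pp and modules/ exist.
check_environment_dir() {
  local env_dir=$1
  local status=0
  if [ ! -f "$env_dir/manifests/site.pp" ]; then
    echo "missing: $env_dir/manifests/site.pp" >&2
    status=1
  fi
  if [ ! -d "$env_dir/modules" ]; then
    echo "missing: $env_dir/modules/" >&2
    status=1
  fi
  return "$status"
}
```

On the puppet server this could be run as `check_environment_dir /etc/puppetlabs/code/environments/production` right after the git clone, so that a misplaced site.pp shows up as an error instead of as an empty catalog.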
Let me know if anything is off here, or could be better. Otherwise, next steps:
I think it'd be neat if the public repo is standalone and self-sufficient for staging and local use, e.g. by not hard-requiring a third repo with pseudo secrets to be integrated as a substitute for the real secret repo. Instead, we might be able to get away with only having private data come from Hieradata YAML files, which have a straightforward inheritance chain that we can configure in production to include one extra layer from a checkout of the private repo.
In particular, the following non-trivial things seem best to provision via puppet instead of statically once:
infra-puppet-secret repo. Also: this presents a bootstrapping problem, so we'll probably have to write a plain shell version as well that we document.
infra-puppet and infra-puppet-secret up-to-date.
What would happen if a secret were to get accidentally deleted from the private repo? IIUC, with the setup that you're suggesting, we wouldn't get an error; the secret would simply and quietly get the dummy value in production. I don't think we want that; we'd want Puppet to fail loudly in that case, which we'd get if we had real and fake private Puppet repos.
@atdt Thanks, I hadn't thought of that!
I'd still like to try once more if we can avoid a third repo for fake-secrets, however. Rather than place all the dummy values in a "common" file in the puppet repo as I described before, what if we instead placed (most) of that in a "dummy" file that is still in the same puppet repo but indeed only optionally included, similarly to how we'd optionally include the private files. I believe that would effectively achieve the same, but with the file present in the repo rather than being brought in or symlinked from a separate repo, right?
@Krinkle I think that works, yes. Let's give it a shot.
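The behaviour being agreed on here can be modelled in a few lines. The sketch below is a toy model and not Hiera itself: it assumes flat "key: value" lines, takes the first layer that defines the key, skips layers whose file is absent, and fails if no layer defines the key. File names and keys in the usage note are hypothetical.

```shell
# Toy model of a layered lookup.
# ASSUMPTION: this is NOT Hiera; it only mimics "first layer that defines
# the key wins, and no match is an error" on flat "key: value" lines.

# lookup_key KEY FILE... : print the value from the first file defining KEY.
lookup_key() {
  local key=$1
  shift
  local file
  local value
  for file in "$@"; do
    [ -f "$file" ] || continue   # optional layers may be absent
    value=$(awk -v k="$key" \
      'index($0, k ": ") == 1 { print substr($0, length(k) + 3); exit }' "$file")
    if [ -n "$value" ]; then
      echo "$value"
      return 0
    fi
  done
  echo "lookup failed: no layer defines '$key'" >&2
  return 1
}
```

In staging, a call like `lookup_key some_secret private.yaml dummy.yaml` falls back to the dummy layer. In production the dummy layer would simply not be listed, so a secret deleted from the private file makes the lookup fail loudly instead of quietly returning a dummy value.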
While running puppet agent -tv works on clients and uses the correct server (puppet-02), the systemd service that was started originally kept failing, as seen in syslog and via systemctl status puppet. Running systemctl restart puppet fixed that.
I'm gonna assume this is expected and simply because we ran puppet config set ... during the provisioning without restarting after that. Noting this here to be documented later as part of the provisioning steps.
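As a starting point for documenting those steps, here is a dry-run sketch that prints the agent provisioning commands from this thread in order, with the two fixes found along the way (an apt-get update after adding the repository, and a service restart after puppet config set). It prints the commands rather than running them. The function name is hypothetical.

```shell
# Dry-run sketch: prints the agent provisioning steps in order.
# ASSUMPTION: the function name is hypothetical; the commands are the ones
# quoted earlier in this thread, plus the two fixes noted along the way.
print_agent_steps() {
  local server=$1
  cat <<EOF
wget https://apt.puppet.com/puppet7-release-bullseye.deb
dpkg -i puppet7-release-bullseye.deb
apt-get update
apt install -y puppet-agent
/opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true
/opt/puppetlabs/bin/puppet config set server $server --section main
/opt/puppetlabs/bin/puppet ssl bootstrap
systemctl restart puppet
EOF
}
```

Calling `print_agent_steps puppet-02.stage.ops.jquery.net` gives a list that can be pasted into the wiki and reviewed before anything is run as root.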
I've provisioned myself and Ori on the new system, and also resolved https://github.com/jquery/infrastructure/issues/560 at the same time (Automatically remove unpuppetized root keys).
Notes from meeting with @atdt and myself:
Use ensure => present instead of ensure => latest for packages, same as current infra. Given how small we are, this seems worth the trade-off between the risk of potential issues when we're away and the benefit of keeping up with the exact latest versions. The slight downside of this is that if we have to re-create a droplet from scratch, it might end up with a slightly newer version as part of that process (e.g. a minor update within the same Debian stable channel).
Last remaining work:
plugins.jquery.com).
The first one is blocked on https://github.com/jquery/infrastructure-puppet/issues/29
I've deleted the tarsnap backups of wp-01 using the command at https://github.com/jquery/infrastructure-puppet/issues/19#issuecomment-1699644372, and turned off the droplet. I'll delete it next week if nothing comes up by then.
Just shy of its 10 year anniversary. Pretty good uptime!
Decommissioning jenkins can be tracked specifically here: https://github.com/jquery/infrastructure-puppet/issues/47
After that, we can decommission puppet.ops.jquery.net and close this issue.
Timmy deleted the jenkins-01 droplet. I've now also deleted its DNS definition, and done the same for puppet.ops.jquery.net, which ticks all the boxes on this task!
List of hosts
https://github.com/jquery/infrastructure/blob/puppet-stage/manifests/site.pp
puppet.ops.jquery.net
wp-01, jquery.com
wp-02, most other sites (incl *.jquery.org, jqueryui.com, etc)
wp-03, codeorigin.jquery.com, releases.jquery.com, and recipient of Git assets
wp-01.stage, WordPress doc sites, staging, all domains (stage.api.jquery.com, etc)
builder-01
builder-03.stage
jq03.stage.jquery.com (stage.demos.jquerymobile.com, stage.themeroller.jquerymobile.com)
jenkins-01 ~ decommissioned
cla-01.ops.jquery.net
cla-01.stage.jquery.net
gruntjs.ops.jquery.net
gruntjs.stage.jquery.net
origin-01.ops.jquery.net, contentorigin (content.jquery.com, static.jquery.com)
swarm-01.ops.jquery.net, TestSwarm
view-01.ops.jquery.net, View, git assets
trac.ops.jquery.net, Trac (bugs.jquery.com, bugs.jqueryui.com)
Dedicated tickets:
Overview
In order to get away from the very outdated Debian versions and such, we need to also get to a newer Puppet version.
We are currently using numerous Puppet 2 features that were deprecated in Puppet 3 and removed in Puppet 4. The main change that I think affects us is the change from "environment configs" to "environment directories".
Some relevant links:
Status quo: Puppet 3
The puppet server runs at puppet.ops.jquery.net (in legacy docs: puppet-master). The config for the server is at /etc/puppet/puppet.conf. There are two Git clones that we care about on this server:
/etc/puppet - This is a clone of jquery/infrastructure.git at branch puppet-master. This currently replaces the entire /etc/puppet directory.
/etc/puppet-stage - This is a directory we made up, containing another clone of jquery/infrastructure.git at branch puppet-stage.
In /etc/puppet/puppet.conf (the only place the Puppet server actually looks at) we have the following stuff:
By default, when one of our droplets that runs a puppet agent asks for provisioning, it gets provisioned by the main config, which points simply at the subdirectories within /etc/puppet. On staging hosts, we have another /etc/puppet/puppet.conf file that may contain environment = stage, which the agent passes on to the Puppet server, and so the Puppet server will consider that manifest and modulepath directory instead (in addition to compiling it with $::environment = "stage").
Beyond this, the only other thing worth knowing is that we use jquery::postreceive instances (similar to the content sites) to automatically update these git checkouts after commits to them. The actual applying of changes, however, is passive, based on puppet agents checking in with the server every 30 minutes (default Puppet agent behaviour).
Puppet 4
Under Puppet 4, things are a little bit different. There is no longer support for the templatedir, manifest, and modulepath parameters, and there is no longer support for per-environment configuration section overrides.
Instead, modules are read from a directory like /etc/puppet/code/environments/:environment/modules and manifests are read from a directory like /etc/puppet/code/environments/:environment/manifests. For example: /etc/puppet/code/environments/production/modules.
I think global templates are no longer supported, or at least not varying by environment. But that's okay, we only have one file in /templates and that'll either just not support staging or maybe we can even get rid of it (do we still use Zabbix?).
The new directory layout seems feasible; we just create two more clones and keep both for a little while.
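The "two more clones" could look roughly like the dry-run sketch below, which only prints the git commands. The branch-to-environment mapping (puppet-master to production, puppet-stage to stage) and the clone URL are assumptions for illustration, as is the function name. The target path follows the directory layout described above.

```shell
# Dry-run sketch: prints the clone commands for the directory environments.
# ASSUMPTIONS: the branch-to-environment mapping (puppet-master -> production,
# puppet-stage -> stage) and the clone URL are guesses for illustration.
print_environment_clones() {
  local repo=https://github.com/jquery/infrastructure.git
  local envpath=/etc/puppet/code/environments
  local pair
  for pair in puppet-master:production puppet-stage:stage; do
    # ${pair%%:*} is the branch name, ${pair##*:} is the environment name.
    echo "git clone --branch ${pair%%:*} $repo $envpath/${pair##*:}"
  done
}
```

Keeping the existing /etc/puppet and /etc/puppet-stage checkouts next to these new clones for a while would let both layouts be compared before the old ones are removed.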
Transition
I noticed just now that, apart from a few minor tweaks being needed for deprecated features, more generally it is not supported to connect Puppet 4 clients to a Puppet 3 server. However, the other way around is supported. So, the puppet master will have to go first, and that means a master switch, and setting up a new one of those first as well.
The good news is, a Puppet server is relatively easy to configure and gradually switch to...