Closed: dduportal closed this issue 1 year ago
Update:
Puppet Server installation:

- Added `20.12.27.65 puppet.jenkins.io` to `/etc/hosts` to avoid unwanted connections to the former VM `puppet-*`
- `puppet` commands available in `PATH`
- Set the hostname:

  ```shell
  hostnamectl set-hostname puppet.jenkins.io && hostname -f # puppet.jenkins.io
  ```

- Checked `/etc/puppetlabs/puppet/puppet.conf` for the proper hostname
- Restored `/var/lib/puppet/keys`: the `pe-puppet` user should be the owner, read-only for user (`chmod 0400`, matching the listing below):

  ```shell
  $ ls -l /var/lib/puppet/keys
  total 8
  -r-------- 1 pe-puppet root 1679 Jun 1 10:52 private_key.pkcs7.pem
  -r-------- 1 pe-puppet root 1050 Jun 1 10:52 public_key.pkcs7.pem
  ```
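The expected key permissions can be re-applied and checked with a short script. This is a sketch demonstrated on a temporary directory, since touching the real `/var/lib/puppet/keys` requires root and the `pe-puppet` user:

```shell
# Demo of the expected key permissions on a throwaway directory
# (the real path is /var/lib/puppet/keys, owned by pe-puppet; assumption:
# mode 0400 = read-only for the owner, matching the listing above).
dir="$(mktemp -d)"
touch "$dir/private_key.pkcs7.pem" "$dir/public_key.pkcs7.pem"
chmod 0400 "$dir"/*.pkcs7.pem
stat -c '%a %n' "$dir"/*.pkcs7.pem   # prints "400 <path>" for each key
rm -rf "$dir"
```

On the real server, the same `chmod 0400` would be preceded by a `chown pe-puppet` of both files.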
Ran the restore:

```shell
puppet-backup restore /root/pe_backup-2023-05-31_14.55.46_UTC.tgz
# Check the Master hostname IS "puppet.jenkins.io"
###
Step 1 of 10: Stopping PE related services
# ...
#
# Stuck at 10 of 10, because:
# - Service pe-puppetdb stuck during its startup: https://tickets.puppetlabs.com/browse/PDB-4785
# - Logs in /var/log/puppetlabs/puppetdb/puppetdb.log show postgres is started, but the connection puppetdb <-> postgres fails during the TLS handshake (confirmed with tcpdump)
# - https://tickets.puppetlabs.com/browse/PDB-4625
```

(Ref. https://www.puppet.com/docs/pe/2019.8/backing_up_and_restoring_pe.html)
- Looked at https://www.puppet.com/docs/puppetdb/7/postgres_ssl.html#using-a-custom-java-keystore (yes, version 7, but the keystore is the same)
- Tried disabling SSL for the puppetdb PostgreSQL connection: still the "connection timeout" error. I was led down a bad path by the Puppet issues above.
- Running `curl -v puppet.jenkins.io:5432` helps reproduce the "connection timeout" error: using the public IP forces TCP packets to exit the VM to the `dmz` subnet, where the security groups forbid inbound requests on port 5432 => that is the real reason
- Solution: update `/etc/hosts` with the private IP instead. Solved the problem!
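The difference between a dropped packet (timeout) and a reachable-but-closed port can be probed quickly. A sketch, assuming port 5432 and bash's `/dev/tcp` redirection; the private IP shown is hypothetical:

```shell
# Probe a TCP port: "open" if the connection succeeds, otherwise
# "closed or filtered". A security group silently dropping packets
# shows up as a slow timeout, while a refused connection fails instantly.
probe() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/5432" 2>/dev/null; then
    echo "$1: open"
  else
    echo "$1: closed or filtered"
  fi
}
probe 20.12.27.65   # public IP: filtered by the dmz security group => timeout
probe 10.0.0.4      # hypothetical private IP: would reach postgres directly
```

This is how the public-IP `/etc/hosts` entry was identified as the culprit: same hostname, different route, different firewall rules.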
A new cycle of uninstall and reinstall, following the same steps as the comment above.
Restore went well:
```
Log messages will be saved to /var/log/puppetlabs/pe-backup-tools/pe_restore-2023-06-01_13.24.25_UTC.log
Step 1 of 10: Stopping PE related services
Step 2 of 10: Cleaning the agent certificates from previous PE install
Step 3 of 10: Restoring PE file system components
Step 4 of 10: Restoring the pe-orchestrator database
Step 5 of 10: Restoring the pe-rbac database
Step 6 of 10: Restoring the pe-classifier database
Step 7 of 10: Restoring the pe-activity database
Step 8 of 10: Restoring the pe-inventory database
Step 9 of 10: Restoring the pe-puppetdb database
Step 10 of 10: Configuring PE on newly restored master
Backup restored.
Time to restore: 4 min, 6 sec
Size: 2.26 GB, Scope: code, puppetdb, config, certs
To finish restoring your primary server from backup, run the following commands:
puppet agent --test
```
```shell
$ ls -l /root/.ssh/config /root/.ssh/deploy_key
-rw-r--r-- 1 root root   55 Jun 1 10:57 /root/.ssh/config
-r-------- 1 root root 1679 Jun 1 10:57 /root/.ssh/deploy_key

$ cat /root/.ssh/config
Host github.com
  IdentityFile /root/.ssh/deploy_key

$ ssh -T git@github.com
Hi jenkins-infra/jenkins-keys! You've successfully authenticated, but GitHub does not provide shell access.
```
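The `Host github.com` stanza is what makes a plain `ssh git@github.com` pick up the deploy key; `ssh -G` prints the resolved client configuration without connecting. A sketch against a throwaway config file (the real one is `/root/.ssh/config`):

```shell
# Build a scratch ssh config equivalent to the one above, then ask ssh
# which identity file it would use for github.com (no connection is made).
cfg="$(mktemp)"
printf 'Host github.com\n  IdentityFile /root/.ssh/deploy_key\n' > "$cfg"
ssh -G -F "$cfg" github.com | grep -i '^identityfile'
rm -f "$cfg"
```

`ssh -G` is a quick way to confirm the config matches before testing against GitHub for real.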
```shell
$ r10k deploy environment --color --verbose --puppetfile
# No errors, WARN accepted

$ puppet agent --test
# ...
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Could not find class pe_console_prune for puppet.jenkins.io on node puppet.jenkins.io
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not send report: Error 500 on SERVER: Server Error: Could not autoload puppet/reports/datadog_reports: Datadog report config file /etc/datadog-agent/datadog-reports.yaml not readable
```
After a lot of dies and retries, solved it with a hack:

- Copied the `pe_console_prune` module from the radish VM (`/opt/puppetlabs/puppet/modules/pe_console_prune`) to the new machine
- Ran `puppetserver gem cleanup`

Initial agent run was successful \o/
Notes:

- Still unsure where the `pe_console_prune` requirement comes from. Found an occurrence in `/opt/puppetlabs/server/data/puppetserver/yaml/node/puppet.jenkins.io.yaml` (restored from the backup) and removed it, but this one might be cached somewhere, as the agent run still showed the error.
- `puppet module list` shows a LOT of incompatible dependencies, logged as `WARN` messages. Not blocking, but it looks like a lot of the modules (example: datadog and apt) are not really updated in step with each other.

Update:
- Cleaned up `/etc/hosts`, which had unused entries (including one for `puppet.jenkins.io`)
- Searching for `puppet.jenkins.io` in the code (https://github.com/search?q=org%3Ajenkins-infra%20puppet.jenkins.io&type=code) led to:
- Searching for the old public IP (`140.211.9.94`) in the code (https://github.com/search?q=org%3Ajenkins-infra+140.211.9.94&type=code) does not show another occurrence

Closing as it works as expected.
Service(s)
Azure, Other
Summary
Upgrade of the `puppet.jenkins.io` VM to Ubuntu 22.04 broke the Puppet Enterprise server in https://github.com/jenkins-infra/helpdesk/issues/2982#issuecomment-1570715518, as Jammy is not supported by PE 🤦

This issue tracks the work to migrate the VM to an Azure Terraform-managed VM to restore the service (as we have backups taken before the Ubuntu migration).
Pros:
Cons:
Reproduction steps
No response