Open richardlau opened 2 months ago
I'd like to try tackling it this weekend. Would it be doable to create a new machine with similar specs (on Ubuntu 24.04) and migrate the data and config to it?
Yes, I think so. We rebuilt ci-release back in 2021 so there's some history for reference: https://github.com/nodejs/build/issues/2626#issuecomment-822404824
apt install nginx
systemctl enable nginx
systemctl start nginx
apt install openjdk-21-jre-headless
systemctl enable jenkins
systemctl start jenkins
cd /etc/nginx
conf.d/jenkins-static.conf
apt install libnginx-mod-http-image-filter libnginx-mod-http-xslt-filter libnginx-mod-mail libnginx-mod-stream
sites-available/jenkins-iojs
ln -s ../sites-available/jenkins-iojs sites-enabled/jenkins-iojs
unlink sites-enabled/default
ssl dir
systemctl restart nginx
apt install iptables-persistent
iptables-save > /etc/iptables/rules.v4 on old server
/etc/iptables/rules.v4
systemctl restart netfilter-persistent
(/dev/xvdc)
fdisk /dev/xvdc
n, p, 1, default, default, p, w
mkfs.xfs /dev/xvdc1
systemctl stop jenkins
cd /var/lib
mv jenkins jenkins-old
mkdir jenkins
chown jenkins:jenkins jenkins
more /etc/mtab and copy line to /etc/fstab
mv jenkins-old/* jenkins/
mv jenkins-old/.* jenkins/
rmdir jenkins-old
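A side note on the last three steps: in shells where `.*` also expands to `.` and `..`, `mv jenkins-old/.* jenkins/` prints errors for those two entries (the real dotfiles are still moved). A minimal sketch of a single-command alternative, assuming GNU `find` and `mv` as shipped on Ubuntu. `move_all` and the scratch paths are hypothetical names for illustration, not what was run on the server:

```shell
#!/usr/bin/env bash
set -euo pipefail

# move_all SRC DST: move every top-level entry of SRC (dotfiles included) into DST.
# Hypothetical helper, not part of the original migration notes.
move_all() {
  local src=$1 dst=$2
  # -mindepth 1 skips SRC itself; -maxdepth 1 moves each subtree as a whole.
  find "$src" -mindepth 1 -maxdepth 1 -exec mv -t "$dst" -- {} +
}

# Demo in a scratch directory rather than /var/lib.
demo=$(mktemp -d)
mkdir -p "$demo/jenkins-old/jobs" "$demo/jenkins"
touch "$demo/jenkins-old/config.xml" "$demo/jenkins-old/.owner"

move_all "$demo/jenkins-old" "$demo/jenkins"
rmdir "$demo/jenkins-old"   # only succeeds if nothing was left behind
ls -A "$demo/jenkins"
```

Because `rmdir` refuses to remove a non-empty directory, it doubles as a check that nothing was missed.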
I did everything I could. https://ci-release.nodejs.org now points to the new server. There are a few open questions/tasks:
I'm leaving for holiday tomorrow. Anyone else feel free to finish the migration while I'm away.
Test build: https://ci-release.nodejs.org/job/iojs+release/10554/
💚 Thanks for doing this.
I did everything I could. https://ci-release.nodejs.org now points to the new server. There are a few open questions/tasks:
* Do we need to do something on the release nodes? https://ci-release.nodejs.org/computer/ seems to suggest they connected to the new machine without issues.
I don't believe we need to do anything on the release nodes so long as ci-release.nodejs.org points to the correct server.
* I don't know what needs to be done for backups
Again I think this should just work so long as ci-release.nodejs.org was updated. I'll look at what's on the backup machine tomorrow.
* What about the SSL certificate?
ci-release.nodejs.org seems to be behind the expected certificate -- I can't remember if this is being served from nginx or Cloudflare for the Jenkins servers.
It's being served from nginx. My question is about renewal. Do we need to do something on the server so it is automatically renewed when necessary?
Test build: ci-release.nodejs.org/job/iojs+release/10554
macOS jobs haven't started:
Not seeing any osx13 machines in https://ci-release.nodejs.org/computer/ although I'm not sure if we expect to with the ephemeral VM setup. @UlisesGascon @ryanaslett
My question is about renewal. Do we need to do something on the server so it is automatically renewed when necessary?
No, at least at the moment the certificates have been manually updated yearly.
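Since renewal is manual, a periodic expiry check could serve as a reminder. A minimal sketch, assuming `openssl` is installed. `cert_valid_for` is a hypothetical helper, and the demo uses a throwaway self-signed certificate because the real certificate's file name is not given in this thread:

```shell
#!/usr/bin/env bash
set -euo pipefail

# cert_valid_for CERT_FILE DAYS: succeed if the certificate is still valid DAYS days from now.
# Hypothetical helper for illustration only.
cert_valid_for() {
  local cert=$1 days=$2
  # -checkend N exits 0 when the certificate will NOT expire within N seconds.
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}

# Demo: create a throwaway self-signed certificate valid for 30 days.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=example.invalid" \
  -keyout "$tmp/key.pem" -out "$tmp/cert.pem" 2>/dev/null

if cert_valid_for "$tmp/cert.pem" 14; then
  echo "ok: more than 14 days left"
fi
if ! cert_valid_for "$tmp/cert.pem" 60; then
  echo "warning: expires within 60 days"
fi
```

On the server, the same function would be pointed at whichever certificate file nginx is configured to serve.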
* I don't know what needs to be done for backups
Again I think this should just work so long as ci-release.nodejs.org was updated. I'll look at what's on the backup machine tomorrow.
Well I was wrong -- it doesn't look like the backups worked (the last update in /data/backup/periodic/daily.0/ci-release.nodejs.org/jobs/iojs+release/builds/ on the backup server is from 19 Oct). This is because the public key for backup (found in the infra section of the secrets repo) needs to be added to authorized_keys on the Jenkins servers so that the backup machine can ssh into them. Will fix.
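Adding the key can be done idempotently so that repeating the step on a rebuilt server never duplicates the entry. A minimal sketch: `add_authorized_key`, the placeholder key, and the scratch paths are all hypothetical, and the real public key lives in the secrets repo:

```shell
#!/usr/bin/env bash
set -euo pipefail

# add_authorized_key PUBKEY_FILE AUTH_KEYS_FILE: append the key only if it is not already present.
# Hypothetical helper for illustration only.
add_authorized_key() {
  local pubkey_file=$1 auth_file=$2 key
  key=$(<"$pubkey_file")
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  chmod 600 "$auth_file"
  # -x matches whole lines, -F treats the key as a fixed string rather than a regex.
  if ! grep -qxF -- "$key" "$auth_file"; then
    printf '%s\n' "$key" >>"$auth_file"
  fi
}

# Demo with a placeholder key in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'ssh-ed25519 AAAAexamplekeydata backup@example' >"$tmp/backup.pub"
add_authorized_key "$tmp/backup.pub" "$tmp/.ssh/authorized_keys"
add_authorized_key "$tmp/backup.pub" "$tmp/.ssh/authorized_keys"  # second call is a no-op
wc -l <"$tmp/.ssh/authorized_keys"
```

On a real host the `.ssh` directory should also be mode 700 and owned by the account the backup machine logs in as.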
I checked that I could successfully ssh into ci-release from the backup machine (after removing the known host as the server has changed). I also tried to run remove_old.sh ci-release.nodejs.org but this failed at the end trying to trigger a reload -- possibly due to the credential being used? (@ryanaslett it looks like backup is using your credential for Jenkins -- I have a vague recollection you may have asked/mentioned this before when setting up the backup server but I've forgotten the context (possibly it was using the credential of a former Build WG member who was removed from one of the Node.js org teams?).)
# /root/backup_scripts/remove_old.sh ci-release.nodejs.org
<html>
<head><title>400 Bad Request</title></head>
<body>
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx/1.24.0 (Ubuntu)</center>
</body>
</html>
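For reference, the known-host removal mentioned above can be sketched with `ssh-keygen -R`, which deletes every entry for a host from a known_hosts file. The demo below runs against a scratch file with a freshly generated throwaway key (both hypothetical). On the backup machine the `-f` argument would be omitted or would point at the relevant user's known_hosts:

```shell
#!/usr/bin/env bash
set -euo pipefail

host=ci-release.nodejs.org
tmp=$(mktemp -d)

# Build a scratch known_hosts file containing a throwaway key for the host.
ssh-keygen -q -t ed25519 -N '' -f "$tmp/hostkey"
awk -v h="$host" '{ print h, $1, $2 }' "$tmp/hostkey.pub" >"$tmp/known_hosts"

# Remove all entries for the host; ssh-keygen keeps a backup as known_hosts.old.
ssh-keygen -R "$host" -f "$tmp/known_hosts" >/dev/null 2>&1

# -F looks the host up and exits non-zero when no entry is found.
if ssh-keygen -F "$host" -f "$tmp/known_hosts" >/dev/null; then
  echo "entry still present"
else
  echo "entry removed"
fi
```

The next ssh connection then prompts to accept the new server's host key, which is worth comparing against the key on the new machine before accepting.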
Yes, backup was using a different contributor's credentials, and we didn't have a good mechanism for a service account. I'll open a separate issue to investigate how to address that.
Unrelated, but I can't update my ssh config using ansible so that I can get onto the new server to investigate the VPN connectivity.
The secrets/build/test/inventory.yml file wasn't encrypted with my key for some reason.
For new Jenkins hosts we'll need to add https://github.com/nodejs/build/blob/main/doc/orka-vpn.md as another set of steps (until it's automated).
The VPN is now connected again, and jobs are running.
https://cloud.ibm.com/classic/support/event/details/162467655
The affected machine is ci-release (FWIW despite its infra-ibm-ubuntu1804-x64-1.nodejs.private name, it is running Ubuntu 20.04). We don't have to migrate it urgently, but we should plan to avoid September/October (Node.js 23) and March (Node.js 24).