nodejs / build

Better build and test infra for Node.

Action required by 4 March 2025 ci-release #3852

Open richardlau opened 2 months ago

richardlau commented 2 months ago

https://cloud.ibm.com/classic/support/event/details/162467655

Event Description

IMS 2024 Announcement Closures: DAL09 - POD3 and POD4
======================================================================

Subject: Time-sensitive action required: Datacenter modernization announcement

Thank you for your business and trust in IBM as your valued business partner and cloud provider. We’re committed to your success and prioritizing your experience within our datacenter infrastructure.

We have made significant investments in our new IBM Cloud datacenters and Multizone Regions (MZRs) designed to deliver a more resilient architecture with higher levels of network throughput and redundancy with our latest generation cloud technologies.

As part of this modernization strategy, we have made the decision to consolidate select data centers and help our customers shift operations to our newer and higher-capacity facilities, including the decision to close the following datacenter on March 04, 2025. This will not impact any other PODs within DAL09.

• DAL09 - POD3 and POD4

This means you will need to migrate the workloads running in these locations to one of our newer IBM Cloud datacenters before this date – but don’t worry, we can help ensure you pay the same or a lower price than you pay today for the same or a better configuration!

For our valued IBM Cloud® platform customers:

We recognize transferring datacenters can be complex and costly, so we are offering:

• Two months free on replacement servers or services in our new datacenters (Promo Code: DCMIGRATE2024) or Four months free when you optimize your IT infrastructure by migrating from Bare Metal to VSI or VPC (Promo Code: UPGRADE2VSI2024).
• The same or a lower price with a same or better configuration.
• Migration assistance. This includes a free architectural consultation with guidance on recommended configurations to help you transition and maximize solution performance. We have the support of a third-party partner available to help with your data migration at no charge. Additionally, we may be able to help with other migration requirements, depending on your needs.

Your action needed:

• Identify impacted servers/services. Contact us through your IBM Cloud Portal (https://cloud.ibm.com/login) or reach out to the Customer Success team via live chat (https://www.ibm.com/cloud/data-centers?focusArea=WCP%20-%20Cloud%20services%20-%20all%20other&contactmodule) or by phone: (US) 866-597-9687; (EMEA) +31 20 308 0540; (APAC) +65 6622 2231.
• Migrate your workloads currently running in the impacted datacenters/PODs.
• Free migration assistance is available through our partner Wanclouds (https://www.wanclouds.net/ibm-request).
• Cancel your servers after migration. After you complete the migration to your new servers, make sure to cancel your existing servers / services. Existing services will continue to be invoiced until cancelled.

Key Milestones: Between now and the final reclaim date, there are several key milestones to be aware of:

• August 06, 2024: General Announcement Date
• August 06, 2024: No New Account provisioning in impacted data centers.
• October 14, 2024: No provisioning on existing accounts in impacted data centers.
• February 05, 2025: network maintenance: Remaining services in DAL09 PODs 3 and 4 will experience network disruption during the network maintenance. Customers will need to contact IBM Cloud to restore service.
• February 10, 2025: Final date to submit migration assistance request
• March 04, 2025: DATACENTER CONSOLIDATION DATE: final day to migrate data in DAL09 PODs 3 and 4

Where can I get more information?

To identify your impacted servers, take advantage of our special offers, or learn about recommended configurations or datacenters, contact our IBM Customer Success team via:

• Live chat (https://www.ibm.com/cloud/data-centers/?focusArea=WCP%20-%20Pooled%20CSM&contactmodule)
• Phone: (US) 866-597-9687; (EMEA) +31 20 308 0540; (APAC) +65 6622 2231
• About datacenter closures on: https://cloud.ibm.com/docs/get-support?topic=get-support-dc-closure

Thank you for your continued partnership with IBM. If you have additional questions or would like help during this migration, please let us know.

IBM Cloud, Customer Success Team

Devices Affected

infra-ibm-ubuntu1804-x64-1.nodejs.private

The affected machine is ci-release (FWIW despite its infra-ibm-ubuntu1804-x64-1.nodejs.private name, it is running Ubuntu 20.04). We don't have to migrate it urgently, but we should plan to avoid September/October (Node.js 23) and March (Node.js 24).

targos commented 3 days ago

I'd like to try tackling it this weekend. Would it be doable to create a new machine with similar specs (on Ubuntu 24.04) and migrate the data and config to it?

richardlau commented 3 days ago

I'd like to try tackling it this weekend. Would it be doable to create a new machine with similar specs (on Ubuntu 24.04) and migrate the data and config to it?

Yes, I think so. We rebuilt ci-release back in 2021 so there's some history for reference: https://github.com/nodejs/build/issues/2626#issuecomment-822404824

targos commented 1 day ago

I did everything I could. https://ci-release.nodejs.org now points to the new server. There are a few open questions/tasks:

* Do we need to do something on the release nodes? https://ci-release.nodejs.org/computer/ seems to suggest they connected to the new machine without issues.
* I don't know what needs to be done for backups
* What about the SSL certificate?

I'm leaving for holiday tomorrow. Anyone else feel free to finish the migration while I'm away.

Test build: https://ci-release.nodejs.org/job/iojs+release/10554/

richardlau commented 1 day ago

💚 Thanks for doing this.

I did everything I could. https://ci-release.nodejs.org now points to the new server. There are a few open questions/tasks:

* Do we need to do something on the release nodes? https://ci-release.nodejs.org/computer/ seems to suggest they connected to the new machine without issues.

I don't believe we need to do anything on the release nodes so long as ci-release.nodejs.org points to the correct server.

* I don't know what needs to be done for backups

Again I think this should just work so long as ci-release.nodejs.org was updated. I'll look at what's on the backup machine tomorrow.

* What about the SSL certificate?

ci-release.nodejs.org seems to be behind the expected certificate -- I can't remember if this is being served from nginx or Cloudflare for the Jenkins servers.

targos commented 1 day ago

* What about the SSL certificate?

ci-release.nodejs.org seems to be behind the expected certificate -- I can't remember if this is being served from nginx or Cloudflare for the Jenkins servers.

It's being served from nginx. My question is about renewal. Do we need to do something on the server so it is automatically renewed when necessary?

targos commented 1 day ago

Test build: ci-release.nodejs.org/job/iojs+release/10554

macOS jobs haven't started (screenshot: CleanShot 2024-10-20 at 20 19 42).

richardlau commented 1 day ago

Not seeing any osx13 machines in https://ci-release.nodejs.org/computer/ although I'm not sure if we expect to with the ephemeral VM setup. @UlisesGascon @ryanaslett

richardlau commented 4 hours ago

My question is about renewal. Do we need to do something on the server so it is automatically renewed when necessary?

No, at least at the moment the certificates have been manually updated yearly.
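
Because renewal is manual, a small expiry check could flag when the yearly update is coming due. This is a minimal sketch using only the Python standard library; the function names and the 30-day threshold are illustrative assumptions, not part of the existing Build WG tooling.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return the number of days left before a certificate's notAfter time.

    not_after uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "Mar  4 00:00:00 2025 GMT".
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host, port=443, timeout=10):
    """Connect to host over TLS and return its certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_renewal(not_after, threshold_days=30, now=None):
    """True when the certificate expires within threshold_days (assumed value)."""
    return days_until_expiry(not_after, now) <= threshold_days
```

Something like `needs_renewal(fetch_not_after("ci-release.nodejs.org"))` could then be run periodically as a reminder until renewal is automated.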

richardlau commented 4 hours ago

* I don't know what needs to be done for backups

Again I think this should just work so long as ci-release.nodejs.org was updated. I'll look at what's on the backup machine tomorrow.

Well I was wrong -- it doesn't look like the backups worked (last update in /data/backup/periodic/daily.0/ci-release.nodejs.org/jobs/iojs+release/builds/ on the backup server is from 19 Oct). This is because the server needs to have the public key for backup (found in the infra section of the secrets repo) added to authorized_keys on the Jenkins servers so that the backup machine can ssh into them. Will fix.
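
The fix described above amounts to making sure the backup machine's public key is listed in authorized_keys on the Jenkins server. A rough sketch of an idempotent way to do that follows; the function name and the example key in the usage note are hypothetical, and the real key is the one kept in the secrets repo.

```python
import os
from pathlib import Path


def ensure_authorized_key(public_key, authorized_keys):
    """Append public_key to authorized_keys unless it is already listed.

    Returns True if the file was modified, False if the key was present.
    """
    fields = public_key.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-key> [comment]'")
    key_body = fields[1]

    path = Path(authorized_keys)
    text = path.read_text() if path.exists() else ""
    # Match on the base64 key body so a different comment or an options
    # prefix on an existing entry does not lead to a duplicate line.
    if any(key_body in line.split() for line in text.splitlines()):
        return False

    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    with path.open("a") as handle:
        # Make sure the new entry starts on its own line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(public_key.strip() + "\n")
    # sshd ignores authorized_keys files with overly permissive modes.
    os.chmod(path, 0o600)
    return True
```

For example, `ensure_authorized_key("ssh-ed25519 AAAA... backup", "/root/.ssh/authorized_keys")` would add the entry once and do nothing on later runs; both the path and the account used here are assumptions.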

richardlau commented 4 hours ago

I checked that I could successfully ssh into ci-release from the backup machine (after removing the known host as the server has changed). I also tried to run remove_old.sh ci-release.nodejs.org but this failed at the end trying to trigger a reload -- possibly due to the credential being used? (@ryanaslett it looks like backup is using your credential for Jenkins -- I have a vague recollection you may have asked/mentioned this before when setting up the backup server but I've forgotten the context (possibly it was using a former Build WG member's credential who was removed from one of the Node.js org teams?).)

# /root/backup_scripts/remove_old.sh ci-release.nodejs.org
<html>
<head><title>400 Bad Request</title></head>
<body>
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx/1.24.0 (Ubuntu)</center>
</body>
</html>

ryanaslett commented 1 hour ago

Yes, backup was using a different contributor's credentials, and we didn't have a good mechanism for a service account. I'll open a separate issue to investigate how to address that.

ryanaslett commented 1 hour ago

Unrelated, but I can't update my ssh config using ansible so that I can get onto the new server to investigate the VPN connectivity.

The secrets/build/test/inventory.yml file wasn't encrypted with my key for some reason.

ryanaslett commented 1 hour ago

For new Jenkins hosts we'll need to add https://github.com/nodejs/build/blob/main/doc/orka-vpn.md as another set of steps (until it's automated).

The VPN is now connected again, and jobs are running.