sxa opened 10 months ago
The Joyent/equinix_mnx machines are in a separate account (Nodecore) -- I think MNX are paying for those so would hopefully be unaffected (cc @bahamat).
FWIW the Jenkins workspace machines are c3.small.x86, which are:

- 1x Intel Xeon E-2278G 8-Core Processor @ 3.40GHz
- 32GB RAM
- 2x 480GB SSD
- 2x 10Gbps
- 1x Intel HD Graphics P630
I think we're only using one of the two disks.
By contrast, the third (non-Equinix) jenkins-workspace machine, hosted on IBM Cloud, is:

- 2 vCPU | 4 GB RAM
- 25 GB SAN boot disk
- 1 TB SAN disk
- 100 Mbps
So I think the takeaway here is disk space. Also, jenkins-workspace-7 is where our temp binary git repository (used in the arm and Windows fanned jobs) currently resides.
@richardlau the Nodecore systems referenced are also on an account that's currently 100% subsidized, and that subsidy is ending.
I'm currently investigating what I can do about pricing discounts, but I know that "free" is not continuing for these.
We'd like to offer hosting those instances on mnx.io.
This would be like when they were hosted at Joyent. We'd set up a dedicated Triton account with individual instances (rather than two dedicated physical servers). The account billing will be covered by us (MNX). I will assist in getting everything set up and provide credentials to anyone that needs it.
We're also adding another datacenter which will be publicly available in the coming months for the offsite backup instance.
Thank you @bahamat - that's great to hear! Let us know when that's in place.
@bahamat That sounds great. For clarity, would that include the two machines in the Node.js account or just the ones in the Nodecore one?
@richardlau The NodeCore account is the only one I have access to, so that’s the one I meant.
For the others, I'd need to know what the requirements are, then I need to check if we have available capacity for them.
If you have VMs there, I need the CPU/RAM/storage for them, then I can see how much more we can provide.
@bahamat Details are in https://github.com/nodejs/build/issues/3597#issuecomment-1863250788. There are two machines with that configuration (c3.small.x86 in Equinix). I think we might not need as much CPU/RAM, but they are consuming disk space, i.e.:
jenkins-workspace-7:

```console
root@test-equinix-ubuntu2204-x64-1:/home/iojs# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       439G  354G   63G  86% /
root@test-equinix-ubuntu2204-x64-1:/home/iojs# du -hs /home/iojs/build/workspace/
240G    /home/iojs/build/workspace/
root@test-equinix-ubuntu2204-x64-1:/home/iojs# du -hs /home/iojs/build/binary_tmp.git
56G     /home/iojs/build/binary_tmp.git
root@test-equinix-ubuntu2204-x64-1:/home/iojs#
```
jenkins-workspace-8:

```console
root@test-equinix-ubuntu2204-x64-2:/home/iojs# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       438G  107G  309G  26% /
root@test-equinix-ubuntu2204-x64-2:/home/iojs# du -hs /home/iojs/build/workspace/
101G    /home/iojs/build/workspace/
root@test-equinix-ubuntu2204-x64-2:/home/iojs#
```
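(As an aside, before that 56G binary_tmp.git gets copied anywhere, it may be worth checking how much of it is reclaimable; a sketch, assuming it's the bare repo at the path shown above:)

```sh
# Show loose vs. packed object sizes in the temp binary repo.
git -C /home/iojs/build/binary_tmp.git count-objects -vH
# Repack and drop unreachable objects; this may shrink the repo significantly.
git -C /home/iojs/build/binary_tmp.git gc --prune=now
```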
@bahamat Have you been able to review https://github.com/nodejs/build/issues/3597#issuecomment-1864805179 and https://github.com/nodejs/build/issues/3597#issuecomment-1863250788 as to whether the two Jenkins workspace machines could be included in the mnx offer, or if we'll need to source replacements elsewhere?
What is the timeline for this issue? Is there a deadline from Equinix?
> What is the timeline for this issue? Is there a deadline from Equinix?
I was checking the notes and it seems like the start of April is the current deadline.
Do we feel comfortable with the given deadline?
Reading through the meeting minutes now, thanks @UlisesGascon.
Regarding deadlines, the message I want to give is that we want the project to continue to succeed and we don't want to disrupt operations, but it's also important now to have a plan in place for the transition on a timeline. We'll support you through that timeline (and if you need more time, let me know, but don't delay unnecessarily).
This is a list of the affected machines:
In the Nodecore organization (these VMs are spread over two instances in Equinix Metal):
In the Node.js organization (these are separate instances in Equinix Metal, current specs https://github.com/nodejs/build/issues/3597#issuecomment-1863250788):
Oh, I've just noticed we already listed these in the issue description up top, except the release machines are missing from that list -- I'll update the list in the description as well 🙂.
As mentioned in the call today, I'd like to add the operations@openjsf.org account to the equinix accounts in question so we can get a handle on all the details.
@nodejs/build can you chime in with your ok, concern/objections with adding the linuxIT operations account to the equinix accounts?
+1 from me.
+1
@richardlau @mhdawson I've been able to confirm that MNX is happy to host the additional machines from the Node.js org from Equinix Metal, as well as the NodeCore instances.
Thank you @bahamat, glad to see this.
@ryanaslett we talked about this one in the Build WG meeting today. Is it possible to get an update on the planned migration and migration timings?
The mnx.io account is established, and instances can be provisioned there.
I still need access to the Equinix accounts, so I haven't been able to use information from those to help inform a plan. I'm not sure what the next steps are after the +1's above.
I also do not have admin on either the test or release Jenkins, so once machines are provisioned I'm not able to add them as workers or set their Jenkins secret, and will need to coordinate with somebody to either get Jenkins admin access or get help adding these to Jenkins.
Need to confirm the actual necessary size for these machines, as well as work with @bahamat to define a package (CPU/RAM/disk) that properly mirrors the actual workload of these machines (see the sizing sketch below). Once that is set up I can provision them so they can be added to the list of potential jenkins-worker machines. Once we determine that they are successfully handling jobs, we'd be clear to turn off the other machines.
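To ground that conversation, something like the following run on the current Equinix workers would capture the baseline numbers (a minimal sketch; the workspace path is taken from the du output earlier in the thread):

```sh
# Capture a rough sizing baseline on an existing worker (run as root).
nproc                                   # CPU cores
free -h                                 # total/used RAM
df -h /                                 # root filesystem size and headroom
du -hs /home/iojs/build/workspace/      # Jenkins workspace footprint
uptime                                  # load averages as a crude busy-ness signal
```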
I don't have access to the infra secrets, so I'm unable to thoroughly investigate anything regarding the 3 infra machines:

- grafana: this appears to be down; unsure of its status. I was hoping to peruse some stats about the worker machines to get proper sizing using the Grafana data.
- backup: no access to this, but my assumption is that there is a major amount of data that could take a long time to transfer.
- unencrypted: not sure what this machine is used for, or its status.
These machines are running very old versions of smartOS.
SmartOS 18 and 20 are both no longer supported. The smartos18 machines only seem to be occasionally testing libuv, and no supported version of Node.js works with smartos18 (https://github.com/nodejs/build/blob/main/jenkins/scripts/VersionSelectorScript.groovy#L66)
I'm not sure what Node's testing policy is, but it seems as though it may have been lowest supported/highest supported, which suggests we should target:

- SmartOS 21.4.1 (lowest)
- SmartOS 23.4.0 (highest)
And continue the pattern of 1 release machine/2 testing machines for each version.
Two containers running on this machine:
Without access to the release Jenkins, I'm not really sure whether these could live somewhere else or need to be on their own dedicated host. It seems as though we should be able to find a Docker host somewhere else to run these.
Appears to be a secondary testing node for Ubuntu 18.04. This can be replicated on mnx fairly easily, if it's still deemed necessary. I'm not sure what the policy is for EOL distro testing (and for the release container above).
As far as a timeline goes, some of these are low-hanging fruit, while others may be major undertakings. I was originally under the impression that the two Jenkins worker nodes were the only two instances of concern, but I now understand that everything in the Nodecore account needs to find a new home.
Ideally, if I can get access to
That would expedite getting things migrated.
Jenkins Workspace Machines
```console
root@test-equinix-ubuntu2204-x64-1:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.2G  1.5M  3.2G   1% /run
/dev/sda3       439G  359G   57G  87% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.2G  4.0K  3.2G   1% /run/user/1001
tmpfs           3.2G  4.0K  3.2G   1% /run/user/0
```
@ryanaslett I've invited you to the "Node.js" Equinix Metal organization as an owner (this is the only one of the accounts owned by the Build WG) which contains the two Jenkins workspace machines. The other machines to be migrated are in the "Nodecore" Equinix Metal organization, which I think @bahamat would need to invite you to.
We should open a separate issue solely around additional access (i.e. admin on both Jenkins).
I can't ssh into grafana.
backup, as you surmise, does contain a lot of data.
unencrypted is a mirror of www (hosted on Digital Ocean) and is configured to be the failover server in Cloudflare should www be unavailable.
For SmartOS versions, talk to @bahamat and the folks at MNX -- they're likely to be in a much better position to advise on which SmartOS versions we should be testing on.
> I can't ssh into grafana.
@richardlau I looked at the grafana instance today. Looks like it crashed with a full disk and consequently didn't boot properly. I cleared the boot prompt so that it would come up and ssh is available now, but I didn't do anything to address the disk issue so I don't think Grafana is healthy yet. I figured addressing the disk space was better left to your team.
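If it helps, a minimal sketch of chasing down the space (the paths below are assumptions about the machine's layout):

```sh
# Find the largest top-level consumers on the root filesystem (-x: stay on one filesystem).
du -xhd1 / | sort -rh | head -20

# The systemd journal and the Grafana data dir are common culprits; both paths are guesses.
journalctl --vacuum-size=200M       # cap the journal at ~200MB
du -hs /var/lib/grafana             # sqlite DB, sessions, plugins
```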
I discussed this with @ryanaslett earlier today, so this may be old news to some folks already.
@ryanaslett While the security release is still being tested (we're waiting for the security release to be done before changing anything), in preparation would it be possible to PR the new machines into the Ansible inventory? I think they've had secrets added, but no corresponding entries with IP addresses (and/or account).
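For reference, the entries would look something like this, modelled on the existing inventory format (the provider key, host names, and IPs here are placeholders, not the real values):

```yaml
# Hypothetical inventory addition -- names and IPs are placeholders.
- test:
  - mnx:
      ubuntu2204-x64-1: {ip: 203.0.113.10}
      ubuntu2204-x64-2: {ip: 203.0.113.11}
```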
@ryanaslett and I have migrated the two Equinix-hosted `jenkins-workspace` machines. These are now offline in Jenkins, replaced by the new mnx-hosted machines.
The temp binary git repository has been cloned across to test-mnx-ubuntu2204-x64-1 and the two Jenkins variables (`TEMP_REPO` and `TEMP_REPO_SERVER`) updated to the new machine's IP address. We've also added entries to `known_hosts` for the VMs that need to push/pull to the temp binary git repository.
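For the record, the mechanics were roughly as follows (a sketch of the approach, not the literal commands; OLD_HOST/NEW_HOST stand in for the actual addresses):

```sh
# On the new workspace machine: mirror-clone the bare temp repo from the old host.
git clone --mirror iojs@OLD_HOST:/home/iojs/build/binary_tmp.git \
  /home/iojs/build/binary_tmp.git

# On each VM that pushes/pulls to the temp repo: trust the new host's SSH key.
ssh-keyscan NEW_HOST >> /home/iojs/.ssh/known_hosts
```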
Let's run like this for a few days and, if no new issues arise, we can turn off the Equinix `jenkins-workspace` machines.
Thanks for all your work getting this moved over - appreciate it.
Status update: the two `jenkins-workspace` machines have been successfully replaced, and I have removed them from the nodejs.org organization at Equinix Metal (ef5bd919-c911-4c87-a101-dff7872396a4).
The backup server has also been replaced by its new counterpart at mnx.io, and I have removed the backup server from the nodecore organization (a988c5d8-0f10-4d90-a6b4-f348757355d7).
The final server has 3 more services to transition, among them the `unencrypted` host, which is a current failover host for nodewww releases.

I will focus on the docker host and the release standby server next. It was decided that the Grafana host can be decommissioned as is and we'll stand up another later if needed, but that shouldn't block the rest of the decommissioning. The SmartOS testing nodes will require a meeting and coordination with the SmartOS community.
There is one additional host, ubuntu1804-x64-1: `{ip: 147.28.162.99, user: ubuntu}`
It's tied to the `ubuntu1804-64` label, which it shares with another Digital Ocean server.
https://ci.nodejs.org/computer/test%2Dequinix%5Fmnx%2Dubuntu1804%2Dx64%2D1/
Are jobs still running on ubuntu1804? I can't seem to find any evidence of jobs that have run recently on those hosts.
I looked at the Jenkins config backups and there is no major job that depends on the `ubuntu1804-64` label. Both hosts can probably be deleted.
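(For anyone repeating the check, it amounted to a grep over a local copy of the config backups; the `jobs/` layout is an assumption:)

```sh
# List any job configs in the backup that still reference the label.
grep -rl --include=config.xml 'ubuntu1804-64' jobs/
```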
Great. I will just "not migrate" that host then.
Equinix have been sponsoring the Node.js project by providing a generous amount of capacity for our infrastructure. This is now coming to an end and we need to make a plan for migrating our systems away from Equinix. (Note: this does not affect the aarch64 Altras, which are supplied as part of the Works On Arm project but are hosted by Equinix.)
The joyent and equinix_mnx ones are in the `nodecore` project in the portal; the two test ones are in `Node.js`.