nodejs / build

Better build and test infra for Node.

Move release docker host #3839

Open ryanaslett opened 1 month ago

ryanaslett commented 1 month ago

Sub issue of #3597

I've rebuilt the ubuntu1804_docker-x64-1 host with ubuntu2404, and its docker containers are running and connected to ci-release.nodejs.org.

It appears as though there are two containers, but only one is being used right now (the iojs+release job shows the cross-compiler-ubuntu1804-armv7-gcc-[6,8] jobs as disabled).

I have marked both https://ci-release.nodejs.org/computer/release%2Dmnx%2Dubuntu1804%5Farm%5Fcross%5Fcontainer%2Dx64%2D2/ and https://ci-release.nodejs.org/computer/release%2Dmnx%2Dubuntu1804%5Farm%5Fcross%5Fcontainer%2Dx64%2D2/ as offline until we're ready to flip them on and test.

I'm unsure of the standard procedure here: do we enable the new ones, disable the old ones, and wait for the daily build to run? Or can we rebuild a previous build to test the validity of the new containers?

richardlau commented 1 month ago

Running a test build (re-run of today's V8 canary): https://ci-release.nodejs.org/job/iojs+release/10355/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/

richardlau commented 1 month ago

> It appears as though there are two containers, but only one is being used right now (the iojs+release job shows the cross-compiler-ubuntu1804-armv7-gcc-[6,8] jobs as disabled).

The ubuntu1804 container is redundant now as we're building Node.js 18, 20, 22 and later with the rhel8 container.

richardlau commented 1 month ago

> Running a test build (re-run of today's V8 canary): https://ci-release.nodejs.org/job/iojs+release/10355/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/

This ran out of space. Jenkins has automatically taken the agent offline with this message:

> [!CAUTION]
> Disk space is below threshold of 1.00 GiB. Only 361.32 MiB out of 8.73 GiB left on /home/iojs/build.
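For anyone wanting to check an agent before Jenkins does, the threshold test in that message can be approximated as follows. This is a minimal sketch, not Jenkins' own monitor: the function names are hypothetical, and the 1 GiB default is taken only from the message quoted above.

```python
import shutil

GIB = 1024 ** 3  # bytes in one GiB


def is_below_threshold(free_bytes: int, threshold_bytes: int = GIB) -> bool:
    """Return True when the free space is under the threshold."""
    return free_bytes < threshold_bytes


def check_path(path: str, threshold_bytes: int = GIB) -> bool:
    """Check the filesystem that holds `path` (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_below_threshold(usage.free, threshold_bytes)


if __name__ == "__main__":
    # Figure from the Jenkins message: 361.32 MiB left.
    reported_free = int(361.32 * 1024 ** 2)
    print(is_below_threshold(reported_free))  # True
```

On the agent itself, `check_path("/home/iojs/build")` would report on the workspace directory named in the message.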

ryanaslett commented 1 month ago

I attempted to re-run a daily job and it failed because the /home directory on the new mnx machines didn't have any space (all of the disk was mounted on /data). That's been moved to /home.

The next issue is that the Docker data-root on the old machines was manually moved to /home/docker-lib.

I added an /etc/docker/daemon.json file with:

{
  "data-root": "/home/docker-lib"
}

https://ci-release.nodejs.org/job/iojs+release/10358/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/console is still running, which I think is due to not having anything in the ccache on the first run, but it has space now.
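If a host already has a daemon.json with other settings, the same data-root change could be applied without overwriting them. The sketch below is an assumption about how one might script it, not what was actually run here: `set_data_root` is a hypothetical helper, and the example writes to a temporary file rather than /etc/docker/daemon.json.

```python
import json
import tempfile
from pathlib import Path


def set_data_root(config_path: Path, data_root: str) -> dict:
    """Set "data-root" in a Docker daemon.json, keeping any other keys."""
    config = {}
    if config_path.exists():
        text = config_path.read_text()
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    config["data-root"] = data_root
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Demonstrate against a scratch file instead of the real config.
    scratch = Path(tempfile.mkdtemp()) / "daemon.json"
    print(set_data_root(scratch, "/home/docker-lib"))
```

The Docker daemon reads this file at startup, so it would need a restart for the new data-root to take effect.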

ryanaslett commented 1 month ago

I really hope this is just a ccache issue, but it's up to 4.5 hours and counting. Both CPUs are pegged at 100%. I'm wondering if we need a bigger box.

richardlau commented 1 month ago

It took over 8 hours but it did complete. I've started a rebuild which should utilize ccache so we can compare build times. https://ci-release.nodejs.org/job/iojs+release/10359/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/

richardlau commented 1 month ago

> It took over 8 hours but it did complete. I've started a rebuild which should utilize ccache so we can compare build times. https://ci-release.nodejs.org/job/iojs+release/10359/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/

This built in less than ten minutes (🎉) but failed to upload to node-www because we'll need to add its IP address to the ufw2 firewall there.

ryanaslett commented 1 month ago

I've added its IP to the firewall there, so we should be good to test again.

I'll be on PTO next week, so up to you whether to wait till I get back or if you want to switch over to using this before then.

richardlau commented 1 month ago

@ryanaslett Would it be possible to open PRs for the new machine (inventory in this repo and secrets) before you go?

ryanaslett commented 1 month ago

> @ryanaslett Would it be possible to open PRs for the new machine (inventory in this repo and secrets) before you go?

I realize I mistakenly pushed a few commits directly that were meant to be a PR. https://github.com/nodejs/build/commit/ed9abafa4824e9654bcb3d8e8d3ea5776548e47a and https://github.com/nodejs/build/commit/f563e77e9040604ab7da3c0af620ba5a7bf81f7d

So that IP address and the supporting Ansible changes are already in the repo.

I created a PR for the docker host secrets: https://github.com/nodejs-private/secrets/pull/339

targos commented 1 month ago

No worries. I added a branch ruleset to avoid future direct pushes to main.

richardlau commented 1 month ago

> I've added its IP to the firewall there, so we should be good to test again.

I've taken the joyent container offline in ci-release and put the mnx one back online and will see how tomorrow's nightly/v8-canary build(s) go.

richardlau commented 1 month ago

FWIW builds have been successful on the new container since the switch. Build times without a ccache, or where much of V8 needs to be recompiled (e.g. V8 canary), are now ~9-10 hours (!). With a populated ccache, we're at a reasonable ~9 mins.
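To put those two figures side by side, here is a rough calculation of the ccache speedup. It uses only the approximate times quoted above (9 to 10 hours cold, about 9 minutes warm), so the result is an estimate rather than a measurement.

```python
def speedup(cold_minutes: float, warm_minutes: float) -> float:
    """Ratio of cold-cache build time to warm-cache build time."""
    return cold_minutes / warm_minutes


if __name__ == "__main__":
    low = speedup(9 * 60, 9)    # 9 hours vs 9 minutes
    high = speedup(10 * 60, 9)  # 10 hours vs 9 minutes
    print(f"roughly {low:.0f}x to {high:.0f}x faster with a populated ccache")
```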
