ryanaslett opened 1 month ago
Running a test build (re-run of today's V8 canary): https://ci-release.nodejs.org/job/iojs+release/10355/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/
> It appears as though there are two containers, but only one is being used right now (the iojs+release job shows the cross-compiler-ubuntu1804-armv7-gcc-[6,8] jobs as disabled).
The ubuntu1804 container is redundant now as we're building Node.js 18, 20, 22 and later with the rhel8 container.
> Running a test build (re-run of today's V8 canary): https://ci-release.nodejs.org/job/iojs+release/10355/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/
This ran out of space. Jenkins has automatically taken the agent offline with:

> [!CAUTION]
> Disk space is below threshold of 1.00 GiB. Only 361.32 MiB out of 8.73 GiB left on /home/iojs/build.
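For reference, the offline decision is just a free-space comparison against a fixed threshold; a minimal sketch of the same check (the threshold and figures are from the message above, the function itself is hypothetical, not Jenkins code):

```python
import shutil

GIB = 1024 ** 3
MIB = 1024 ** 2

def below_threshold(path: str, threshold_bytes: int = GIB) -> bool:
    """True when free space at `path` is under the threshold -- the same
    comparison that takes an agent offline."""
    return shutil.disk_usage(path).free < threshold_bytes

# The figures from the message above: 361.32 MiB free vs a 1.00 GiB threshold.
print(361.32 * MIB < GIB)  # True -> Jenkins marks the agent offline
```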
Attempted to re-run a daily job and it failed because the /home dir on the new mnx machines didn't have any space (all of the disk was mounted on /data). That's been moved to /home.
The next issue is that the Docker data-root on the old machines had been manually moved to /home/docker-lib, so I added an /etc/docker/daemon.json file with:

```json
{
  "data-root": "/home/docker-lib"
}
```
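If a host already has an /etc/docker/daemon.json, it's safer to merge the key in than to overwrite the whole file. A small sketch (the path and the `data-root` key are from the comment above; the merge helper itself is mine, not part of the Ansible setup):

```python
import json
import os

def set_data_root(config_path: str, data_root: str) -> None:
    """Merge a data-root entry into daemon.json (or create the file),
    preserving any settings already present instead of clobbering them."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["data-root"] = data_root
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)

# e.g. set_data_root("/etc/docker/daemon.json", "/home/docker-lib")
```

Note that dockerd has to be restarted for the new data-root to take effect, and the old directory's contents need to be copied across first or existing images and containers will disappear.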
https://ci-release.nodejs.org/job/iojs+release/10358/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/console is still running, which I think is due to the ccache being empty on the first run, but it has disk space now.
I really hope this is just a ccache issue, but it's up to 4.5 hours and counting. Both CPUs are pegged at 100%. I'm wondering if we need a bigger box.
It took over 8 hours but it did complete. I've started a rebuild which should utilize ccache so we can compare build times. https://ci-release.nodejs.org/job/iojs+release/10359/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/
> It took over 8 hours but it did complete. I've started a rebuild which should utilize ccache so we can compare build times. https://ci-release.nodejs.org/job/iojs+release/10359/nodes=cross-compiler-rhel8-armv7-gcc-10-glibc-2.28/
This built in less than ten minutes (🎉) but failed to upload to node-www because we'll need to add its IP address to the ufw2 firewall there.
I've added its IP to the firewall there, so we should be good to test again.
I'll be on PTO next week, so up to you whether to wait till I get back or if you want to switch over to using this before then.
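For the record, allowing a single build host through ufw on node-www looks roughly like the following. This is a config sketch only: the address is a documentation-range placeholder (the machine's real IP isn't shown here), and the port/protocol details are an assumption about how uploads reach node-www.

```shell
# On node-www: allow uploads from the new build host only.
# 203.0.113.42 is a placeholder address, not the real machine.
ufw allow from 203.0.113.42 to any port 22 proto tcp
ufw status numbered   # verify the new rule is present
```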
@ryanaslett Would it be possible to open PRs for the new machine (inventory in this repo and secrets) before you go?
> @ryanaslett Would it be possible to open PRs for the new machine (inventory in this repo and secrets) before you go?
I realized I mistakenly pushed a few commits directly that were meant to be a PR: https://github.com/nodejs/build/commit/ed9abafa4824e9654bcb3d8e8d3ea5776548e47a and https://github.com/nodejs/build/commit/f563e77e9040604ab7da3c0af620ba5a7bf81f7d. So that IP address and the supporting Ansible changes are already in the repo.
I created a PR for the docker host secrets: https://github.com/nodejs-private/secrets/pull/339
No worries. I added a branch ruleset to avoid future direct pushes to main.
> I've added its IP to the firewall there, so we should be good to test again.
I've taken the joyent container offline in ci-release and put the mnx one back online and will see how tomorrow's nightly/v8-canary build(s) go.
FWIW builds have been successful on the new container since the switch. Build times without a ccache (or where much of V8 needs to be recompiled, e.g. the V8 canary) are now ~9-10 hours (!). With a populated ccache, we're at a reasonable ~9 minutes.
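To put those two numbers side by side (a quick back-of-the-envelope calculation; the build times above are approximate):

```python
# Rough speedup from a warm ccache, using the figures quoted above.
cold_build_minutes = 9.5 * 60   # ~9-10 hours without a usable ccache
warm_build_minutes = 9          # ~9 minutes with a populated ccache

speedup = cold_build_minutes / warm_build_minutes
print(f"~{speedup:.0f}x faster with a warm ccache")  # ~63x faster with a warm ccache
```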
Sub issue of #3597
I've rebuilt the ubuntu1804_docker-x64-1 host with ubuntu2404, and its docker containers are running and connected to ci-release.nodejs.org.
It appears as though there are two containers, but only one is being used right now (the iojs+release job shows the cross-compiler-ubuntu1804-armv7-gcc-[6,8] jobs as disabled).

I have marked both https://ci-release.nodejs.org/computer/release%2Dmnx%2Dubuntu1804%5Farm%5Fcross%5Fcontainer%2Dx64%2D2/ and https://ci-release.nodejs.org/computer/release%2Dmnx%2Dubuntu1804%5Farm%5Fcross%5Fcontainer%2Dx64%2D2/ as offline until we're ready to flip them on and test.
I'm unsure of the standard procedure here (enable the new ones, disable the old ones, and wait for the daily build to run? Or can we rebuild a previous build to test the new containers' validity?)