nodejs / build

Better build and test infra for Node.

Optimize disk usage by colocating Jenkins workspaces #3897

Open targos opened 1 month ago

targos commented 1 month ago

At the moment, each Jenkins job runs in its own workspace. Since most of our machines can run only one job at a time, I suggest that we configure some of the jobs that need to build the node binary to all run in one specific hardcoded workspace. That way, we wouldn't have multiple copies of the build artifacts.

targos commented 1 month ago

Example of a host that would benefit from it: https://ci.nodejs.org/computer/test%2Dibm%2Dubuntu2204%2Dx64%2D2/

root@test-ibm-ubuntu2204-x64-2:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           390M  1.1M  388M   1% /run
/dev/xvda2       24G   22G  829M  97% /
root@test-ibm-ubuntu2204-x64-2:~# du -sh /home/iojs/build/workspace/*
615M    /home/iojs/build/workspace/citgm-smoker
2.9G    /home/iojs/build/workspace/node-stress-single-test
2.0G    /home/iojs/build/workspace/node-test-commit-custom-suites-freestyle
20K     /home/iojs/build/workspace/node-test-commit-custom-suites-freestyle@tmp
3.4G    /home/iojs/build/workspace/node-test-commit-linux
4.0K    /home/iojs/build/workspace/node-test-commit-linux@tmp
365M    /home/iojs/build/workspace/node-test-node-addon-api-new

We could save about 5GB (and possibly some compilation time) if node-stress-single-test, node-test-commit-custom-suites-freestyle, and node-test-commit-linux used the same workspace.
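
As a rough check of that estimate, the sketch below sums the three workspace sizes from the du output above and subtracts the largest one. It assumes the shared workspace would end up about as large as the biggest existing workspace, which is an assumption and not something measured here.

```python
# Sizes in GB, copied from the du output above.
sizes_gb = {
    "node-stress-single-test": 2.9,
    "node-test-commit-custom-suites-freestyle": 2.0,
    "node-test-commit-linux": 3.4,
}


def estimated_savings_gb(sizes):
    """Space freed if all listed jobs share a single workspace.

    Assumption: the shared workspace is about as large as the
    largest individual workspace today.
    """
    return sum(sizes.values()) - max(sizes.values())


print(f"{estimated_savings_gb(sizes_gb):.1f} GB")
```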

richardlau commented 1 month ago

SGTM

RedYetiDev commented 1 month ago

Theoretically, we could also build the node binary in GitHub actions, which are running these jobs anyway, and download it as an artifact to save time?

richardlau commented 1 month ago

> Theoretically, we could also build the node binary in GitHub actions, which are running these jobs anyway, and download it as an artifact to save time?

No, because GH actions does not support all of the platforms we build for.

targos commented 1 month ago

I started by updating https://ci.nodejs.org/view/All/job/node-test-commit-custom-suites-freestyle/ and https://ci.nodejs.org/view/All/job/node-test-commit-linux/. Their custom workspace is now /home/iojs/build/workspace/node

richardlau commented 1 month ago

> I started by updating https://ci.nodejs.org/view/All/job/node-test-commit-custom-suites-freestyle/ and https://ci.nodejs.org/view/All/job/node-test-commit-linux/ Their custom workspace is now /home/iojs/build/workspace/node

FYI https://github.com/nodejs/jenkins-alerts/issues/2871

[root@test-ibm-rhel9-x64-1 ~]# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       24G   21G  1.5G  94% /
[root@test-ibm-rhel9-x64-1 ~]# du -hs /home/iojs/build/workspace/*
3.1G    /home/iojs/build/workspace/node
2.9G    /home/iojs/build/workspace/node-stress-single-test
3.2G    /home/iojs/build/workspace/node-test-commit-linux
4.0K    /home/iojs/build/workspace/node-test-commit-linux@tmp
4.0K    /home/iojs/build/workspace/node@tmp
[root@test-ibm-rhel9-x64-1 ~]#

I'll delete the node-stress-single-test and node-test-commit-linux workspaces.

targos commented 1 month ago

The changes seem to work, but now I'm not sure how to handle node-stress-single-test. The job runs on machines that don't have /home/iojs/build/workspace (macOS, Windows). I would like to use a relative path instead, but I'm not sure what "remote FS root" means in the inline documentation:

[Screenshot: CleanShot 2024-09-17 at 18 13 43@2x]

richardlau commented 1 month ago

AFAIK "remote FS root" is a combination of:

which is how we end up with:

So I think we can set the workspace in the job to a relative "node".

targos commented 1 month ago

Thanks, done. Test run: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/536/

targos commented 1 month ago

I cancelled it and reverted the config change. It ran in /home/iojs/build/node on the workspace machine and C:\node on Windows.

targos commented 1 month ago

Trying again with workspace/node: https://ci.nodejs.org/job/node-stress-single-test/537/
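
The two test runs suggest that a relative custom workspace is joined onto the agent's remote FS root. The sketch below models that with the standard library's path modules. The root values are inferred from the paths reported above (`/home/iojs/build` on Linux, `C:\` on Windows) and `resolve_workspace` is a hypothetical helper for illustration, not a Jenkins API.

```python
import ntpath
import posixpath


def resolve_workspace(remote_fs_root, custom_workspace, windows=False):
    """Join a relative custom workspace onto an agent's remote FS root.

    Assumption: Jenkins resolves a relative custom workspace against the
    remote FS root, which matches the paths seen in the test run.
    """
    path = ntpath if windows else posixpath
    return path.normpath(path.join(remote_fs_root, custom_workspace))


# Remote FS roots inferred from the test run (assumed values).
LINUX_ROOT = "/home/iojs/build"
WINDOWS_ROOT = "C:\\"

print(resolve_workspace(LINUX_ROOT, "node"))
print(resolve_workspace(WINDOWS_ROOT, "node", windows=True))
print(resolve_workspace(LINUX_ROOT, "workspace/node"))
```

Under these assumptions, `node` lands next to the `workspace` directory, while `workspace/node` lands inside it, in the same location as the hardcoded path used for the Linux jobs.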

richardlau commented 1 week ago

I'm wondering if we're getting collisions now on the workspace machines: e.g. https://ci.nodejs.org/job/node-test-commit-linux/61452/console

07:00:13  > git reset --hard # timeout=10
07:00:13 ERROR: Error fetching remote repo 'origin'
07:00:13 hudson.plugins.git.GitException: Failed to fetch from git@github.com:nodejs/node.git
07:00:13    at PluginClassLoader for git//hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:997)
07:00:13    at PluginClassLoader for git//hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1239)
07:00:13    at PluginClassLoader for git//hudson.plugins.git.GitSCM._checkout(GitSCM.java:1310)
07:00:13    at PluginClassLoader for git//hudson.plugins.git.GitSCM.checkout(GitSCM.java:1277)
07:00:13    at hudson.scm.SCM.checkout(SCM.java:540)
07:00:13    at hudson.model.AbstractProject.checkout(AbstractProject.java:1247)
07:00:13    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:649)
07:00:13    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:85)
07:00:13    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:521)
07:00:13    at hudson.model.Run.execute(Run.java:1894)
07:00:13    at PluginClassLoader for matrix-project//hudson.matrix.MatrixBuild.run(MatrixBuild.java:323)
07:00:13    at hudson.model.ResourceController.execute(ResourceController.java:101)
07:00:13    at hudson.model.Executor.run(Executor.java:446)
07:00:13 Caused by: hudson.plugins.git.GitException: Command "git reset --hard" returned status code 128:
07:00:13 stdout: 
07:00:13 stderr: fatal: Unable to create '/home/iojs/build/workspace/node/.git/index.lock': File exists.
07:00:13 
07:00:13 Another git process seems to be running in this repository, e.g.
07:00:13 an editor opened by 'git commit'. Please make sure all processes
07:00:13 are terminated then try again. If it still fails, a git process
07:00:13 may have crashed in this repository earlier:
07:00:13 remove the file manually to continue.
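
The `index.lock` error in this log is what an exclusive lock file produces when a second process arrives while the first still holds the lock. The sketch below reproduces that pattern in a temporary directory. It is a simplified illustration and not git's own code; `try_lock` is a hypothetical helper.

```python
import os
import tempfile


def try_lock(lock_path):
    """Create lock_path exclusively; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


with tempfile.TemporaryDirectory() as repo:
    lock = os.path.join(repo, "index.lock")  # stand-in for .git/index.lock
    first = try_lock(lock)   # first build takes the lock
    second = try_lock(lock)  # second build in the same workspace is refused
    os.remove(lock)          # first build finishes and releases the lock
    third = try_lock(lock)   # a later build can take the lock again
    print(first, second, third)
```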

Looking at the build history on the workspace machine it looks like:

ran at the same time in the same workspace.

If we compare to other jobs that appear to have run at the same time on the same machine, e.g.

it looks like Jenkins appended a @2 to the workspace directory for the second build.
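
The `@2` behaviour described here can be modelled as below. This is a simplified sketch of the observed naming only, not Jenkins's implementation, and the job path is a hypothetical example.

```python
def allocate_workspace(base, in_use):
    """Return base, or base@2, base@3, ... if earlier names are taken.

    Simplified model of the observed behaviour; not Jenkins's code.
    """
    candidate = base
    n = 1
    while candidate in in_use:
        n += 1
        candidate = f"{base}@{n}"
    in_use.add(candidate)
    return candidate


in_use = set()
default = "/home/iojs/build/workspace/example-job"  # hypothetical job path
print(allocate_workspace(default, in_use))
print(allocate_workspace(default, in_use))
```

With a hardcoded custom workspace, the collision above suggests no such suffix is applied, so both builds end up in the same directory.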

targos commented 1 week ago

Hmm, that's annoying. Maybe there's a variable we can use in the config to also append the @2?

targos commented 1 week ago

I guess my question doesn't really make sense. It cannot know that a suffix is needed before evaluating this parameter.

richardlau commented 1 week ago

I was hoping for something like the executor number, but these are flyweight jobs, meaning that in these cases the EXECUTOR_NUMBER env var is -1.