nodejs / build

Better build and test infra for Node.

osx11 release build fails quite often #3006

Closed targos closed 1 year ago

targos commented 2 years ago

Example: https://ci-release.nodejs.org/job/iojs+release/8645/nodes=osx11-release-tar/console

15:10:49 make[2]: /bin/sh: Bad address
15:10:52 make[2]: *** [/Users/iojs/build/ws/out/Release/obj.target/v8_base_without_compiler/deps/v8/src/heap/paged-spaces.o] Error 1
15:10:52 make[2]: *** Waiting for unfinished jobs....

targos commented 2 years ago

3 rebuilds were necessary to make it green:

[image]

targos commented 2 years ago

/cc @nodejs/build-infra

[image]

richardlau commented 2 years ago

I have no idea where to even begin with this.

targos commented 2 years ago

@AshCripps ?

mhdawson commented 2 years ago

I wonder if we are running out of memory. Does the failure occur more often on any particular machine/type of machine?

mhdawson commented 2 years ago

hmm, I guess it's just the one release machine.

mhdawson commented 2 years ago

Seems like we actually have two. From history it appears it fails on the macstadium machines and is fine on the nearform one.

I see from the inventory (https://github.com/nodejs/build/blob/main/ansible/inventory.yml) that we have server_jobs set to 6 on the macstadium machines. Maybe that is too many.

   - macstadium:
        macos11.0-arm64-1:
            ansible_python_interpreter: /usr/bin/python3
            ip: 207.254.38.74
            user: administrator
            remote_env:
                PATH: /opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/Apple/usr/bin
            server_jobs: 6

sxa commented 2 years ago

We've got two things to try I guess:

richardlau commented 2 years ago

FWIW I've taken the macstadium macos11 release machine offline from the release ci for now to unblock the Node.js 18.8.0 release.

richardlau commented 2 years ago

Reducing the number of jobs seems like a reasonable thing to try -- will do that tomorrow along with all the other CI maintenance.

richardlau commented 2 years ago

I dropped JOBS down to 3 and the problem still occurs: https://ci-release.nodejs.org/job/iojs+release/8709/nodes=osx11-release-pkg/

richardlau commented 2 years ago

Dropped JOBS down to 1: https://ci-release.nodejs.org/job/iojs+release/8710/nodes=osx11-release-pkg/

richardlau commented 2 years ago

Well that one errored in a different way,

12:24:13 make[1]: *** [node] Bus error: 10

which we've seen on the nearform macos 11 machine. So far we've only seen the Bad address error on the macstadium one.

Trying again: https://ci-release.nodejs.org/job/iojs+release/8711/nodes=osx11-release-pkg/console

richardlau commented 2 years ago

Four builds in a row failed with the bus error. Taken release-macstadium-macos11.0-arm64-1 offline again.

richardlau commented 2 years ago

FWIW AFAIK we are not seeing these problems on the test CI. The main differences on the release CI compared to the test CI are:

richardlau commented 2 years ago

We're still seeing the bus error quite often on the NearForm hosted machine. (The macstadium one remains offline because we couldn't get a clean build on it whereas the NearForm one sometimes succeeds.)

I don't really know much about macs (I don't use them). @AshCripps Any ideas?

AshCripps commented 2 years ago

The builds also have this in the logs:

make: INTERNAL: Exiting with 1 jobserver tokens available; should be 3!

After googling, this points to a bug in make that should have been fixed. Can we easily update make? Is it as simple as running ansible? (It's been a while.) Actually, having said that, I believe make is provided by Xcode, so that might not be an easy fix.

EDIT: the bug in question is around parallelisation.
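
Three distinct failure signatures have come up in this thread so far (Bad address, Bus error: 10, and the jobserver token message), and telling them apart per build would help compare machines. What follows is only a minimal sketch, assuming the console log has already been saved as text. The patterns are copied from the log lines quoted above; the names SIGNATURES and classify_log are made up for illustration and are not part of any existing tooling.

```python
import re

# Patterns taken from the log excerpts quoted in this thread.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "bad_address": re.compile(r"/bin/sh: Bad address"),
    "bus_error": re.compile(r"Bus error: 10"),
    "jobserver_tokens": re.compile(
        r"INTERNAL: Exiting with \d+ jobserver tokens available; should be \d+!"
    ),
}


def classify_log(text):
    """Count how many log lines match each known failure signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Running classify_log over the console text of each failed build would give a per-build tally, which could then be grouped by the machine that ran the build.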

richardlau commented 2 years ago

I've rebooted test-orka-macos11-x64-2 and this has recovered a small amount of space.

Before

administrator@test-orka-macos11-x64-2 ~ % df -h
Filesystem       Size   Used  Avail Capacity iused     ifree %iused  Mounted on
/dev/disk2s5s1   90Gi   14Gi  224Mi    99%  553788 941116212    0%   /
devfs           189Ki  189Ki    0Bi   100%     653         0  100%   /dev
/dev/disk2s4     90Gi  1.0Gi  224Mi    83%       2 941669998    0%   /System/Volumes/VM
/dev/disk2s2     90Gi  305Mi  224Mi    58%    1038 941668962    0%   /System/Volumes/Preboot
/dev/disk2s6     90Gi  496Ki  224Mi     1%      18 941669982    0%   /System/Volumes/Update
/dev/disk2s1     90Gi   73Gi  224Mi   100%  427252 941242748    0%   /System/Volumes/Data
map auto_home     0Bi    0Bi    0Bi   100%       0         0  100%   /System/Volumes/Data/home
administrator@test-orka-macos11-x64-2 ~ %

After

administrator@test-orka-macos11-x64-2 ~ % df -h
Filesystem       Size   Used  Avail Capacity iused     ifree %iused  Mounted on
/dev/disk2s5s1   90Gi   14Gi  1.8Gi    89%  553788 941116212    0%   /
devfs           188Ki  188Ki    0Bi   100%     650         0  100%   /dev
/dev/disk2s4     90Gi  1.0Mi  1.8Gi     1%       1 941669999    0%   /System/Volumes/VM
/dev/disk2s2     90Gi  305Mi  1.8Gi    15%    1038 941668962    0%   /System/Volumes/Preboot
/dev/disk2s6     90Gi  504Ki  1.8Gi     1%      17 941669983    0%   /System/Volumes/Update
/dev/disk2s1     90Gi   73Gi  1.8Gi    98%  426524 941243476    0%   /System/Volumes/Data
map auto_home     0Bi    0Bi    0Bi   100%       0         0  100%   /System/Volumes/Data/home
administrator@test-orka-macos11-x64-2 ~ %
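
The before/after comparison above could be checked automatically before a build starts. This is a rough sketch, assuming df -h output in the macOS column layout shown above; the function names parse_df and nearly_full and the 95% default threshold are assumptions for illustration. It locates the two percentage columns rather than splitting at fixed positions, because a filesystem name such as "map auto_home" contains a space.

```python
def parse_df(output):
    """Parse macOS-style `df -h` output into a list of dicts."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Data rows contain two numeric percentages: Capacity and %iused.
        pct = [i for i, t in enumerate(tokens)
               if t.endswith("%") and t[:-1].isdigit()]
        if len(pct) < 2 or pct[0] < 4:
            continue  # header, shell prompt, blank, or malformed line
        cap_idx, iused_idx = pct[0], pct[1]
        rows.append({
            # Everything before Size/Used/Avail is the filesystem name.
            "filesystem": " ".join(tokens[:cap_idx - 3]),
            "avail": tokens[cap_idx - 1],
            "capacity": int(tokens[cap_idx][:-1]),
            # Everything after %iused is the mount point.
            "mount": " ".join(tokens[iused_idx + 1:]),
        })
    return rows


def nearly_full(output, threshold=95):
    """Return mount points whose Capacity is at or above the threshold."""
    return [r["mount"] for r in parse_df(output) if r["capacity"] >= threshold]
```

On the "Before" listing this would flag /, /dev, /System/Volumes/Data and the auto_home map. The devfs and auto_home entries always report 100%, so a real check would likely want to skip them.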

Prior to rebooting, lsof showed some open files related to node even though no build was in progress:

administrator@test-orka-macos11-x64-2 ~ % sudo /usr/sbin/lsof | grep node
opendirec   110            root  txt       REG                1,7       32768              462548 /private/var/db/dslocal/nodes/Default/sqlindex-shm
opendirec   110            root    3u      REG                1,7        4096              462544 /private/var/db/dslocal/nodes/Default/sqlindex
opendirec   110            root    4u      REG                1,7     1145392              462547 /private/var/db/dslocal/nodes/Default/sqlindex-wal
opendirec   110            root    5u      REG                1,7       32768              462548 /private/var/db/dslocal/nodes/Default/sqlindex-shm
node      12667            iojs  cwd       DIR                1,7          64            60221100 /Users/iojs/node-tmp/.tmp.2647
node      12667            iojs  txt       REG                1,7    86002784            60207676 /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node
node      12667            iojs  txt       REG                1,7       36716              539642 /Library/Preferences/Logging/.plist-cache.pTbLKkdZ
node      12667            iojs  txt       REG                1,7     2547856 1152921500312767024 /usr/lib/dyld
node      12667            iojs  txt       REG                1,7    32214064 1152921500312778383 /usr/share/icu/icudt66l.dat
node      12667            iojs    0u     unix 0x2d58a1812c2645ad         0t0                     ->(none)
node      12667            iojs    1u     unix 0x2d58a1812c2648cd         0t0                     ->(none)
node      12667            iojs    2u     unix 0x2d58a1812c264a5d         0t0                     ->(none)
node      12667            iojs    3u   KQUEUE                                                    count=0, state=0xa
node      12667            iojs    4      PIPE 0xfdc1cb00a1bdb1e5       16384                     ->0xe408e968383444e0
node      12667            iojs    5      PIPE 0xe408e968383444e0       16384                     ->0xfdc1cb00a1bdb1e5
node      12667            iojs    6      PIPE 0x6951a4397517a8b0       16384                     ->0xe91cae28f1470497
node      12667            iojs    7      PIPE 0xe91cae28f1470497       16384                     ->0x6951a4397517a8b0
node      12667            iojs    8      PIPE 0x4ce1eb0c38a6898a       16384                     ->0x9cc0d6f06f59f5fc
node      12667            iojs    9      PIPE 0x9cc0d6f06f59f5fc       16384                     ->0x4ce1eb0c38a6898a
node      12667            iojs   10u   KQUEUE                                                    count=0, state=0xa
node      12667            iojs   11      PIPE 0x1533e3436f30338f       16384                     ->0x8c4000d5792a2ea0
node      12667            iojs   12      PIPE 0x8c4000d5792a2ea0       16384                     ->0x1533e3436f30338f
node      12667            iojs   13      PIPE 0x88bc6467008851b3       16384                     ->0xd788ade8faad15dd
node      12667            iojs   14      PIPE 0xd788ade8faad15dd       16384                     ->0x88bc6467008851b3
node      12667            iojs   15u   KQUEUE                                                    count=0, state=0xa
node      12667            iojs   16      PIPE 0xabe33fac0c7b60fa       16384                     ->0xff4a3780d944289e
node      12667            iojs   17      PIPE 0xff4a3780d944289e       16384                     ->0xabe33fac0c7b60fa
node      12667            iojs   18      PIPE 0xb7c54ce667e9fd36       16384                     ->0x4e38bd1005aadd8
node      12667            iojs   19      PIPE  0x4e38bd1005aadd8       16384                     ->0xb7c54ce667e9fd36
node      67604            iojs  cwd       DIR                1,7          64            60176964 /Users/iojs/node-tmp/.tmp.2647
node      67604            iojs  txt       REG                1,7    86002784            60163530 /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node
node      67604            iojs  txt       REG                1,7       36716              539642 /Library/Preferences/Logging/.plist-cache.pTbLKkdZ
node      67604            iojs  txt       REG                1,7    32214064 1152921500312778383 /usr/share/icu/icudt66l.dat
node      67604            iojs  txt       REG                1,7     2547856 1152921500312767024 /usr/lib/dyld
node      67604            iojs    0u     unix 0x2d58a1812c262415         0t0                     ->(none)
node      67604            iojs    1u     unix 0x2d58a1812c262735         0t0                     ->(none)
node      67604            iojs    2u     unix 0x2d58a1812c2628c5         0t0                     ->(none)
node      67604            iojs    3u   KQUEUE                                                    count=0, state=0xa
node      67604            iojs    4      PIPE 0xdb8970e1d0801fa1       16384                     ->0x3f56b88f93efed8b
node      67604            iojs    5      PIPE 0x3f56b88f93efed8b       16384                     ->0xdb8970e1d0801fa1
node      67604            iojs    6      PIPE 0x513f5aa52f4d2787       16384                     ->0xe8f3e090fd16d6
node      67604            iojs    7      PIPE   0xe8f3e090fd16d6       16384                     ->0x513f5aa52f4d2787
node      67604            iojs    8      PIPE 0xb21564c5346860aa       16384                     ->0x1e492b8bb882d476
node      67604            iojs    9      PIPE 0x1e492b8bb882d476       16384                     ->0xb21564c5346860aa
node      67604            iojs   10u   KQUEUE                                                    count=0, state=0xa
node      67604            iojs   11      PIPE 0x6794b0b6d284fc86       16384                     ->0x4a6eb136c24c663
node      67604            iojs   12      PIPE  0x4a6eb136c24c663       16384                     ->0x6794b0b6d284fc86
node      67604            iojs   13      PIPE 0x89ac3d2a53392ed2       16384                     ->0x271a7292c6eae65a
node      67604            iojs   14      PIPE 0x271a7292c6eae65a       16384                     ->0x89ac3d2a53392ed2
node      67604            iojs   15u   KQUEUE                                                    count=0, state=0xa
node      67604            iojs   16      PIPE 0xff9e2404b52c5ea4       16384                     ->0x975bed42fd0f11ec
node      67604            iojs   17      PIPE 0x975bed42fd0f11ec       16384                     ->0xff9e2404b52c5ea4
node      67604            iojs   18      PIPE 0x450677107be1eb53       16384                     ->0xae42403685cb73b4
node      67604            iojs   19      PIPE 0xae42403685cb73b4       16384                     ->0x450677107be1eb53
administrator@test-orka-macos11-x64-2 shared
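
Note that grep node in the listing above also matches the opendirec lines, because their paths contain "nodes". To pick out only the leftover node processes, the COMMAND column can be compared exactly. A minimal sketch, assuming the lsof output has been captured as text; the function name node_pids is hypothetical.

```python
def node_pids(lsof_output):
    """Return sorted PIDs of processes whose lsof COMMAND column is exactly 'node'."""
    pids = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Column 0 is COMMAND, column 1 is PID in default lsof output.
        if len(fields) >= 2 and fields[0] == "node" and fields[1].isdigit():
            pids.add(int(fields[1]))
    return sorted(pids)
```

Applied to the listing above, this would report the two PIDs 12667 and 67604 and ignore the opendirec entries. Whether such processes are safe to kill is a separate question that this sketch does not answer.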

github-actions[bot] commented 1 year ago

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

targos commented 1 year ago

I think this is resolved.