Closed by targos 1 year ago
3 rebuilds were necessary to make it green:
/cc @nodejs/build-infra
I have no idea where to even begin with this.
@AshCripps ?
I wonder if we are running out of memory. Does the failure occur more often on any particular machine/type of machine?
hmm, I guess it's just the one release machine.
Seems like we actually have two. From history it appears it fails on the macstadium machines and is fine on the nearform one.
I see from the inventory (https://github.com/nodejs/build/blob/main/ansible/inventory.yml) that we have server_jobs set to 6 on the macstadium machines. Maybe that is too many.
- macstadium:
    macos11.0-arm64-1:
      ansible_python_interpreter: /usr/bin/python3
      ip: 207.254.38.74
      user: administrator
      remote_env:
        PATH: /opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/Apple/usr/bin
      server_jobs: 6
We've got two things to try I guess:
make / sh commands to try and understand exactly what's failing (I have been assuming that it's a bad memory address that it's talking about, but it's just about possible it's a network address of some sort too).
FWIW I've taken the macstadium macos11 release machine offline from the release CI for now to unblock the Node.js 18.8.0 release.
Reducing the number of jobs seems like a reasonable thing to try -- will do that tomorrow along with all the other CI maintenance.
I dropped JOBS down to 3 and the problem still occurs: https://ci-release.nodejs.org/job/iojs+release/8709/nodes=osx11-release-pkg/
Dropped JOBS down to 1: https://ci-release.nodejs.org/job/iojs+release/8710/nodes=osx11-release-pkg/
Well that one errored in a different way,
12:24:13 make[1]: *** [node] Bus error: 10
which we've seen on the nearform macos 11 machine. So far we've only seen the "Bad address" error on the macstadium one.
Trying again: https://ci-release.nodejs.org/job/iojs+release/8711/nodes=osx11-release-pkg/console
Four builds in a row failed with the bus error. Taken release-macstadium-macos11.0-arm64-1 offline again.
FWIW AFAIK we are not seeing these problems on the test CI. The main differences on the release CI compared to the test CI are:
We're still seeing the bus error quite often on the NearForm hosted machine. (The macstadium one remains offline because we couldn't get a clean build on it whereas the NearForm one sometimes succeeds.)
I don't really know much about macs (I don't use them). @AshCripps Any ideas?
The builds also have this in the logs
make: INTERNAL: Exiting with 1 jobserver tokens available; should be 3!
which, after googling, points to a bug in make that should've been fixed. Can we easily update make? Is it as simple as running ansible? (It's been a while.) Actually, saying that, I believe make is provided by Xcode, so that might not be an easy fix.
EDIT: the bug in question is around parallelisation.
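Since three different failure messages have come up so far (the bus error, the "Bad address" error, and the jobserver warning), it might help to tally which ones appear in each console log. A minimal sketch, assuming the log text has already been downloaded; the labels and the function name are made up for illustration and are not part of any CI tooling:

```python
import re

# Failure signatures quoted in this thread. The labels are hypothetical names
# chosen for this sketch only.
SIGNATURES = {
    "bus_error": re.compile(r"Bus error: 10"),
    "bad_address": re.compile(r"Bad address"),
    "jobserver": re.compile(r"jobserver tokens available"),
}


def classify_log(text):
    """Return the sorted labels of all signatures found in a console log."""
    return sorted(
        label for label, pattern in SIGNATURES.items() if pattern.search(text)
    )


# Two lines copied from the build logs quoted above.
sample = (
    "12:24:13 make[1]: *** [node] Bus error: 10\n"
    "make: INTERNAL: Exiting with 1 jobserver tokens available; should be 3!\n"
)
print(classify_log(sample))
```

Running this over the logs from each machine would show whether "Bad address" really only occurs on the macstadium host.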
I've rebooted test-orka-macos11-x64-2 and this has recovered a small amount of space.
Before
administrator@test-orka-macos11-x64-2 ~ % df -h
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk2s5s1 90Gi 14Gi 224Mi 99% 553788 941116212 0% /
devfs 189Ki 189Ki 0Bi 100% 653 0 100% /dev
/dev/disk2s4 90Gi 1.0Gi 224Mi 83% 2 941669998 0% /System/Volumes/VM
/dev/disk2s2 90Gi 305Mi 224Mi 58% 1038 941668962 0% /System/Volumes/Preboot
/dev/disk2s6 90Gi 496Ki 224Mi 1% 18 941669982 0% /System/Volumes/Update
/dev/disk2s1 90Gi 73Gi 224Mi 100% 427252 941242748 0% /System/Volumes/Data
map auto_home 0Bi 0Bi 0Bi 100% 0 0 100% /System/Volumes/Data/home
administrator@test-orka-macos11-x64-2 ~ %
After
administrator@test-orka-macos11-x64-2 ~ % df -h
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk2s5s1 90Gi 14Gi 1.8Gi 89% 553788 941116212 0% /
devfs 188Ki 188Ki 0Bi 100% 650 0 100% /dev
/dev/disk2s4 90Gi 1.0Mi 1.8Gi 1% 1 941669999 0% /System/Volumes/VM
/dev/disk2s2 90Gi 305Mi 1.8Gi 15% 1038 941668962 0% /System/Volumes/Preboot
/dev/disk2s6 90Gi 504Ki 1.8Gi 1% 17 941669983 0% /System/Volumes/Update
/dev/disk2s1 90Gi 73Gi 1.8Gi 98% 426524 941243476 0% /System/Volumes/Data
map auto_home 0Bi 0Bi 0Bi 100% 0 0 100% /System/Volumes/Data/home
administrator@test-orka-macos11-x64-2 ~ %
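Given how little space was available before the reboot, a pre-build check could flag a nearly full disk before a job starts. A rough sketch using only the Python standard library; the 5 GiB threshold and the function name are assumptions for illustration, not values taken from the CI configuration:

```python
import shutil


def has_free_space(path="/", min_free_gib=5.0):
    """Return (ok, free_gib) for the filesystem that contains `path`.

    `min_free_gib` is a hypothetical threshold; pick one that matches the
    size of a real build workspace.
    """
    usage = shutil.disk_usage(path)
    free_gib = usage.free / 2**30
    return free_gib >= min_free_gib, free_gib


ok, free_gib = has_free_space("/", min_free_gib=5.0)
print(f"free: {free_gib:.1f} GiB, enough for a build: {ok}")
```

With the numbers shown above (224Mi before, 1.8Gi after), a check like this would fail in both cases, which suggests the reboot alone does not free enough space.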
Prior to rebooting, lsof showed some open files related to node even though no build was in progress:
administrator@test-orka-macos11-x64-2 ~ % sudo /usr/sbin/lsof | grep node
opendirec 110 root txt REG 1,7 32768 462548 /private/var/db/dslocal/nodes/Default/sqlindex-shm
opendirec 110 root 3u REG 1,7 4096 462544 /private/var/db/dslocal/nodes/Default/sqlindex
opendirec 110 root 4u REG 1,7 1145392 462547 /private/var/db/dslocal/nodes/Default/sqlindex-wal
opendirec 110 root 5u REG 1,7 32768 462548 /private/var/db/dslocal/nodes/Default/sqlindex-shm
node 12667 iojs cwd DIR 1,7 64 60221100 /Users/iojs/node-tmp/.tmp.2647
node 12667 iojs txt REG 1,7 86002784 60207676 /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node
node 12667 iojs txt REG 1,7 36716 539642 /Library/Preferences/Logging/.plist-cache.pTbLKkdZ
node 12667 iojs txt REG 1,7 2547856 1152921500312767024 /usr/lib/dyld
node 12667 iojs txt REG 1,7 32214064 1152921500312778383 /usr/share/icu/icudt66l.dat
node 12667 iojs 0u unix 0x2d58a1812c2645ad 0t0 ->(none)
node 12667 iojs 1u unix 0x2d58a1812c2648cd 0t0 ->(none)
node 12667 iojs 2u unix 0x2d58a1812c264a5d 0t0 ->(none)
node 12667 iojs 3u KQUEUE count=0, state=0xa
node 12667 iojs 4 PIPE 0xfdc1cb00a1bdb1e5 16384 ->0xe408e968383444e0
node 12667 iojs 5 PIPE 0xe408e968383444e0 16384 ->0xfdc1cb00a1bdb1e5
node 12667 iojs 6 PIPE 0x6951a4397517a8b0 16384 ->0xe91cae28f1470497
node 12667 iojs 7 PIPE 0xe91cae28f1470497 16384 ->0x6951a4397517a8b0
node 12667 iojs 8 PIPE 0x4ce1eb0c38a6898a 16384 ->0x9cc0d6f06f59f5fc
node 12667 iojs 9 PIPE 0x9cc0d6f06f59f5fc 16384 ->0x4ce1eb0c38a6898a
node 12667 iojs 10u KQUEUE count=0, state=0xa
node 12667 iojs 11 PIPE 0x1533e3436f30338f 16384 ->0x8c4000d5792a2ea0
node 12667 iojs 12 PIPE 0x8c4000d5792a2ea0 16384 ->0x1533e3436f30338f
node 12667 iojs 13 PIPE 0x88bc6467008851b3 16384 ->0xd788ade8faad15dd
node 12667 iojs 14 PIPE 0xd788ade8faad15dd 16384 ->0x88bc6467008851b3
node 12667 iojs 15u KQUEUE count=0, state=0xa
node 12667 iojs 16 PIPE 0xabe33fac0c7b60fa 16384 ->0xff4a3780d944289e
node 12667 iojs 17 PIPE 0xff4a3780d944289e 16384 ->0xabe33fac0c7b60fa
node 12667 iojs 18 PIPE 0xb7c54ce667e9fd36 16384 ->0x4e38bd1005aadd8
node 12667 iojs 19 PIPE 0x4e38bd1005aadd8 16384 ->0xb7c54ce667e9fd36
node 67604 iojs cwd DIR 1,7 64 60176964 /Users/iojs/node-tmp/.tmp.2647
node 67604 iojs txt REG 1,7 86002784 60163530 /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx11-x64/out/Release/node
node 67604 iojs txt REG 1,7 36716 539642 /Library/Preferences/Logging/.plist-cache.pTbLKkdZ
node 67604 iojs txt REG 1,7 32214064 1152921500312778383 /usr/share/icu/icudt66l.dat
node 67604 iojs txt REG 1,7 2547856 1152921500312767024 /usr/lib/dyld
node 67604 iojs 0u unix 0x2d58a1812c262415 0t0 ->(none)
node 67604 iojs 1u unix 0x2d58a1812c262735 0t0 ->(none)
node 67604 iojs 2u unix 0x2d58a1812c2628c5 0t0 ->(none)
node 67604 iojs 3u KQUEUE count=0, state=0xa
node 67604 iojs 4 PIPE 0xdb8970e1d0801fa1 16384 ->0x3f56b88f93efed8b
node 67604 iojs 5 PIPE 0x3f56b88f93efed8b 16384 ->0xdb8970e1d0801fa1
node 67604 iojs 6 PIPE 0x513f5aa52f4d2787 16384 ->0xe8f3e090fd16d6
node 67604 iojs 7 PIPE 0xe8f3e090fd16d6 16384 ->0x513f5aa52f4d2787
node 67604 iojs 8 PIPE 0xb21564c5346860aa 16384 ->0x1e492b8bb882d476
node 67604 iojs 9 PIPE 0x1e492b8bb882d476 16384 ->0xb21564c5346860aa
node 67604 iojs 10u KQUEUE count=0, state=0xa
node 67604 iojs 11 PIPE 0x6794b0b6d284fc86 16384 ->0x4a6eb136c24c663
node 67604 iojs 12 PIPE 0x4a6eb136c24c663 16384 ->0x6794b0b6d284fc86
node 67604 iojs 13 PIPE 0x89ac3d2a53392ed2 16384 ->0x271a7292c6eae65a
node 67604 iojs 14 PIPE 0x271a7292c6eae65a 16384 ->0x89ac3d2a53392ed2
node 67604 iojs 15u KQUEUE count=0, state=0xa
node 67604 iojs 16 PIPE 0xff9e2404b52c5ea4 16384 ->0x975bed42fd0f11ec
node 67604 iojs 17 PIPE 0x975bed42fd0f11ec 16384 ->0xff9e2404b52c5ea4
node 67604 iojs 18 PIPE 0x450677107be1eb53 16384 ->0xae42403685cb73b4
node 67604 iojs 19 PIPE 0xae42403685cb73b4 16384 ->0x450677107be1eb53
administrator@test-orka-macos11-x64-2 shared
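Note that `grep node` also matches the opendirec lines, because their paths contain "nodes". To list only leftover node processes, one option is to filter on the command column instead. A small sketch, assuming the lsof output has been captured as text; the function name is hypothetical:

```python
def node_pids(lsof_output):
    """Return the sorted PIDs of processes whose lsof COMMAND column is 'node'."""
    pids = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        # The first column is the command name and the second is the PID.
        if len(fields) >= 2 and fields[0] == "node" and fields[1].isdigit():
            pids.add(int(fields[1]))
    return sorted(pids)


# A few lines copied from the lsof output above.
sample = """\
opendirec 110 root 3u REG 1,7 4096 462544 /private/var/db/dslocal/nodes/Default/sqlindex
node 12667 iojs cwd DIR 1,7 64 60221100 /Users/iojs/node-tmp/.tmp.2647
node 12667 iojs 3u KQUEUE count=0, state=0xa
node 67604 iojs cwd DIR 1,7 64 60176964 /Users/iojs/node-tmp/.tmp.2647
"""
print(node_pids(sample))
```

For the output above this yields the two stale PIDs, 12667 and 67604, both of which were running the binary from the node-test-commit-osx workspace.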
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
I think this is resolved.
Example: https://ci-release.nodejs.org/job/iojs+release/8645/nodes=osx11-release-tar/console