osresearch closed 2 years ago
I'm currently running into this trying to build one of the x230 pull requests. -j isn't propagated to the subdirectories, and setting MAKE_JOBS breaks bootstrapping of the toolchain. Without anything set everything runs as -j1, though, which takes ages.
At least some of that seems to have been introduced in the past 6 months. I remember having some initial trouble getting parallel builds to work back in summer, but once I had that working a full build took about 5 minutes (including bootstrapping the toolchain). I've now been sitting on this for about 1 hour and still haven't got past the toolchain bootstrapping.
@bwachter The documentation has been updated here (rendered here) to state that CPUS=YY should be specified on the make BOARD=XXX Heads board build statement.
I'm still a bit confused here, since the main Makefile of Heads takes the output of nproc when CPUS is not specified and populates the CPUS variable, which is then passed along if it was not defined on the initial make call.
That hack was implemented because some modules don't like having -j forced, while others play along. The idea was to pass CPUS down to the other makefiles (modules/*) where they play fair, and to go single-threaded for modules that don't play well.
@osresearch?
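For illustration only, the per-module behaviour described above could be sketched in shell roughly as follows. `jobs_for_module` and `SERIAL_MODULES` are hypothetical names, not taken from the Heads Makefile, and newt is listed only because it is reported further down in this thread as racing under parallel builds.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual Heads Makefile logic.

# Default CPUS to the number of available cores unless the caller set it.
# Fall back to 1 if nproc is not installed.
CPUS="${CPUS:-$(nproc 2>/dev/null || echo 1)}"

# Assumption: space-separated list of modules that must be built serially.
SERIAL_MODULES="newt"

# Print the job count that a given module should be built with.
jobs_for_module() {
    module="$1"
    for m in $SERIAL_MODULES; do
        if [ "$m" = "$module" ]; then
            echo 1
            return 0
        fi
    done
    echo "$CPUS"
}
```

A module build could then be invoked as `make -C "$dir" -j"$(jobs_for_module newt)"`, where `$dir` is a placeholder for that module's build directory. Note that this only controls parallelism inside each module, not across modules.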
The issue with CPUS=xxx on the make command line is that it does not spawn multiple top-level jobs. So while the individual components might be built in parallel, there is no parallelism across the modules. On a rebuild after a make real.clean (so the cross compilers are intact) only one module is built at a time. On my build machine this makes the difference between a 90 second rebuild and a roughly 30 minute one.
make real.clean && time make V=1 CPUS=128
...
real 30m2.657s
user 25m18.422s
sys 7m0.333s
versus
make real.clean && time make V=1 -j128
...
real 1m27.378s
user 30m26.307s
sys 6m12.245s
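To put a number on the two runs above, the wall-clock ratio can be computed with a small shell helper. This is just a sketch; `to_seconds` and `speedup` are hypothetical helper names, and the input is assumed to be in the `XmY.ZZZs` format printed by `time`.

```shell
#!/bin/sh
# Convert a `time`-style duration such as 30m2.657s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times faster the second duration is than the first.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# CPUS=128 run versus -j128 run, using the "real" times quoted above.
speedup 30m2.657s 1m27.378s   # prints 20.6
```

The user and sys totals are similar in both runs, so the same amount of work is being done; the difference is almost entirely in how much of it overlaps.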
@osresearch The problem with -j24 is that there is no locking between interdependent tasks.
Here, I launched the same build on CircleCI with CPUS=24 (functional) and with -j24 (failing badly with missing files).
The line rm -rf build/x230-hotp-maximized/* build/log/* && make -j24 V=1 BOARD=x230-hotp-maximized || touch /tmp/failed_build in CircleCI lets all of the logs that were created be output in the next CI task; they would otherwise be unreadable, since -j24 interleaves the output.
So the next task picks up the created logs and outputs them on CI with delimiters: if [[ -f /tmp/failed_build ]]; then find ./build/ -name "*.log" -type f -mmin -1 | while read log; do echo ""; echo '==>' "$log" '<=='; echo ""; cat "$log"; done; exit 1; else echo "Not failing. Continuing..."; fi
The resulting logs are concatenated here with ==> and <== separators.
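For readability, the log-dumping one-liner above could be written as a small function. This is a sketch of the same idea rather than the exact CI configuration; `dump_recent_logs` is a hypothetical name, and it relies on `find -mmin`, which GNU and BSD find support but POSIX does not require.

```shell
#!/bin/sh
# Print every *.log file under a directory that was modified within the
# last N minutes (default 1), each preceded by a "==> path <==" header.
dump_recent_logs() {
    dir="$1"
    minutes="${2:-1}"
    find "$dir" -name '*.log' -type f -mmin "-$minutes" |
        while IFS= read -r log; do
            echo ""
            echo "==> $log <=="
            echo ""
            cat "$log"
        done
}
```

In the CI step this would be wrapped in the same check as before: if /tmp/failed_build exists, call `dump_recent_logs ./build 1` and then exit 1.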
@osresearch: I'm redoing a build without decompressing your cache file, which failed on CircleCI for your latest commit because host binaries were not found in the config.status that was fed to the build (sed, gawk and other host binaries were not found at the provided paths).
A CI build is happening here
@osresearch It seems the missing piece was commenting out MAKE_JOBS in the global Makefile in PR #984 (and deleting the configure cache, which doesn't work under CircleCI).
Then a weird race condition happens only in the newt module build, which can be worked around by forcing that module to build with a single job. I will open a replacement PR once the workarounds for the new CircleCI limitations pass.
@osresearch @bwachter: I managed to partly fix this issue in this commit: https://github.com/osresearch/heads/pull/1015/commits/5e4309c1e19b58cb6c2ca9cddd11893c83afaa71
There, MAKE_JOBS was commented out in the main Makefile. Improvements welcome.
Tag me to reopen. CircleCI now builds in parallel with 36 cores in under 45 minutes on a clean checkout.
There are some components that do not correctly depend on the cross compiler, so a make for the clean checkout fails. Easy fixes are the various tools in $(COREBOOT_UTIL_DIR) like cbmem, superiotool and inteltool, which need to wait until the cross compiler is available.