Open stefanb2 opened 8 years ago
I have tested this implementation over the last few weeks in two different recursive GNU make based build systems that originally had M+1 GNU make instances:
FYI: google/kati was used to convert existing single makefile GNU make parts to Ninja build file.
Thanks for the patch!
We've discussed this on the mailing list a few times (e.g. here https://groups.google.com/forum/#!searchin/ninja-build/jobserver/ninja-build/PUlsr7-jpI0/Ga19TOg1c14J). Ninja works best if it knows about the whole build. Now that kati exists, one can convert those to ninja files and munge them up to have a single build manifest (that's Android's transition strategy from Make to Ninja -- they use kati to get everything converted to Ninja files, and then they're incrementally converting directories to use something-not-make -- and then kati produces parts of their Ninja files and the new thing produces parts of the ninja files.)
Is your use case that you have recursive makefiles?
I could have guessed that this has been discussed before, because I'm surely not the first person facing such a situation.
Here are my reasons for requesting this:
IMHO my patch provides a good solution, considering
wow +1
Another possible reason for having jobserver support in ninja seems to be LTO support in gcc. `-flto=jobserver` tells gcc to use GNU make's jobserver mode to determine the number of parallel jobs. The alternative is to spawn a fixed number of jobs with e.g. `-flto=16`.
I would like to have this feature merged; I simply cannot convert all projects to ninja-build because I'm not allowed to do that.
@stefanb2 Thanks a lot for your work
Can I just add my voice to the list of people who would like this to be merged? At my company we also use a nested build system, and with this patch it makes ninja behave very nicely indeed. We're not in the position to make ninja build everything yet.
Please note that from a quick glance at the commit on @stefanb2's branch, I expect it doesn't work on Windows, where Make uses a different setup.
@glandium correct, in the Windows build a no-op token pool implementation is included. But I fail to see why this would be a relevant reason for rejecting this pull request.
That said, I'm pretty sure that it would be possible to provide an update that implements the token protocol used by Windows GNU make 4.x. Probably `tokenpool-gnu-make.cc` could be refactored into system-agnostic and UNIX-dependent bits.
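To make the suggested split concrete, here is a rough sketch of what a system-agnostic token pool interface plus a fallback implementation might look like. This is illustrative only (written in Python for brevity; the actual PR's `tokenpool-gnu-make.cc` is C++, and these class and method names are made up, not taken from the patch):

```python
# Hypothetical sketch of splitting a token pool into a system-agnostic
# interface plus platform-specific implementations. Names are invented.
from abc import ABC, abstractmethod

class TokenPool(ABC):
    """System-agnostic interface the build scheduler talks to."""
    @abstractmethod
    def acquire(self):
        """Block until a job slot is available; return True on success."""
    @abstractmethod
    def release(self):
        """Return a previously acquired slot to the pool."""

class NoopTokenPool(TokenPool):
    """Fallback when no jobserver is present: never limits parallelism.
    (The Windows build mentioned above ships a no-op pool like this.)"""
    def acquire(self):
        return True
    def release(self):
        pass
```

A UNIX implementation would then read/write token bytes on the pipe GNU make advertises, while the interface above stays platform-neutral.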
This would be really useful too when invoking ninja as part of another build tool, such as cargo.
This would be very useful for super-project builds. In our large code base, due to different compiler/environment configurations, we cannot include all projects in one single ninja build, so we have 1 top-level build and N sub-projects built by ninja; this configuration triggers the Y*N problem.
+1 - this is highly interesting for parallel builds with catkin_tools
(https://catkin-tools.readthedocs.io/en/latest/). A catkin_tools workspace consists of separate CMake projects which are built in isolation. To control the CPU consumption of parallel make runs, catkin_tools contains a GNU Make jobserver implementation.
In this way, the make jobserver is starting to become a standard "protocol" for controlling resource consumption of parallel builds.
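For readers unfamiliar with that "protocol": it is tiny. The parent advertises a pipe through MAKEFLAGS (`--jobserver-auth=R,W` in newer GNU make, `--jobserver-fds=R,W` in older versions), and each client reads one byte per extra parallel job and writes it back when the job finishes. A minimal Python sketch of the client side (helper names are mine, not from any of the patches discussed here):

```python
# Sketch of the GNU make jobserver *client* protocol (POSIX pipe variant).
# Helper names are hypothetical; error handling is omitted for clarity.
import os
import re

def jobserver_fds(makeflags):
    """Extract the (read_fd, write_fd) pair advertised in MAKEFLAGS,
    e.g. '-j3 --jobserver-auth=3,4'. Returns None if no jobserver."""
    m = re.search(r'--jobserver-(?:auth|fds)=(\d+),(\d+)', makeflags)
    return (int(m.group(1)), int(m.group(2))) if m else None

def acquire_token(read_fd):
    """Block until a job slot is free: read one token byte from the pipe."""
    return os.read(read_fd, 1)

def release_token(write_fd, token):
    """Return the slot when the job finishes: write the same byte back."""
    os.write(write_fd, token)
```

Every client is also implicitly granted one free job, so tokens are only needed for the second and subsequent parallel jobs it wants to run.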
Note that in the catkin_tools scenario, it is not easy to merge the individual build.ninja files into a hierarchy of subninja files, because
@nico I would like to add my voice in favor of GNU make jobserver support in ninja.
Meta-buildsystems like OpenEmbedded (Yocto), OpenWRT, Buildroot and a lot of others, are tasked with generating systems by building a lot of various packages from various sources, all using various buildsystems. I'll mostly use Buildroot as an example, as I'm very familiar with it, but the following is in principle applicable to all the buildsystems as well.
Such build systems will typically have this sequence per package they build:
And they will repeat that sequence for every package that is needed to build the target system:
Once all packages have been built and installed in the staging location, a system image (e.g. a bootloader + Linux kernel + root filesystem) is generated from that staging location. That system image can then be directly flashed onto a device.
Now, that was the quick overview.
Since a system can be made of a lot of packages, we want to build as many packages in parallel as possible (respecting the dependency chain, of course). But then for each package, we also want to take advantage of parallel compilation, in case no other package is being built at the same time.
So, if we have an 8-core machine, we would want to run up to 8 jobs in parallel, which means we have to distribute those jobs across the various packages that need to be built at any point in time, so that we maximize the number of jobs but do not overshoot the 8-CPU limit.
For example, if 8 ninja-based packages are built in parallel and they do not share a jobserver, they will each run 8 jobs, for a total of 64 parallel jobs. On the other hand, limiting the ninja builds to a single job will be a waste of time when only a single package is being built at some point in time (e.g. because the other ones have already finished building, or because the dependency chain needs that one package before continuing).
And as has already been explained in previous posts in this thread, not every package is based on ninja, and not every package is even conceivably switchable to ninja. And even if every package were using ninja, we couldn't simply aggregate all the ninja definitions into one super-build, because everything would end up clashing with everything else... So we still need to be able to cooperate with the rest of the world, especially when that rest of the world has been established for decades now... ;-)
Thanks for reading so far! :-)
+1. We also face this issue of Y*N ninjas while using CMake ExternalProject functionality.
In the meantime, you can find binaries with GNU make jobserver client support here: https://github.com/dockbuild/ninja-jobserver
@nico What could make you reconsider your decision here? Ninja cannot decently be used as part of a bigger build system due to the absence of jobserver support.
I agree it would be good to make it easy to compose multiple projects. But I think that should be on the generator level, so that in the end you end up with a single build.ninja that builds all your stuff (see also https://github.com/ninja-build/ninja/issues/1133#issuecomment-325883154).
Like @xqms says above, currently there isn't a good way for generators to do this if multiple projects use different metabuild systems, due to target names clashing. (If they're all using one generator, this could arguably be done at the generator level.) So I think investigating that direction is more interesting long term.
However, I grant that there's lots of demand for this (thanks to all of you who chimed in), and I'm sympathetic especially to the "organizational barriers" bit above, even if this takes ninja in a direction I disagree with. So I think I'm open to merging #1140 in principle now. Has everyone who voiced support actually tested that #1140 does what you want (e.g. with the builds mentioned by @jcfr two comments up)? For people who said +1 upon reading the description, please do check that the implementation works the way you want and report back.
@nico The organization I work for, and whose build system I'm trying to improve, has been successfully using this patch for a year. We have a very large code base with 1M+ LOC divided up into 250+ projects. Each project is currently built with ninja, with a global GNU make instance orchestrating it. Combining every dev and CI instance, we probably do up to a hundred builds a day. The patch has been performing faultlessly, and without it ninja would not be able to fit into our build system.
In fact, the only bugs we encountered have been in GNU make! http://savannah.gnu.org/bugs/?51159
Only lightly-tested here so far, but seems to work well (macOS 10.12, large CMake project with ~30 deps).
> currently there isn't a good way for generators to do this if multiple projects use different metabuild systems, due to target names clashing
Could `subninja` support an optional namespace argument? (This would only be relevant to the all-ninja use case, of course, but still potentially useful.)
@nico Ping?
@glandium a Win32 implementation for GNUmakeTokenPool has been added to the PR.
FWIW, I added support for the make jobserver in my own incompatible build system, https://github.com/apenwarr/redo, and a) it was really easy and elegant, and b) it works great, and c) people don't have to change anything about the rest of their build infrastructure.
It's all upside, no downside. I'm sure it would be very nice to have One True Build System across an entire megaproject, and I applaud people trying to make that happen, but even if it does someday happen, this simple jobserver support will not make anything worse.
@nico For our use case in Rust land, we have Cargo invoke ninja in a build script when building some crate. There is literally no one who wants to make ninja the top-level build tool there, hence why we need jobserver support.
In our case we are building over 100 open source packages, including autoconf and automake. It seems unlikely they'll be converted to build with ninja.
I am facing the same problems that all the others have already stated. Without job server client support, Ninja is simply not an option for the large project my company is working on (heterogeneous - CMake, autotools, etc. - with many third-party subprojects). That's pretty sad. It would be so cool to benefit from Ninja's lightning speed in re-builds.
In our case, the missing job server client capability is all that needs to be added to Ninja. Wrapped by a dummy GNU make process that simply supplies the job server, Ninja could serve as the actual top-level build system, thus allowing for much faster rebuilds. Of course it would be even nicer if Ninja itself were able to act as the job server.
++ In the LDC D compiler, `ninja test` causes a ninja → ctest → ninja chain which hits this problem.
> Without job server client support, Ninja is simply not an option for the large project my company is working on (heterogeneous - CMake, autotools, etc. - with many third-party subprojects). That's pretty sad.
Consider using the binary at https://github.com/Kitware/ninja/releases/tag/v1.8.2.g81279.kitware.dyndep-1.jobserver-1
Or you could `pip install ninja`; it also installs the version with jobserver support.
@nico How can we make you change your mind about including this to Ninja?
Any progress? Some way of coordinating concurrent ninja builds would be really useful, e.g. using multiple different configurations generated by CMake. I have a project with 16 different configurations for testing possible combinations of build flags. Currently I use `xargs -P` to build in parallel; it works, but it's ugly and not cross-platform.
I have a similar use case with multiple CMake configurations that could use a jobserver.
In the meantime, you can download the release packages from https://github.com/kitware/ninja-build; they include Fortran and job server support.
You could even install the ninja Python wheels.
Hth Jc
My use case is a bit different, as I have a test driver that runs hundreds of tests in parallel using the jobserver; for that I would actually need a jobserver implemented in Ninja itself, but the client is a prerequisite, and the work needed to implement the server is trivial compared to the client.
@bonzini you could run ninja from a one-line makefile (I sort of have the same plans in order to build debug and release in parallel)
FYI: I do have a proposal for a jobserver implementation in Ninja but of course it doesn't make sense to submit a PR until the current one has been merged.
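For reference, the "trivial" server side mentioned above amounts to creating the token pipe, preloading it, and advertising it to children. A minimal Python sketch under those assumptions (function name and details are illustrative, not Ninja's or GNU make's actual code):

```python
# Sketch of the *server* side of a jobserver: what a top-level build tool
# would do before spawning sub-builds. Names here are invented.
import os

def start_jobserver(num_jobs):
    """Create the token pipe and preload it with num_jobs - 1 tokens.
    (Every child already owns one implicit job slot, hence the -1.)"""
    r, w = os.pipe()
    # The pipe fds must survive exec so children can use them.
    os.set_inheritable(r, True)
    os.set_inheritable(w, True)
    os.write(w, b'+' * (num_jobs - 1))
    # Children discover the pipe through MAKEFLAGS.
    makeflags = '-j%d --jobserver-auth=%d,%d' % (num_jobs, r, w)
    return r, w, makeflags
```

A top-level tool would set `MAKEFLAGS` to the returned string in the environment of each sub-build it spawns; jobserver-aware children then share the slot pool, while unaware ones simply ignore it.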
@avikivity yeah I have a Makefile that's way more than one line, since I'm only slowly converting from Make to meson/ninja—which is what brought me to this issue. But I'd like to get rid of it sooner or later, of course.
Looks like it will be later rather than sooner :(
@nico Ping.
@jhasse Ping, given you have the last commit on master.
@nox Is there anything you want me to comment on?
See #1140 for a possible implementation.
I guess I want an update on that PR, given there have been code changes since your last comment, which was in December 2018.
Since all the real arguments have been made and the PRs have been there for more than a year, I'd say this doesn't shed a good light on ninja-build as a progressive project that follows user demand and has a clear discussion philosophy about what's good or bad for the way ahead.
Please consider getting this fixed.
Note that the compiler can benefit from jobserver support: `gcc -flto` will run as many jobs in parallel as the jobserver allows. Without it, one must either overcommit the build host or underutilize its resources.
> You can also specify `-flto=jobserver` to use GNU make's job server mode to determine the number of parallel jobs. This is useful when the Makefile calling GCC is already executing in parallel. You must prepend a `+` to the command recipe in the parent Makefile for this to work. This option likely only works if `MAKE` is GNU make.
@dothebart The PR is still being worked on. Not sure what you want us to "fix".
@avikivity This sounds awesome! LTO is one of the worst memory killers and we should keep `-flto=jobserver` in mind and test whether it works with Ninja's potential jobserver support.
edit: Just noticed that this was brought up in https://github.com/ninja-build/ninja/issues/1139#issuecomment-238083334 already :)
And I see that I upvoted that comment long ago :)
This code, which has been merged in the ninja deployed by pip, is causing me issues. Basically, if I use `make -jN`, the sub-project using meson stops being multi-threaded (because it's using the jobserver codepath). This can be easily reproduced with the 2 following files:
`Makefile`:

```make
VENV = venv
ACTIVATE = $(VENV)/bin/activate

all: $(VENV) meson.build
	(. $(ACTIVATE) && meson setup builddir && meson compile -v -C builddir)

# Python virtual environment with meson and ninja
$(VENV):
	python -m venv $@
	(. $(ACTIVATE) && pip install meson ninja)

# Generate meson.build and sources files for testing purpose (please ignore the following lines)
NB_SOURCES = 20
SOURCES = $(addsuffix .c,$(addprefix src,$(shell seq -w $(NB_SOURCES))))

src%.c:
	sed s/func/func$(@:.c=)/ tpl.c > $@

meson.build: $(VENV) $(SOURCES)
	echo "project('ninja-makopts', 'c')" > $@
	echo "library('x')" >> $@
	(. $(ACTIVATE) && meson rewrite target x add $(SOURCES))

clean:
	$(RM) meson.build
	$(RM) src*.c
	$(RM) -r builddir
	$(RM) -r $(VENV)

.PHONY: all clean
```
`tpl.c` (random stuff, slow to compile to observe the behavior -- add some `i` in the last `#define` line to make it slower):

```c
#define a "xxxxxxxxxxx"
#define b a a a a a a a
#define c b b b b b b b
#define d c c c c c c c
#define e d d d d d d d
#define f e e e e e e e
#define g f f f f f f f
#define h g g g g g g g
#define i h h h h h h h
#define j i
void func(char *z){*z=i[0];}
```
Basically the `Makefile` creates a Python virtualenv and wraps a call to `meson`, which calls `ninja` (default backend).

- With `make`, the compilation is multiprocess
- With `make -jN`, the compilation is singleprocess

This is a simplified test case extracted from a real project, where we do want the `make -jN` because it typically downloads dependencies and that kind of stuff, and we want that in parallel.
One workaround for now is to do `MAKEFLAGS= meson compile ...`. It's still annoying though that `-jN` has the exact opposite effect to what one expects.
You might need to prefix commands with `+` in the Makefile for them to have access to the jobserver properly.
> You might need to prefix commands with `+` in the Makefile for them to have access to the jobserver properly.
Where exactly would you put the `+`? `+(. $(ACTIVATE) && meson setup builddir && meson compile -v -C builddir)` has no effect, and I can't write `+meson`.
It's impossible for your recipe to know whether the ninja is a fork with no jobserver support or not, and therefore whether it should participate in the jobserver (by invoking `+ninja ...`) or not.
But I'd guess that it is the missing + which causes the problem. No jobs are being handed to the ninja process, but since it supports the jobserver, it thinks that means there are no jobs to give.
tl;dr it's actually worse to have some builds of ninja with jobserver support than to have none of them support the jobserver. It needs to be all or nothing, so you can detect support by checking the version rather than guessing whether you got it from pip, and everyone can properly opt in to it.
... Or hmm, maybe it should use `ninja -j jobserver`.
You can detect jobserver support by parsing the output of `--version`:

```
$ ninja --version
1.10.0.git.kitware.jobserver-1
```
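That check can be scripted. A Python sketch under the assumption (shown in the output above) that jobserver-capable forks embed the word "jobserver" in their version string; the function name is made up:

```python
# Hypothetical helper: heuristically detect a jobserver-capable ninja by
# looking for "jobserver" in its --version output, as forks like
# Kitware's tag it (e.g. "1.10.0.git.kitware.jobserver-1").
import subprocess

def ninja_has_jobserver(version_string=None):
    """Return True if the ninja version string advertises jobserver support.
    If no string is given, run `ninja --version` and inspect its output."""
    if version_string is None:
        version_string = subprocess.check_output(
            ['ninja', '--version'], text=True)
    return 'jobserver' in version_string
```

A Makefile or wrapper script could use this to decide whether to prefix the ninja invocation with `+` or fall back to a fixed `-jN`.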
> This is a simplified test case extracted from a real project, where we do want the `make -jN` because it typically downloads dependencies and that kind of stuff, and we want that in parallel.
If so, you actually want the missing `+` to revert to single-process! Because otherwise with `make -j$(nproc)` you might end up with up to `$(nproc)`-squared compiler processes.
> It's impossible for your recipe to know whether the ninja is a fork with no server support or not, and therefore whether it should participate in the jobserver (by invoking `+ninja ...`) or not.
Always adding the `+` does not hurt, and in fact I'd suggest doing it, because `+` also overrides `make --output-sync` (that is, without `+` the whole compilation process might be buffered by make!).
The `+` doesn't go right before `ninja`. It goes at the beginning of the rule, where you would also put for example a `@`:

```make
all: $(VENV) meson.build
	+(. $(ACTIVATE) && meson setup builddir && meson compile -v -C builddir)
```
As long as ninja is the only build execution tool, the current `ninja -jN` implementation works fine. But when you try to convert parts of an existing recursive GNU make based SW build system to ninja, then you have the following situation:

Simply calling `ninja -jY` isn't enough, because then the ninja instances will try to run Y*N jobs, plus the X jobs from the GNU make instances, causing the build host to overload. Relying on `-lZ` to fix this issue is sub-optimal, because load average is sometimes too slow to reflect the actual situation on the build host.

It would be nice if GNU make jobserver client support could be added to Ninja. Then the N ninja instances would cooperate with the M GNU make instances, and on the build host only X jobs would be executed at one time.
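The cooperative behavior described here can be illustrated with a toy simulation: several "sub-builds" each try to run many jobs, but all draw tokens from one shared pipe, so total concurrency stays bounded by the pool size. This uses threads standing in for processes and is purely illustrative (no relation to Ninja's or Make's actual code):

```python
# Toy simulation: N sub-builds of JOBS_PER_BUILD jobs each share one token
# pipe with X slots. Peak concurrency is bounded by X, not N*JOBS_PER_BUILD.
import os
import threading
import time

X, N, JOBS_PER_BUILD = 4, 3, 5
r, w = os.pipe()
os.write(w, b'+' * X)           # preload X job slots (no implicit token here)

running = 0
peak = 0
lock = threading.Lock()

def job():
    global running, peak
    token = os.read(r, 1)       # block until a slot is free
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)            # pretend to compile something
    with lock:
        running -= 1
    os.write(w, token)          # hand the slot back

def sub_build():
    workers = [threading.Thread(target=job) for _ in range(JOBS_PER_BUILD)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

builds = [threading.Thread(target=sub_build) for _ in range(N)]
for b in builds:
    b.start()
for b in builds:
    b.join()

print(peak)                     # bounded by X, never N*JOBS_PER_BUILD
```

With 3 sub-builds of 5 jobs each, 15 jobs compete for 4 slots, so at most 4 ever run at once, which is exactly the X-jobs-total behavior the issue asks for.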