Open stefanb2 opened 8 years ago
I have tested this implementation over the last few weeks in two different recursive GNU make based build systems that originally had M+1 GNU make instances:
FYI: google/kati was used to convert existing single makefile GNU make parts to Ninja build file.
Thanks for the patch!
We've discussed this on the mailing list a few times (e.g. here https://groups.google.com/forum/#!searchin/ninja-build/jobserver/ninja-build/PUlsr7-jpI0/Ga19TOg1c14J). Ninja works best if it knows about the whole build. Now that kati exists, one can convert those to ninja files and munge them up to have a single build manifest (that's Android's transition strategy from Make to Ninja -- they use kati to get everything converted to Ninja files, and then they're incrementally converting directories to use something-not-make -- and then kati produces parts of their Ninja files and the new thing produces parts of the ninja files.)
Is your use case that you have recursive makefiles?
I could have guessed that this has been discussed before, because I'm surely not the first person facing such a situation.
Here are my reasons for requesting this:
IMHO my patch provides a good solution, considering
wow +1
Another possible reason for having jobserver support in ninja seems to be LTO support in gcc. `-flto=jobserver` tells gcc to use GNU make's jobserver mode to determine the number of parallel jobs. The alternative is to spawn a fixed number of jobs with e.g. `-flto=16`.
I would like to have this feature merged; I simply cannot convert all projects to ninja-build because I'm not allowed to do that.
@stefanb2 Thanks a lot for your work
Can I just add my voice to the list of people who would like this to be merged? At my company we also use a nested build system, and with this patch it makes ninja behave very nicely indeed. We're not in the position to make ninja build everything yet.
Please note that from a quick glance at the commit on @stefanb2's branch, I expect it doesn't work on Windows, where Make uses a different setup.
@glandium correct, in the Windows build a no-op token pool implementation is included. But I fail to see why this would be a relevant reason for rejecting this pull request.
That said, I'm pretty sure that it would be possible to provide an update that implements the token protocol used by Windows GNU make 4.x. Probably `tokenpool-gnu-make.cc` could be refactored into system-agnostic and UNIX-dependent bits.
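To make the suggested split concrete, here is a rough sketch of what a system-agnostic token pool interface plus a fallback implementation might look like. This is illustrative only (written in Python for brevity; the actual PR's `tokenpool-gnu-make.cc` is C++, and these class and method names are made up, not taken from the patch):

```python
# Hypothetical sketch of splitting a token pool into a system-agnostic
# interface plus platform-specific implementations. Names are invented.
from abc import ABC, abstractmethod

class TokenPool(ABC):
    """System-agnostic interface the build scheduler talks to."""
    @abstractmethod
    def acquire(self):
        """Block until a job slot is available; return True on success."""
    @abstractmethod
    def release(self):
        """Return a previously acquired slot to the pool."""

class NoopTokenPool(TokenPool):
    """Fallback when no jobserver is present: never limits parallelism.
    (The Windows build mentioned above ships a no-op pool like this.)"""
    def acquire(self):
        return True
    def release(self):
        pass
```

A UNIX implementation would then read/write token bytes on the pipe GNU make advertises, while the interface above stays platform-neutral.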
This would be really useful too when invoking ninja as part of another build tool, such as cargo.
This would be very useful for super-project builds. In our large code base, due to different compiler/environment configurations, we cannot include all projects in one single ninja build, so we have 1 top-level build and N sub-projects built by ninja; this configuration triggers the Y*N problem.
+1 - this is highly interesting for parallel builds with catkin_tools
(https://catkin-tools.readthedocs.io/en/latest/). A catkin_tools workspace consists of separate CMake projects which are built in isolation. To control the CPU consumption of parallel make runs, catkin_tools contains a GNU Make jobserver implementation.
In this way, the make jobserver is starting to become a standard "protocol" for controlling resource consumption of parallel builds.
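For readers unfamiliar with that "protocol": it is tiny. The parent advertises a pipe through MAKEFLAGS (`--jobserver-auth=R,W` in newer GNU make, `--jobserver-fds=R,W` in older versions), and each client reads one byte per extra parallel job and writes it back when the job finishes. A minimal Python sketch of the client side (helper names are mine, not from any of the patches discussed here):

```python
# Sketch of the GNU make jobserver *client* protocol (POSIX pipe variant).
# Helper names are hypothetical; error handling is omitted for clarity.
import os
import re

def jobserver_fds(makeflags):
    """Extract the (read_fd, write_fd) pair advertised in MAKEFLAGS,
    e.g. '-j3 --jobserver-auth=3,4'. Returns None if no jobserver."""
    m = re.search(r'--jobserver-(?:auth|fds)=(\d+),(\d+)', makeflags)
    return (int(m.group(1)), int(m.group(2))) if m else None

def acquire_token(read_fd):
    """Block until a job slot is free: read one token byte from the pipe."""
    return os.read(read_fd, 1)

def release_token(write_fd, token):
    """Return the slot when the job finishes: write the same byte back."""
    os.write(write_fd, token)
```

Every client is also implicitly granted one free job, so tokens are only needed for the second and subsequent parallel jobs it wants to run.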
Note that in the catkin_tools scenario, it is not easy to merge the individual build.ninja files into a hierarchy of subninja files, because
@nico I would like to add my voice in favor of GNU make jobserver support in ninja.
Meta-buildsystems like OpenEmbedded (Yocto), OpenWRT, Buildroot and a lot of others, are tasked with generating systems by building a lot of various packages from various sources, all using various buildsystems. I'll mostly use Buildroot as an example, as I'm very familiar with it, but the following is in principle applicable to all the buildsystems as well.
Such build systems will typically have this sequence per package they build:
And they will repeat that sequence for every package that is needed to build the target system:
Once all packages have been built and installed in the staging location, a system image (e.g. a bootloader + Linux kernel + root filesystem) is generated from that staging location. That system image can then be directly flashed onto a device.
Now, that was the quick overview.
Since a system can be made of a lot of packages, we want to build as many packages in parallel as possible (respecting the dependency chain, of course). But then for each package, we also want to take advantage of parallel compilation, in case no other package is being built at the same time.
So, if we have an 8-core machine, we would want to run up to 8 jobs in parallel, which means we have to distribute those jobs across the various packages that need to be built at any point in time, so that we maximize the number of jobs but do not overshoot the 8-CPU limit.
For example, if 8 ninja-based packages are built in parallel and they do not share a jobserver, they will each run 8 jobs, for a total of 64 parallel jobs. On the other hand, limiting the ninja builds to a single job will be a waste of time when only a single package is being built at some point in time (e.g. because the other ones have already finished building, or because the dependency chain needs that one package before continuing).
And as has already been explained in previous posts in this thread, not every package is based on ninja, and not every package is even conceivably switchable to ninja. And even if every package were using ninja, we couldn't simply aggregate all the ninja definitions into one super-build, because everything would end up clashing with everything else... So we still need to be able to cooperate with the rest of the world, especially when that rest of the world has been established for decades now... ;-)
Thanks for reading so far! :-)
+1. We also face this issue of Y*N ninjas while using CMake ExternalProject functionality.
In the meantime, you can find binaries with GNU make jobserver client support here: https://github.com/dockbuild/ninja-jobserver
@nico What could make you reconsider your decision here? Ninja cannot decently be used as part of a bigger build system due to the absence of jobserver support.
I agree it would be good to make it easy to compose multiple projects. But I think that should be on the generator level, so that in the end you end up with a single build.ninja that builds all your stuff (see also https://github.com/ninja-build/ninja/issues/1133#issuecomment-325883154).
Like @xqms says above, currently there isn't a good way for generators to do this if multiple projects use different metabuild systems, due to target names clashing. (If they're all using one generator, this could arguably be done at the generator level.) So I think investigating that direction is more interesting long term.
However, I grant that there's lots of demand for this (thanks to all of you who chimed in), and I'm sympathetic especially to the "organizational barriers" bit above, even if this takes ninja in a direction I disagree with. So I think I'm open to merging #1140 in principle now. Has everyone who voiced support actually tested that #1140 does what you want (e.g. with the builds mentioned by @jcfr two comments up)? For people who said +1 upon reading the description, please do check that the implementation works the way you want and report back.
@nico The organization I work for, and whose build system I'm trying to improve, has been successfully using this patch for a year. We have a very large code base with 1M+ LOC divided up into 250+ projects. Each project is currently built with ninja, with a global GNU make instance orchestrating it. Combining every dev and CI instance, we probably do up to a hundred builds a day. The patch has been performing faultlessly, and without it ninja would not be able to fit into our build system.
In fact, the only bugs we encountered have been in GNU make! http://savannah.gnu.org/bugs/?51159
Only lightly-tested here so far, but seems to work well (macOS 10.12, large CMake project with ~30 deps).
> currently there isn't a good way for generators to do this if multiple projects use different metabuild systems, due to target names clashing
Could `subninja` support an optional namespace argument? (This would only be relevant to the all-ninja use case, of course, but still potentially useful.)
@nico Ping?
@glandium a Win32 implementation for GNUmakeTokenPool has been added to the PR.
FWIW, I added support for the make jobserver in my own incompatible build system, https://github.com/apenwarr/redo, and a) it was really easy and elegant, and b) it works great, and c) people don't have to change anything about the rest of their build infrastructure.
It's all upside, no downside. I'm sure it would be very nice to have One True Build System across an entire megaproject, and I applaud people trying to make that happen, but even if it does someday happen, this simple jobserver support will not make anything worse.
@nico For our use case in Rust land, we have Cargo invoke ninja in a build script when building some crate. There is literally no one who wants to make ninja the top-level build tool there, hence why we need jobserver support.
In our case we are building over 100 open source packages, including autoconf and automake. It seems unlikely they'll be converted to build with ninja.
I am facing the same problems that all the others have already stated. Without job server client support, Ninja is simply not an option for the large project my company is working on (heterogeneous - CMake, autotools, etc. - with many third-party subprojects). That's pretty sad. It would be so cool to benefit from Ninja's lightning speed in re-builds.
In our case, the missing job server client capability is all that needs to be added to Ninja. Wrapped by a dummy GNU make process that simply supplies the job server, Ninja could serve as the actual top-level build system, thus allowing for much faster rebuilds. Of course it would be even nicer if Ninja itself were able to act as the job server.
++ In the LDC D compiler, `ninja test` causes a ninja → ctest → ninja chain which hits this problem.
> Without job server client support, Ninja is simply not an option for the large project my company is working on (heterogeneous - CMake, autotools, etc. - with many third-party subprojects). That's pretty sad.
Consider using the binary at https://github.com/Kitware/ninja/releases/tag/v1.8.2.g81279.kitware.dyndep-1.jobserver-1
Or you could `pip install ninja`; it also installs the version with jobserver support.
@nico How can we make you change your mind about including this to Ninja?
Any progress? Some way of coordinating concurrent ninja builds would be really useful, e.g. using multiple different configurations generated by CMake. I have a project with 16 different configurations for testing possible combinations of build flags. Currently I use `xargs -P` to build in parallel; it works, but it's ugly and not cross-platform.
I have a similar use case with multiple CMake configurations that could use a jobserver.
In the meantime, you can download the release packages from https://github.com/kitware/ninja-build; they include Fortran and job server support.
You could even install the ninja Python wheels.
Hth Jc
My use case is a bit different, as I have a test driver that runs hundreds of tests in parallel using the jobserver; for that I would actually need a jobserver implemented in Ninja itself, but the client is a prerequisite, and the work needed to implement the server is trivial compared to the client.
@bonzini you could run ninja from a one-line makefile (I sort of have the same plans in order to build debug and release in parallel)
FYI: I do have a proposal for a jobserver implementation in Ninja but of course it doesn't make sense to submit a PR until the current one has been merged.
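For reference, the "trivial" server side mentioned above amounts to creating the token pipe, preloading it, and advertising it to children. A minimal Python sketch under those assumptions (function name and details are illustrative, not Ninja's or GNU make's actual code):

```python
# Sketch of the *server* side of a jobserver: what a top-level build tool
# would do before spawning sub-builds. Names here are invented.
import os

def start_jobserver(num_jobs):
    """Create the token pipe and preload it with num_jobs - 1 tokens.
    (Every child already owns one implicit job slot, hence the -1.)"""
    r, w = os.pipe()
    # The pipe fds must survive exec so children can use them.
    os.set_inheritable(r, True)
    os.set_inheritable(w, True)
    os.write(w, b'+' * (num_jobs - 1))
    # Children discover the pipe through MAKEFLAGS.
    makeflags = '-j%d --jobserver-auth=%d,%d' % (num_jobs, r, w)
    return r, w, makeflags
```

A top-level tool would set `MAKEFLAGS` to the returned string in the environment of each sub-build it spawns; jobserver-aware children then share the slot pool, while unaware ones simply ignore it.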
@avikivity yeah I have a Makefile that's way more than one line, since I'm only slowly converting from Make to meson/ninja—which is what brought me to this issue. But I'd like to get rid of it sooner or later, of course.
Looks like it will be later rather than sooner :(
@nico Ping.
@jhasse Ping, given you have the last commit on master.
@nox Is there anything you want me to comment on?
See #1140 for a possible implementation.
I guess I want an update on that PR, given there have been code changes since your last comment, which was in December 2018.
Since all the real arguments have been made and the PRs have been there for more than a year, I'd say this doesn't shed a good light on ninja-build as a progressive project that follows user demand and has a clear discussion philosophy about what's good or bad for the way ahead.
Please consider getting this fixed.
Note that the compiler can benefit from jobserver support: `gcc -flto` will run as many jobs in parallel as the jobserver allows. Without it, one must either overcommit the build host or underutilize its resources.
> You can also specify `-flto=jobserver` to use GNU make's job server mode to determine the number of parallel jobs. This is useful when the Makefile calling GCC is already executing in parallel. You must prepend a `+` to the command recipe in the parent Makefile for this to work. This option likely only works if `MAKE` is GNU make.
@dothebart The PR is still being worked on. Not sure what you want us to "fix".
@avikivity This sounds awesome! LTO is one of the worst memory killers and we should keep `-flto=jobserver` in mind and test whether it works with Ninja's potential jobserver support.
edit: Just noticed that this was brought up in https://github.com/ninja-build/ninja/issues/1139#issuecomment-238083334 already :)
And I see that I upvoted that comment long ago :)
This code, which has been merged in the ninja deployed by pip, is causing me issues. Basically, if I use `make -jN`, the sub-project using meson stops being multi-threaded (because it's using the jobserver codepath). This can be easily reproduced with the 2 following files:
`Makefile`:

```make
VENV = venv
ACTIVATE = $(VENV)/bin/activate

all: $(VENV) meson.build
	(. $(ACTIVATE) && meson setup builddir && meson compile -v -C builddir)

# Python virtual environment with meson and ninja
$(VENV):
	python -m venv $@
	(. $(ACTIVATE) && pip install meson ninja)

# Generate meson.build and sources files for testing purpose (please ignore the following lines)
NB_SOURCES = 20
SOURCES = $(addsuffix .c,$(addprefix src,$(shell seq -w $(NB_SOURCES))))

src%.c:
	sed s/func/func$(@:.c=)/ tpl.c > $@

meson.build: $(VENV) $(SOURCES)
	echo "project('ninja-makopts', 'c')" > $@
	echo "library('x')" >> $@
	(. $(ACTIVATE) && meson rewrite target x add $(SOURCES))

clean:
	$(RM) meson.build
	$(RM) src*.c
	$(RM) -r builddir
	$(RM) -r $(VENV)

.PHONY: all clean
```
`tpl.c` (random stuff, slow to compile to observe the behavior -- add some `i` in the last `#define` line to make it slower):

```c
#define a "xxxxxxxxxxx"
#define b a a a a a a a
#define c b b b b b b b
#define d c c c c c c c
#define e d d d d d d d
#define f e e e e e e e
#define g f f f f f f f
#define h g g g g g g g
#define i h h h h h h h
#define j i
void func(char *z){*z=i[0];}
```
Basically the `Makefile` creates a Python virtualenv and wraps a call to `meson`, which calls `ninja` (default backend).

- With `make`, the compilation is multiprocess
- With `make -jN`, the compilation is singleprocess

This is a simplified test case extracted from a real project, where we do want the `make -jN` because it typically downloads dependencies and that kind of stuff, and we want that in parallel.
One workaround for now is to do `MAKEFLAGS= meson compile ...`. It's still annoying though that `-jN` has the exact opposite effect to what one expects.
You might need to prefix commands with `+` in the Makefile for them to have access to the jobserver properly.
> You might need to prefix commands with `+` in the Makefile for them to have access to the jobserver properly.
Where exactly would you put the `+`? `+(. $(ACTIVATE) && meson setup builddir && meson compile -v -C builddir)` has no effect, and I can't write `+meson`.
It's impossible for your recipe to know whether the ninja is a fork with no jobserver support or not, and therefore whether it should participate in the jobserver (by invoking `+ninja ...`) or not.
But I'd guess that it is the missing + which causes the problem. No jobs are being handed to the ninja process, but since it supports the jobserver, it thinks that means there are no jobs to give.
tl;dr it's actually worse to have some builds of ninja with jobserver support than to have none of them support the jobserver. It needs to be all or nothing, so you can detect support by checking the version rather than guessing whether you got it from pip, and everyone can properly opt in to it.
... Or hmm, maybe it should use `ninja -j jobserver`.
You can detect jobserver support by parsing the output of `--version`:

```
$ ninja --version
1.10.0.git.kitware.jobserver-1
```
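That check can be scripted. A Python sketch under the assumption (shown in the output above) that jobserver-capable forks embed the word "jobserver" in their version string; the function name is made up:

```python
# Hypothetical helper: heuristically detect a jobserver-capable ninja by
# looking for "jobserver" in its --version output, as forks like
# Kitware's tag it (e.g. "1.10.0.git.kitware.jobserver-1").
import subprocess

def ninja_has_jobserver(version_string=None):
    """Return True if the ninja version string advertises jobserver support.
    If no string is given, run `ninja --version` and inspect its output."""
    if version_string is None:
        version_string = subprocess.check_output(
            ['ninja', '--version'], text=True)
    return 'jobserver' in version_string
```

A Makefile or wrapper script could use this to decide whether to prefix the ninja invocation with `+` or fall back to a fixed `-jN`.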
> This is a simplified test case extracted from a real project, where we do want the `make -jN` because it typically downloads dependencies and that kind of stuff, and we want that in parallel.
If so, you actually want the missing `+` to revert to single-process! Because otherwise with `make -j$(nproc)` you might end up with up to `$(nproc)`-squared compiler processes.
> It's impossible for your recipe to know whether the ninja is a fork with no server support or not, and therefore whether it should participate in the jobserver (by invoking `+ninja ...`) or not.
Always adding the `+` does not hurt, and in fact I'd suggest doing it, because `+` also overrides `make --output-sync` (that is, without `+` the whole compilation process might be buffered by make!).
The `+` doesn't go right before `ninja`. It goes at the beginning of the rule, where you would also put for example a `@`:

```make
all: $(VENV) meson.build
	+(. $(ACTIVATE) && meson setup builddir && meson compile -v -C builddir)
```
As long as ninja is the only build execution tool, the current `ninja -jN` implementation works fine. But when you try to convert parts of an existing recursive GNU make based SW build system to ninja, then you have the following situation:

Simply calling `ninja -jY` isn't enough, because then the ninja instances will try to run Y*N jobs, plus the X jobs from the GNU make instances, causing the build host to overload. Relying on `-lZ` to fix this issue is sub-optimal, because load average is sometimes too slow to reflect the actual situation on the build host.

It would be nice if GNU make jobserver client support could be added to Ninja. Then the N ninja instances would cooperate with the M GNU make instances, and on the build host only X jobs would be executed at one time.
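The cooperative behavior described here can be illustrated with a toy simulation: several "sub-builds" each try to run many jobs, but all draw tokens from one shared pipe, so total concurrency stays bounded by the pool size. This uses threads standing in for processes and is purely illustrative (no relation to Ninja's or Make's actual code):

```python
# Toy simulation: N sub-builds of JOBS_PER_BUILD jobs each share one token
# pipe with X slots. Peak concurrency is bounded by X, not N*JOBS_PER_BUILD.
import os
import threading
import time

X, N, JOBS_PER_BUILD = 4, 3, 5
r, w = os.pipe()
os.write(w, b'+' * X)           # preload X job slots (no implicit token here)

running = 0
peak = 0
lock = threading.Lock()

def job():
    global running, peak
    token = os.read(r, 1)       # block until a slot is free
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)            # pretend to compile something
    with lock:
        running -= 1
    os.write(w, token)          # hand the slot back

def sub_build():
    workers = [threading.Thread(target=job) for _ in range(JOBS_PER_BUILD)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

builds = [threading.Thread(target=sub_build) for _ in range(N)]
for b in builds:
    b.start()
for b in builds:
    b.join()

print(peak)                     # bounded by X, never N*JOBS_PER_BUILD
```

With 3 sub-builds of 5 jobs each, 15 jobs compete for 4 slots, so at most 4 ever run at once, which is exactly the X-jobs-total behavior the issue asks for.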