Open rgommers opened 1 year ago
I'm curious why this would be considered a property of the source package, rather than a property of the build machine?
If it's just about the build machine, maybe the build backend itself should make the judgment call to calculate a different default instead of the default Ninja's own code uses, and pass that to ninja unless compile-args is used by the person running pip/build?
It is indeed mostly dependent on the build machine. But I would say that the ninja
default seems to be fine for smaller packages, but doesn't work for heavier builds, so in that sense it's also somewhat package-dependent.
meson-python changing the default to the number of physical cores would clearly be an improvement though, so that is also an option here.
I would like the build server to be able to limit parallelism to avoid OOM. I'm repeatedly having failed webkit builds until I run like -j3 of -j4. Some large C++ projects will fail to compile even with 8 free gigs of RAM
And another thing is to avoid swapping. A build witch hits swapping can run n-time longer than one which keeps everything in RAM.
A slightly more relevant ninja issue is https://github.com/ninja-build/ninja/issues/1482#issuecomment-433861702
Ideally this could be controlled via $NINJAFLAGS
but the ninja developers seem to be resistant to the idea...
The better rephrasing of the issue I have is "you will run out of RAM much faster than your run out of CPU with many C++ programs". When compiling webkit setting -l0 make you run out of RAM within minutes.
And the niche case is distcc/icecream/sccache, where you want to completely disable parallelism management, beyond threat limit
Adding another use case:
In HPC, we compile often on shared resources where we cannot run with default -j
: the system resource watch scripts will abort the ninja builds due to excessive load.
https://github.com/scipy/scipy/issues/18766
I saw in other issues that there might be some resistance to add environment variable control in Ninja, but as a report from the trenches, I find CMake's CMAKE_BUILD_PARALLEL_LEVEL environment control extremely useful for complex build parallelism control. Maybe meson has a similar env variable or could consider adding one?
I'd like to revisit this. Something has to be done, because the default behavior of pip install scipy
having the potential to crash the build or even hanging a machine is not okay. Other heavy packages with C++ code are likely to have similar experiences in the future. The ninja
devs are clearly not going to fix this, so here are the options we have:
If compile-args
does not contain a -j
argument, have meson-python
set -j
to the number of physical cores (taking into account CPU affinity etc. - I have well-tested code for this if we go that way)
-j2
tends to be optimal to large build servers where 2*n_cores + 2
tends to be too high)Adding a setting like use-physical-cores = false
in pyproject.toml
, that when set to true
does the same as (1).
Some kind of hook that allows packages to run custom code and then pass extra compile-args
.
I'd be inclined to go with (2), for safety / backwards compatibility.
This problem arises from the Python packaging ecosystem decision of treating binary or source installs as equivalent and transparently fall back to compiling from source when a binary package is not available. With this premise, it is always possible to find an environment constrained enough where building a given package fails, whatever heuristic is used to determine the parallelization settings.
I'm surprised that the ninja
defaults are not adequate for large packages. I believe ninja
was developed to compile Chrome, which is a huge C++ application, thus should be close to the worst case in terms of resources usage during compilation.
Any heuristic based on the number of processors does not look like a promising solution to me. I haven't looked at the bug reports in detail, but isn't ninja -l
option helping in this case? Is it available at all on Windows?
ninja -l
uses a measure of CPU utilization. I'm not sure if it can help with @ax3l's HPC use case, it may.
It doesn't look like it will do anything for the pip install heavy_package
case in general, because it doesn't look at memory usage and that is the problem here. You want 100% CPU utilization on something like GitHub Actions, so defaulting -l
to 50% may end up using only a single core when you have two vCPUs.
Also, that option doesn't seem to work on Windows, is undocumented, and even seems broken on Linux for me - I can only make it switch between 1 process and 24 on my 12-core machine.
it doesn't look at memory usage and that is the problem here
This is exactly the reason why I don't think that an heuristic based on the number of CPU is beneficial in most cases. There is only so much correlation between the number of CPU and the quantity of memory available.
Another possible solution is to allow variable substitutions in the arguments passed to meson. This way it would be possible to write something like:
[tool.meson-python.args]
install = ['-j', '%(ncores)s']
or something like this. We could even think about allowing simple arithmetic expressions, like %(ncores * 2 / 3)s
, although this would require some more work to implement. The drawback is that this is flexible and extensible, but I don't see much use for this mechanism outside this use case.
There is only so much correlation between the number of CPU and the quantity of memory available.
True - but it is a significant enough correlation that it'd help a lot to implement this. The problem here seems to be that the typical consumer laptop comes with a CPU that has <= 4 physical cores, and 8 GB of RAM. And then 1 GB memory usage per process causes hangs/crashes on a heavy project like SciPy, while something near 0.5 GB seems safe. Shifting a "long tail" of problematic cases by 2x would reduce problems by much more than 2x it looks like.
Another possible solution is to allow variable substitutions in the arguments passed to meson.
Interesting idea. I think I like it. Maybe it's simple enough if we only allow *
, /
, +
, -
and Python scalars (or integers)?
Interesting idea. I think I like it. Maybe it's simple enough if we only allow
*
,/
,+
,-
and Python scalars (or integers)?
This is another demonstration that all configuration formats need to be lisp, and when they are not, they painfully converge to a badly designed and poorly implemented version of lisp 😄
Interesting idea. I think I like it. Maybe it's simple enough if we only allow
*
,/
,+
,-
and Python scalars (or integers)?
I've implemented the parsing and evaluation of the expressions. Using the ast
module it can be done in surprisingly few lines of code. Which format would you like the best for the placeholders?
I think %(ncores)d
and the extended version %(ncores * 2 / 3)d
looks familiar (at least to the Python programmers grow up at the time pf Python 2 and the %
operator).
Using the format string syntax has the advantage of already having a ready parser in string.Formatter
but has the problem that {ncores * 2 / 3 :d}
is not valid as it tries to format a float as an integer.
I actually just realized that supporting the %(ncores)d
format is even easier. I'm opening a PR with a draft implementation.
First the specific feature we need in SciPy: set the number of parallel compile jobs for
ninja
to equal the number of available physical cores (n_phys
). We have a number of reports about theninja
default setting, which is2*n_phys + 2
, either giving out of memory errors or even crashing the build machine outright:For local development this was not difficult to address (see https://github.com/scipy/scipy/pull/18451), because SciPy has a developer interface where we can run custom code before invoking Meson. However, for
pip install
& co, there is no way to do that currently. So a user who may be installing some random package that happens to depend on SciPy may get crash or hang that we can't easily prevent. Hardcoding, say,-j4
is not great - there is no setting that works on low-end machines without throwing away lots of performance on high-end machines. The optimal settings seems to always be close to the number of physical cores.So, ideally there would be a hook that
meson-python
exposes to package authors which can run arbitrary Python code to then setcompile-args
with.I also see a more general pattern here, with gh-159 being another example of the need to execute code first and then set some build-related property (the package version in that case). It's a very similar case, and rather than a specific solution for each of these two problems, we may consider a general mechanism to deal with this kind of thing. Maybe a single Python function to execute, which must return a dict containing a pre-determined set of keys, including
version
,compile-args
and the other*-args
knobs.Thoughts?