sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0
712 stars 56 forks source link

meson compile should obey jobs flag #202

Open Quackdoc opened 4 months ago

Quackdoc commented 4 months ago

I tried to compile the schedulers on my system but my machine was quickly rendered unusable. No matter what I do it uses all cores and it looks like it's hard coded to make_jobs = nproc. Im not sure what all is compiling with the make_jobs flag but that was the only thing I saw with a super quick glance through.

ptr1337 commented 4 months ago

I tried to compile the schedulers on my system but my machine was quickly rendered unusable. No matter what I do it uses all cores and it looks like it's hard coded to make_jobs = nproc. Im not sure what all is compiling with the make_jobs flag but that was the only thing I saw with a super quick glance through.

If you are on archlinux, you can consider to fetch it from the cachyos repository for now. Compiling the scx-scheds is pretty demanding, yes.

Quackdoc commented 4 months ago

yeah, I plan on still investigating it for various reasons. I hard coded make_jobs to 8 and that alleviated the problem a bit but not much. Does cachyOS build from the AUR makepkg? it seems like they have been misconfigured as they don't set --buildtype release (arch-meson defaults to --buildtype plain) which causes the rust stuff to get compiled without the --release flag.

sirlucjan commented 4 months ago

yeah, I plan on still investigating it for various reasons. I hard coded make_jobs to 8 and that alleviated the problem a bit but not much. Does cachyOS build from the AUR makepkg? it seems like they have been misconfigured as they don't set --buildtype release (arch-meson defaults to --buildtype plain) which causes the rust stuff to get compiled without the --release flag.

We use Arch's standard meson solutions. If you have your own, share them and then we'll take a look at it.

Quackdoc commented 4 months ago

We use Arch's standard meson solutions. If you have your own, share them and then we'll take a look at it.

all you need to do as far as I can tell right now is add --buildtype release all arch-meson is, is a simple exec script https://gitlab.archlinux.org/archlinux/packaging/packages/meson/-/blob/main/arch-meson?ref_type=heads, the issue here is --buildtype plain \ but with how meson parses arguments, its enough to simply do arch-meson . build --buildtype release note that --buildtype release is more or less the same thing as -Dbuildtype=release but --buildtype conflicts with -Dbuildtype and meson will throw an error, why? no idea.

sirlucjan commented 4 months ago

We use Arch's standard meson solutions. If you have your own, share them and then we'll take a look at it.

all you need to do as far as I can tell right now is add --buildtype release all arch-meson is, is a simple exec script https://gitlab.archlinux.org/archlinux/packaging/packages/meson/-/blob/main/arch-meson?ref_type=heads, the issue here is --buildtype plain \ but with how meson parses arguments, its enough to simply do arch-meson . build --buildtype release note that --buildtype release is more or less the same thing as -Dbuildtype=release but --buildtype conflicts with -Dbuildtype and meson will throw an error, why? no idea.

I've updated PKGBUILDs.

sirlucjan commented 4 months ago

@Quackdoc

Build targets in project: 19

sched_ext schedulers 0.1.7

  User defined options
    auto_features     : enabled
    buildtype         : release
    libexecdir        : lib
    prefix            : /usr
    sbindir           : bin
    wrap_mode         : nodownload
    python.bytecompile: 1
    b_pie             : true

Everything looks correct and the package is smaller by about 20 MB.

htejun commented 3 months ago

So, the problem is the connection between meson and cargo. meson thinks cargo is a regular command which takes up at most one CPU (like e.g. clang or gcc) but cargo takes up all CPUs in the machine. meson ends up running cargo for each rust scheduler, so we end up with a mess. It's something we need to improve but haven't found a good solution in meson.

takase1121 commented 3 months ago

Is it possible to configure cargo to limit concurrency? I tried to compile this repo (with the PKGBUILD) and hit a 80 load average... on a 6-core machine. It also almost caused OOM.

Quackdoc commented 3 months ago

cargo concurrency can be set. for sure. However im wondering if cargo compile is being run at the same time as other compiles in parallel? I havent looked into the building scripts yet. but if it is, running it in sequence would help a lot, especially considering that cargo compile already threads quite well.

takase1121 commented 3 months ago

cargo concurrency can be set. for sure. However im wondering if cargo compile is being run at the same time as other compiles in parallel? I havent looked into the building scripts yet. but if it is, running it in sequence would help a lot, especially considering that cargo compile already threads quite well.

This is just anecdotal evidence, but I've seen duplicate rustc processes in htop. My guess is that each scheduler is compiled in parallel. Another evidence of this is the 80 load average being close to 12*6 (72).

takase1121 commented 3 months ago

I have an idea to fix this issue, but I haven't tried it and it might not be optimal:

It might be possible to store all the custom_targets for the Rust schedulers, create a dummy custom target that depends on each scheduler and the "next" scheduler to compile. This should create a chain of custom targets that will compile in sequence.

takase1121 commented 3 months ago

Here is a patch to do exactly that:

what.diff.txt

diff --git a/scheds/rust/meson.build b/scheds/rust/meson.build
index 52b227a..01eeaa3 100644
--- a/scheds/rust/meson.build
+++ b/scheds/rust/meson.build
@@ -1,5 +1,14 @@
+sched = []
+
 subdir('scx_layered')
 subdir('scx_rusty')
 subdir('scx_rustland')
 subdir('scx_rlfifo')
 subdir('scx_lavd')
+
+custom_target('rust_scheds',
+              input: 'meson.build',
+              output: '@PLAINNAME@.__PHONY__',
+              command: ['touch', '@PLAINNAME@.__PHONY__'],
+              build_by_default: true,
+              depends: sched)
diff --git a/scheds/rust/scx_lavd/meson.build b/scheds/rust/scx_lavd/meson.build
index 3604564..1f6ee8a 100644
--- a/scheds/rust/scx_lavd/meson.build
+++ b/scheds/rust/scx_lavd/meson.build
@@ -1,7 +1,8 @@
-custom_target('scx_lavd',
+sched = custom_target('scx_lavd',
               output: '@PLAINNAME@.__PHONY__',
               input: 'Cargo.toml',
               command: [cargo, 'build', '--manifest-path=@INPUT@', '--target-dir=@OUTDIR@',
                         cargo_build_args],
               env: cargo_env,
-              build_by_default: true)
+              build_always_stale: true,
+              depends: [sched])
diff --git a/scheds/rust/scx_layered/meson.build b/scheds/rust/scx_layered/meson.build
index 039f258..f006a10 100644
--- a/scheds/rust/scx_layered/meson.build
+++ b/scheds/rust/scx_layered/meson.build
@@ -1,8 +1,8 @@
-custom_target('scx_layered',
+sched = custom_target('scx_layered',
               output: '@PLAINNAME@.__PHONY__',
               input: 'Cargo.toml',
               command: [cargo, 'build', '--manifest-path=@INPUT@', '--target-dir=@OUTDIR@',
                         cargo_build_args],
               env: cargo_env,
-              depends: [libbpf, bpftool_target],
-              build_by_default: true)
+              build_always_stale: true,
+              depends: [libbpf, bpftool_target, sched])
diff --git a/scheds/rust/scx_rlfifo/meson.build b/scheds/rust/scx_rlfifo/meson.build
index 24d3b64..fd1ffb9 100644
--- a/scheds/rust/scx_rlfifo/meson.build
+++ b/scheds/rust/scx_rlfifo/meson.build
@@ -1,8 +1,8 @@
-custom_target('scx_rlfifo',
+sched = custom_target('scx_rlfifo',
               output: '@PLAINNAME@.__PHONY__',
               input: 'Cargo.toml',
               command: [cargo, 'build', '--manifest-path=@INPUT@', '--target-dir=@OUTDIR@',
                         cargo_build_args],
               env: cargo_env,
-              depends: [libbpf, bpftool_target],
-              build_by_default: true)
+              build_always_stale: true,
+              depends: [libbpf, bpftool_target, sched])
diff --git a/scheds/rust/scx_rustland/meson.build b/scheds/rust/scx_rustland/meson.build
index 9d7c29c..a243771 100644
--- a/scheds/rust/scx_rustland/meson.build
+++ b/scheds/rust/scx_rustland/meson.build
@@ -1,8 +1,8 @@
-custom_target('scx_rustland',
+sched = custom_target('scx_rustland',
               output: '@PLAINNAME@.__PHONY__',
               input: 'Cargo.toml',
               command: [cargo, 'build', '--manifest-path=@INPUT@', '--target-dir=@OUTDIR@',
                         cargo_build_args],
               env: cargo_env,
-              depends: [libbpf, bpftool_target],
-              build_by_default: true)
+              build_always_stale: true,
+              depends: [libbpf, bpftool_target, sched])
diff --git a/scheds/rust/scx_rusty/meson.build b/scheds/rust/scx_rusty/meson.build
index 0510756..b896ce7 100644
--- a/scheds/rust/scx_rusty/meson.build
+++ b/scheds/rust/scx_rusty/meson.build
@@ -1,8 +1,8 @@
-custom_target('scx_rusty',
+sched = custom_target('scx_rusty',
               output: '@PLAINNAME@.__PHONY__',
               input: 'Cargo.toml',
               command: [cargo, 'build', '--manifest-path=@INPUT@', '--target-dir=@OUTDIR@',
                         cargo_build_args],
               env: cargo_env,
-              depends: [libbpf, bpftool_target],
-              build_by_default: true)
+              build_always_stale: true,
+              depends: [libbpf, bpftool_target, sched])
htejun commented 3 months ago

@takase1121 did you test whether this actually works? If so, do you mind creating a PR?

sirlucjan commented 3 months ago

@takase1121 did you test whether this actually works? If so, do you mind creating a PR?

I'll test it ASAP. Lemme check @htejun

sirlucjan commented 3 months ago

@takase1121 Could you send a PR, please? I tested your patch and it seems that everything works OK. The load during compilation is significantly reduced and the compilation time increased only slightly.

arighi commented 3 months ago

@takase1121 I see a potential issue with this, trying to build a single scheduler also triggers the build of all the others, for example (from a fresh repo):

> meson compile -C build scx_rustland
INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /usr/bin/ninja -C /home/arighi/src/scx/build scheds/rust/scx_rustland/Cargo.toml.__PHONY__
ninja: Entering directory `/home/arighi/src/scx/build'
[2/5] Generating scheds/rust/scx_layered/scx_layered with a custom command (wrapped by meson to set env)
...

So building a single target is now much slower than before. I'm wondering if it'd possible to make the chain of dependency optional passing a special build option or something similar...

takase1121 commented 3 months ago

That's an issue I've just realized now. I don't think there's a lot that can be done about it. Another issue is that the build process technically no longer obeys the meson job count, since cargo just does whatever it wants.

I think that limiting cargo concurrency is the way to go. We can revert my PR and implement the other approach. I don't have time right now to fix this and might take me another week to come out with something and test it. Maybe someone else can do it.

Quackdoc commented 3 months ago

cargo has --jobs support. but even with that cargo is actually being called multiple times in parallel. so that's at least part of the solution

arighi commented 3 months ago

I don't think we should revert this change, preventing almost a fork-bomb when running meson compile is already an improvement.

But I'm not really convinced that limiting cargo concurrency is the way to go, again, that would slow down the build of a single scheduler.

IMHO the best would be to limit meson concurrency instead (without adding extra dependencies) and rely on the "backend" concurrency level of cargo. In this way we won't slow down the build of single targets and we could still prevent the massive fork-bomb build.

Basically we should convince meson to behave like make: spawn a single build task by default, unless you specify a certain level of parallelism (-j N).

I don't really understand why meson is doing the other way around... and apparently there's no way to change this default behavior in the meson files. Apparently the only way is to explicitly add -j 1 to the command line... or maybe there is a way and I haven't found it.

takase1121 commented 3 months ago

Well, there is always https://mesonbuild.com/Release-notes-for-1-3-0.html#automatic-fallback-to-cmake-and-cargo-subproject , it's probably not good, but just saying...

The main problem here is we are working with a meta build system. It's not really up to meson to set concurrency because Ninja is the one running everything. I don't think there's a way to disable this at all. Maybe cmake supports it with job pools (ref), but it'll be a miracle if meson would ever have something similar.

Another workaround I can see is to add the scheduler to a list, and then iterate the list to create the custom_target chain with the [last_scheduler, current_scheduler] dependency, but this will require a big more tinkering because you can't get custom_target name apparently. This is a workable solution but might be a bit uglier than the current solution.

For example: rust_lavd/meson.build:

# revert to original deps, keep the build_always_stale
scheds += custom_target(...)

rust/meson.build:

# you better find some input and output files because you need this for ALL custom_target.
last_dummy = []
for sched in scheds:
  # figure out how to get target_name, or literally store it in another list
  last_dummy = custom_target('wrapper_' + target_name, build_always_stale: true, depends: [last_dummy, sched])
endforeach

custom_target('finally_all_sched', ..., depends: [last_dummy])

Well, good luck with that.