malxau / yori

Yori is a CMD replacement shell that supports backquotes, job control, and improves tab completion, file matching, aliases, command history, and more.
http://www.malsmith.net/yori/
MIT License
1.24k stars 31 forks source link

Recursive make considered harmful #45

Closed jstarks closed 4 years ago

jstarks commented 4 years ago

http://www.real-linux.org.uk/recursivemake.pdf

Builds are slow (or perhaps I am just impatient) because nmake runs 3 times for every directory, even if nothing has changed.

Would a change to centralize the build directives for all components into one place be reasonable? Perhaps via includes to still allow each directory to keep its build rules locally?

malxau commented 4 years ago

You probably already saw this, but the reason for running three times per directory is to allow each directory to be operated on in parallel. Unfortunately doing so requires a "for" that supports parallel operation (ie., the one from Yori.) Traditional make is single threaded, so putting everything into a single large makefile would serialize it all. So I'd hope/expect that clean builds would be much faster with recursive make and a parallel for than a single makefile, although you're right that if only a single file has changed which links into a single binary, then parallelism is pointless.

Obviously there are many other build environments that support parallel operation natively (cmake, msbuild, etc.) The reason I'm using makefiles is because they're supported on older compilers, and I'm using older compilers to ensure that the resulting binaries "just work" on any OS.

What kind of build times are you seeing? If ydate.exe is in your path, the included makefile will display start and end times.

malxau commented 4 years ago

I've been thinking about this a lot and realized the situation with NMAKE is analogous to CMD. In both cases there's a native code thing that hasn't been maintained, that can't exploit today's hardware, and a managed code replacement that is sufficiently inefficient that it can't exploit the hardware well either. Since it's the same situation, it's time to embrace the same solution...

For the last few months I've been writing YMAKE as an NMAKE replacement. The goal is to offer a compatible experience, so the result is a single build system rather than two, but YMAKE can build a global dependency graph and execute commands in parallel to exploit multi-core devices. The global dependency graph does imply some changes, since recursing into a directory requires a declarative syntax rather than a CMD command. I'm hoping that the benefit of this will only grow over time since core counts look set to increase dramatically soon.

There's lots more to go, but I have enough to build the Yori tree, and resisted the temptation to simplify the Makefiles to make this happen. The results (so far) look like:

NMAKE YMAKE Improvement
Full build, i5 10 8 20.00%
Full build, atom 67 52 22.39%
Change one c file in lib, i5 3.5 2 42.86%
Change one c file in lib, atom 17.5 6.5 62.86%
Change one c file in app, i5 2.5 1.5 40.00%
Change one c file in app, atom 13 4.5 65.38%
Rebuild one app dir only, i5 0.87 0.57 34.48%
Rebuild one app dir only, atom 1.33 0.92 30.83%
Clean, i5 2.5 1.5 40.00%
Clean, atom 12.5 7 44.00%

Next steps:

Like Yori, having source opens the door to being able to revisit a lot of longstanding irritations. I'll probably discover more now my brain is allowed to think about them.

  1. This doesn't implement a lot of what NMAKE does, it implements the (not insignificant) subset that Yori was using. There's a lot more work to be a full drop-in replacement for NMAKE. Some things, like cross directory inference rules, look really useful. There are tons of comments in the code about obscure things that should be done.
  2. NMAKE doesn't allow inference rules to specify dependencies, which always seemed really dumb. It'd be nice to say, "here's how to compile a .c into a .obj, and this should be done if these header files have changed."
  3. A lot of the time above is doing things like install, where files are copied to the target without using dependency information (ie., all binaries get copied.) This is the bulk of the "change one file" test, since compilation is fast but copying all the binaries is not. Making this declarative requires a syntax that can say, "here is a rule that applies to this list of files in that directory."
  4. A lot of the time in clean is spent spawning child processes. Clean is about 10 instances of "if exist .xxx erase .xxx" per directory which is a lot of CreateProcess. Moving this internally should be a huge improvement.
  5. Yori uses the !IF syntax to execute commands to check for compiler capabilities. This is slow (500ms) and happens on every build. Many of these take the form of "cl /? | find /Feature" which spawns three child processes (cmd, cl, find.) If the find could happen in YMAKE, it'd be one child process. If the output of cl /? can be cached and reused for later tests, that's even better.
  6. NMAKE is a mini-shell, which I hadn't noticed until now. Commands like "cd" can't just be kicked to a child CMD process. Since this version is being developed alongside a shell, there seems to be tons of opportunities for integration that need some refactoring. Things like redirection logic or command buffering logic should apply to YMAKE too. Command buffering is a huge usability improvement, since it can pipe output from all commands back into make and output them in a synchronized way (each line is complete and identifies which operation it came from.)
  7. Optimizations in the build system to exploit what YMAKE can do. Given cross directory inference rules, for example, it should be possible to build a debug and release build in parallel. Building x86 and x64 in parallel is theoretically possible too, but requires more intelligence about how to define PATH/INCLUDE/LIB.
jstarks commented 4 years ago

Haha, nerd sniping successful! I'm glad my own nmake replacement project didn't get off the ground, then. Nice work.

One thought: other make-like tools seem to be moving in a different direction--they rely on ninja to perform the actual builds, having first compiled their high-level source files into ninja's lower-level format. Ninja has been optimized to maximally exploit any available parallelism.

CMake and Meson are two popular build systems that have taken this approach. Android did this for a subset of GNU make a while back: https://github.com/google/kati.

What do you think about such an approach?

malxau commented 4 years ago

I think dividing up a high level declarative language from a low level execution engine makes sense, and I agree with the focus that Ninja has on optimizing the repeated workflow of compiling a single directory/piece of code. Both NMAKE and YMAKE handle that case relatively well.

That said, the reason Yori ended up with NMAKE isn't because it's computationally optimal - it obviously isn't - it's because it's included in every Visual Studio for the last 25 years. The challenge of being an open source Windows program is how to lower the barriers to entry so people can get the code and start working with it. It's frustrating that other platforms ship with tools out of the box and have repositories to enable people to get preconfigured, trusted tools, so it's much easier on other platforms to depend on more tools to work with source code. Doing this on Windows is either asking people for a lot of DIY work to assemble a build environment, or to download a giant tool blob, but one common reason for wanting the source is an unwillingness to trust binary blobs.

I think the main advantage of splitting things up for Yori would be to see if it's possible to have a high level declarative system that can generate build files for a range of build systems, so it can work with whatever compiler/toolchain that is convenient without incurring a multiplicative maintenance cost.

I'm a bit curious how Ninja would really compare to Ymake for efficiency though. There's quite a few memory allocations I'd like to get rid of in ymake but it measures where it spends time and the preprocessing cost of parsing all of the makefiles on the entire Yori tree is ~125ms on an i5. Any improvements to build times from here will be driven from eliminating inefficient build processes (re-testing compiler capabilities on each build, clean that spends more time launching child processes than deleting files, install which copies files whether they need to be copied or not.)