Automoc implementation discussion

The goal of this issue is to start (or continue) the discussion on whether and how a meson equivalent of CMake's automoc can be implemented. Most of the content of this issue is based on the previous IRC discussion.

There are fundamentally two approaches to implementing this feature:

Dynamic approach

This is what CMake does (which obviously doesn't automatically make it the best approach). Here, the source files that require moc compilation are detected during every build step (i.e. on every ninja invocation). The detection works by scanning all relevant source files for Q_OBJECT macros, etc.
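As a rough illustration of what such a scan involves, here is a minimal sketch assuming a purely textual search. It blanks out comments and string literals, then looks for moc-relevant macros. The macro list (Q_OBJECT, Q_GADGET, Q_NAMESPACE) and the function names are assumptions for this sketch, not CMake's actual scanner, which handles more cases.

```python
import re

# Macros that, for this sketch, mark a file as needing moc.
MOC_MACROS = ("Q_OBJECT", "Q_GADGET", "Q_NAMESPACE")

# Rough C++ "lexing": matches // comments, /* */ comments and string
# literals so they can be blanked out before searching. Raw strings,
# character literals and line continuations are not handled.
_STRIP_RE = re.compile(r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\\n])*"', re.DOTALL)
_MACRO_RE = re.compile(r"\b(?:" + "|".join(MOC_MACROS) + r")\b")


def find_moc_macros(source: str) -> set:
    """Return the moc macros that occur outside comments and strings."""
    code = _STRIP_RE.sub(" ", source)
    return set(_MACRO_RE.findall(code))


def needs_moc(source: str) -> bool:
    return bool(find_moc_macros(source))


if __name__ == "__main__":
    sample = '''
    // Q_OBJECT in a comment is ignored
    class Widget : public QObject {
        Q_OBJECT
    };
    const char *name = "Q_GADGET";  /* so is Q_NAMESPACE here */
    '''
    print(sorted(find_moc_macros(sample)))  # ['Q_OBJECT']
```

Even this small example shows why the disadvantages below mention "partially parsing C++": every lexical corner case that is skipped (raw strings, macros hidden behind other macros) is a potential false positive or negative.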

Advantages

No input from the developer required
- Just works in most cases
- Adding/Removing moc targets cannot be forgotten by the developer
- Already exists in CMake

Disadvantages

Requires (partially) parsing C++ code
- Time-consuming
- Can be error-prone
Always invoked
- The detection algorithm has to be executed for every ninja call
- Required to account for untracked source files
- Will slow down the build by a constant factor
May break on different operating systems or meson versions due to bugs

Possible solutions

The speed penalty for always invoking the moc detection algorithm can be reduced with smart caching (however, this would add another point of failure).
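One possible shape for such a cache, as a sketch: per-file scan results are stored together with the file's modification time and size, and a file is only rescanned when either changes. The `ScanCache` class, the cache file name and its JSON layout are hypothetical. The extra point of failure is visible here: a stale entry (e.g. due to coarse timestamps) would silently hide a newly added Q_OBJECT.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


class ScanCache:
    """Hypothetical cache of per-file scan results, keyed on mtime and size."""

    def __init__(self, cache_path: str) -> None:
        self.cache_path = cache_path
        try:
            with open(cache_path, encoding="utf-8") as f:
                self.entries: Dict[str, dict] = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            self.entries = {}  # missing or corrupt cache: start from scratch

    def lookup(self, path: str, scan: Callable[[str], List[str]]) -> List[str]:
        st = os.stat(path)
        key = [st.st_mtime_ns, st.st_size]
        entry = self.entries.get(path)
        if entry is not None and entry["key"] == key:
            return entry["result"]  # hit: file unchanged since the last scan
        result = scan(path)  # miss: run the expensive scanner
        self.entries[path] = {"key": key, "result": result}
        return result

    def save(self) -> None:
        with open(self.cache_path, "w", encoding="utf-8") as f:
            json.dump(self.entries, f)


if __name__ == "__main__":
    calls = []

    def fake_scan(path: str) -> List[str]:
        calls.append(path)
        return ["Q_OBJECT"]

    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "a.cpp")
        with open(src, "w", encoding="utf-8") as f:
            f.write("class A { Q_OBJECT };")
        cache = ScanCache(os.path.join(d, "automoc-cache.json"))
        cache.lookup(src, fake_scan)
        cache.lookup(src, fake_scan)
        print(len(calls))  # 1: the second lookup is served from the cache
```

Note that this only makes each scan cheaper; the detection step itself (and the `os.stat` per file) still runs on every ninja call.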

Rewriter approach

With this approach, the meson.build file itself is automatically modified by the meson rewriter. More specifically, already existing calls to the preprocess method of the qt5 module are modified, similar to adding/removing files from a build target. This step has to be executed manually by the developer.
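The core of what the rewriter would have to compute can be sketched as a set comparison between the files currently listed in a `preprocess()` call (for example its `moc_headers` argument) and the files in which the scanner found a moc macro. The `MocListUpdate` type and `compute_moc_update` function are hypothetical; how the resulting add/remove lists are applied to meson.build by the rewriter is left open here.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class MocListUpdate:
    """Hypothetical edit set for one preprocess() argument list."""
    to_add: List[str]
    to_remove: List[str]

    @property
    def up_to_date(self) -> bool:
        return not self.to_add and not self.to_remove


def compute_moc_update(declared: List[str], detected: Set[str]) -> MocListUpdate:
    """Compare the files listed in a preprocess() call with the scan results.

    `declared` is the current list (e.g. the moc_headers argument) and
    `detected` the set of files in which the scanner found a moc macro.
    Both result lists are sorted so repeated runs yield identical edits.
    """
    declared_set = set(declared)
    return MocListUpdate(
        to_add=sorted(detected - declared_set),
        to_remove=sorted(declared_set - detected),
    )


if __name__ == "__main__":
    update = compute_moc_update(["widget.h", "old.h"], {"widget.h", "dialog.h"})
    print(update.to_add, update.to_remove)  # ['dialog.h'] ['old.h']
```

Sorting the results keeps the rewritten meson.build stable, so running the rewriter twice produces no spurious diff in version control.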

Advantages

Direct modification of the meson.build
- No hidden "magic"
- Using automoc is completely optional
- Works with older meson versions
Completely independent of the setup/build phase
- Does not slow down the build
- Cannot break with different meson versions or operating systems

Disadvantages

Has to be manually invoked
- May lead to confusing build errors

Possible solutions

Confusion about a forgotten automoc run can be avoided by providing an optional (opt-in) automoc hook for the build process. This hook would simply check whether the current preprocess calls are up to date and either print a warning or abort the build. Naturally, this would slow down the build step, but it would be a completely optional feature.
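A sketch of such an opt-in check, assuming the hook already receives the declared and detected file sets plus a `strict` flag (all hypothetical inputs). In warning mode it only reports; in strict mode it returns a non-zero exit status so that the build step fails.

```python
import sys
from typing import Set


def automoc_hook(declared: Set[str], detected: Set[str], strict: bool) -> int:
    """Opt-in pre-build check; returns a process exit code.

    0 lets the build continue. In strict mode an out-of-date preprocess()
    call returns 1 so that the build step fails; otherwise it only warns.
    """
    missing = sorted(detected - declared)
    stale = sorted(declared - detected)
    if not missing and not stale:
        return 0
    for name in missing:
        print(f"automoc: {name} needs moc but is not listed in preprocess()", file=sys.stderr)
    for name in stale:
        print(f"automoc: {name} is listed in preprocess() but needs no moc", file=sys.stderr)
    print("automoc: re-run the rewriter to update meson.build", file=sys.stderr)
    return 1 if strict else 0
```

Reporting both directions (missing and stale entries) matters because a stale entry usually fails at moc time with a less obvious message than a missing one.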

Notes

I will update this issue, should new advantages/disadvantages be discovered. I am also willing to implement this feature myself, but I want to discuss this here first to increase the odds of getting it merged.

Whatever path is chosen, it must not add a scanning step in the no-op build case. That is, if you run ninja successfully and then run it again, it must not start any processes. CMake's automoc always does that and it is incredibly annoying.

AutoMoc implementation proposal 1

This proposal is based on the dynamic approach and the IRC discussion with @textshell.

Goals

no overhead for a configured build (running ninja with no source code changes)
- exception: checking timestamps
minimal configuration overhead
- automoc is opt-in (don't pay for what you don't use)
- reduce reconfigure overhead when possible
should work with all generators
- this can probably be mostly solved by using the already existing generator infrastructure
keep the amount of "magic" to a minimum
- clearly define what files are scanned (sources, headers)
- specify the macros that meson scans for
- guarantee that meson automoc will work within those limitations
detect moc targets during setup, not during the build as CMake does
- requires regenerating the build files (build.ninja, VS, Xcode)
- do the Xcode and VS generators support regenerating the build system? I couldn't find anything in the code and there doesn't seem to be a --reconfigure rule in the generated project file.

Overview

        //=========\\
  /-----||  START  ||
  |     \\=========//
  |           |
  |           |
  |           V
  |      /---------\
  |      |  Setup  |
  |      \---------/
  |           |
  |           |
  |           V
  |   /----------------\
  |   |   Autodetect   |<---\
  |   \----------------/    |
  |           |             |
  |           |             |
  |           V             |
  |    /--------------\     |
  |    |   Generate   |     |    only
  |    \--------------/     |    when
  |           |             |   changes
  |           |             |     in
  |           V             |     the
  |     /-----------\       |   automoc
  \---->|   Ninja   |       |    status
        \-----------/       |     are
              |             |  detected
              |             |
              V             |
      /----------------\    |
      |   Autodetect   |----/
      \----------------/

In contrast to the CMake implementation, the main automoc detection would be executed during the meson setup step and not during the build (ninja) step. This has the advantage that all the moc rules can be written directly into build.ninja, but the generator has to be rerun each time a new file needs to be moc'ed.

During the build (ninja) step, only a minimal autodetect step is executed to detect changes in the moc status (added/removed Q_OBJECT, etc.). Only if changes are detected does meson regenerate the build file.

To skip parsing the meson.build each time, the interpreter results should be serialized. I am not sure how easy this is, but we are already doing this for coredata, etc.
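A minimal sketch of the idea, assuming the interpreter output needed by the backend can be collected into a plain data object (the `InterpreterSnapshot` type and its fields are hypothetical). JSON is used only to keep the sketch readable; whatever format coredata already uses would be the natural choice. Storing the meson version lets the regeneration step fall back to a full reconfigure when the snapshot was written by a different version.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class InterpreterSnapshot:
    """Hypothetical subset of interpreter results needed to rerun the backend."""
    meson_version: str
    targets: Dict[str, List[str]] = field(default_factory=dict)


def save_snapshot(path: str, snap: InterpreterSnapshot) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(snap), f)


def load_snapshot(path: str, current_version: str) -> Optional[InterpreterSnapshot]:
    """Return the stored snapshot, or None if a full reconfigure is needed."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if data.get("meson_version") != current_version:
        return None  # written by another version: don't trust it
    return InterpreterSnapshot(**data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "interpreter-snapshot.json")
        save_snapshot(path, InterpreterSnapshot("1.0.0", {"testLib": ["fileA.cpp"]}))
        print(load_snapshot(path, "1.0.0") is not None)  # True
        print(load_snapshot(path, "1.1.0"))  # None
```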

Automoc detection

To make the detection algorithm more deterministic, each input file (*.cpp) is processed separately. For each file, only a few specific files are considered by the automoc algorithm (CMake scans all header files in the directory).

The specific rules are still TODO.

This way, the automoc detection can also be used during the build phase to detect automoc-relevant changes. This requires that the automoc algorithm can also be executed standalone from ninja. In addition, the automoc results are written to *.automoc.json files.
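Since the specific rules are still TODO, the following is only a sketch of one plausible per-file rule: for `fileA.cpp`, consider the file itself plus headers with the same stem next to it (`fileA.h`, `fileA.hpp`). The rule, the header extensions and the JSON keys (`moc_include`, `moc_compile`, `scanned`) are all assumptions. Every file that was read is recorded, so it can later be listed in a depfile.

```python
import json
import os
import re
import tempfile
from typing import Dict, List

HEADER_SUFFIXES = (".h", ".hpp")  # assumed set of header extensions
MOC_MACRO_RE = re.compile(r"\b(?:Q_OBJECT|Q_GADGET|Q_NAMESPACE)\b")


def candidate_files(source: str) -> List[str]:
    """Assumed rule: the source itself plus same-stem headers next to it."""
    stem, _ = os.path.splitext(source)
    headers = [stem + ext for ext in HEADER_SUFFIXES if os.path.isfile(stem + ext)]
    return [source] + headers


def detect(source: str) -> Dict[str, List[str]]:
    """Per-file automoc detection for one *.cpp input.

    A moc macro in the source means its generated .moc file must be included
    ("moc_include"); a macro in a header means a separately compiled moc
    file ("moc_compile"). Every file read is listed under "scanned" so it can
    go into a depfile. Comment/string stripping is left out for brevity.
    """
    result: Dict[str, List[str]] = {"moc_include": [], "moc_compile": [], "scanned": []}
    for path in candidate_files(source):
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        result["scanned"].append(path)
        if MOC_MACRO_RE.search(text):
            result["moc_include" if path == source else "moc_compile"].append(path)
    return result


def write_automoc_json(source: str, out: str) -> None:
    """Write the detection result for `source` to `out` (*.automoc.json)."""
    with open(out, "w", encoding="utf-8") as f:
        json.dump(detect(source), f, indent=2, sort_keys=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, body in {"fileA.cpp": "Q_OBJECT", "fileA.hpp": "Q_OBJECT"}.items():
            with open(os.path.join(d, name), "w", encoding="utf-8") as f:
                f.write(body)
        out = os.path.join(d, "fileA.cpp.automoc.json")
        write_automoc_json(os.path.join(d, "fileA.cpp"), out)
        with open(out, encoding="utf-8") as f:
            print(f.read())
```

Because the candidate set depends only on the input file name and the presence of its sibling headers, the result for one file never changes when an unrelated file in the directory is edited, which is the determinism the paragraph above asks for.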

Rule generation

The final build.ninja rule generation could look something like this:

# Only write the output when the new content
# differs from the existing file content
rule qt_AUTOMOC
 command = /path/to/meson/automoc.py -i $in -o $out -d $DEPFILE
 deps    = gcc
 depfile = $DEPFILE
 restat  = 1

# Use the JSON files to regenerate the build.ninja
# Skip interpreter part and load the serialized data
rule qt_REGENERATE
 command = /path/to/meson.py --internal --qt-regenerate
 generator = 1

build fileA.cpp.automoc.json: qt_AUTOMOC fileA.cpp
 DEPFILE = fileA.cpp.automoc.json.d

build fileB.cpp.automoc.json: qt_AUTOMOC fileB.cpp
 DEPFILE = fileB.cpp.automoc.json.d

build build.ninja: qt_REGENERATE fileA.cpp.automoc.json fileB.cpp.automoc.json
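The two properties the qt_AUTOMOC rule relies on, writing the output only when its content changes (so `restat = 1` can prune qt_REGENERATE) and emitting a gcc-style depfile, could look roughly like this. The CLI flags mirror the `-i/-o/-d` of the hypothetical automoc.py command above; the JSON content is a placeholder, and only the input file itself is scanned.

```python
import argparse
import json
import os
import re
from typing import List, Optional

MOC_MACRO_RE = re.compile(r"\b(?:Q_OBJECT|Q_GADGET|Q_NAMESPACE)\b")


def write_if_changed(path: str, content: str) -> bool:
    """Write `content` to `path` only if it differs; return True if written.

    An untouched file keeps its old mtime, so with `restat = 1` ninja can
    prune the edges that depend on it (here: qt_REGENERATE).
    """
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False
    except FileNotFoundError:
        pass
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


def write_depfile(depfile: str, target: str, deps: List[str]) -> None:
    """Write a Makefile-style depfile, the format `deps = gcc` expects."""
    def esc(p: str) -> str:
        return p.replace(" ", "\\ ")  # spaces must be backslash-escaped

    with open(depfile, "w", encoding="utf-8") as f:
        f.write(f"{esc(target)}: {' '.join(esc(d) for d in deps)}\n")


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="automoc scanner (sketch)")
    parser.add_argument("-i", dest="input", required=True, help="source file")
    parser.add_argument("-o", dest="output", required=True, help="*.automoc.json")
    parser.add_argument("-d", dest="depfile", required=True, help="depfile path")
    args = parser.parse_args(argv)

    with open(args.input, encoding="utf-8", errors="replace") as f:
        needs_moc = bool(MOC_MACRO_RE.search(f.read()))
    # Only the input itself is scanned here; headers pulled in by the (still
    # TODO) per-file rules would be added to both the result and `deps`.
    result = {"source": os.path.basename(args.input), "needs_moc": needs_moc}
    write_if_changed(args.output, json.dumps(result, indent=2, sort_keys=True) + "\n")
    write_depfile(args.depfile, args.output, [args.input])
    return 0
```

As a script, `main()` would be called from the entry point (e.g. `raise SystemExit(main())` under `if __name__ == "__main__":`). Editing a comment in fileA.cpp then reruns only this scanner; since the JSON content stays the same, the file is not rewritten and build.ninja is not regenerated.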

I'm not a fan of any solution that requires scanning the contents of files. If the C++ modules story goes the way it currently seems to be going (I hope it doesn't, but I can't really do anything about it) then we already need to scan the sources once. This adds a second round. Other tools may add yet a third one. And so on.

Spawning processes on Windows is abysmally slow. Simply invoking cl /? 10,000 times on an 8-core machine takes three minutes. Two scanning steps would mean that the machine is stuck for six minutes doing nothing but preprocessing before any compilation can begin.

Conceptually, an even bigger problem is how to order the operations. Can you do them in parallel? Moc first? Module scanning first? Most likely moc goes first, but for any similar tool that requires module information, the order might be reversed.

Any automoc implementation requires scanning the source code (although for the rewriter approach, the impact on the build step would be nonexistent).

On Windows, it would be possible to bundle all scanning steps for a single target into one process (depending on the implementation, even the scanning for C++20 modules could be added here). This way, the process-spawning overhead on Windows can be avoided.

Generating the *.automoc.json files can be done in parallel (see the minimal build.ninja example). For incremental builds, this also has the advantage that only the changed source files are processed (no rescanning on every ninja invocation, as in CMake). build.ninja is then only regenerated if at least one of the *.automoc.json files has changed. So a change in fileA.cpp will only trigger the qt_REGENERATE rule if it results in a change to the corresponding fileA.cpp.automoc.json.

Also, I just checked: the moc scanning will run first, because ninja always brings the build build.ninja: edge up to date before building anything else. For generating the actual moc rules, the same approach as in the current Qt module will be used.

So basically, automoc adds a step before the moc rules of the current qt module.

AutoMoc implementation proposal 2

While researching C++20 modules, I noticed that this whole process could be vastly simplified with ninja 1.10 and the dyndep PR. With this new feature, rerunning meson would no longer be required to inject the qt moc rules detected at build time.

This implementation proposal is based on https://github.com/mesonbuild/meson/issues/5730#issuecomment-519669030

Goals

#include "proposal1"

Overview

The fundamental difference from proposal 1 is that instead of regenerating build.ninja with meson, the new dyndep rules are used to inject the correct moc dependencies dynamically at build time.

  /-----------\                              /-------------\
  | fileA.cpp |----\                   /---->|  target.d   |
  \-----------/    |                   |     \-------------/
                   |                   |
  /-----------\    |     //======\\    |     /-------------\
  | fileB.cpp |----X---->|| SCAN ||----X---->|  target.dd  |
  \-----------/    |     \\======//    |     \-------------/
                   |                   |
  /-----------\    |                   |     /-------------\
  | fileC.cpp |----/                   \---->| target.json |
  \-----------/                              \-------------/

  /-------------------------\     //=============\\     /---------------------\
  | fileA.cpp + target.json |---->|| MOC WRAPPER ||---->| fileA.moc + DEPFILE |
  \-------------------------/     \\=============//     \---------------------/

  /-------------------------\     //=============\\     /---------------------\
  | fileB.cpp + target.json |---->|| MOC WRAPPER ||---->| fileB.moc + DEPFILE |
  \-------------------------/     \\=============//     \---------------------/

  /-------------------------\     //=============\\     /---------------------\
  | fileC.cpp + target.json |---->|| MOC WRAPPER ||---->| fileC.moc + DEPFILE |
  \-------------------------/     \\=============//     \---------------------/

To reduce the number of spawned processes (which apparently matters on Windows), only one scanning process is used per target. The MOC WRAPPER then consumes the scan results for its input file and runs moc accordingly. This wrapper step is necessary because there are multiple ways moc files can be used (e.g. included into the source file or compiled as a separate file).
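A sketch of how the MOC WRAPPER might turn its entry in target.json into moc invocations, assuming the JSON layout from the example below (`moc_include`, `moc_compile`). The output naming (`<stem>.moc` for files included by the source, `moc_<stem>.cpp` for separately compiled headers) and the function names are assumptions. It uses only moc's `-o <file>` option; writing the DEPFILE and always producing the declared output file are left out.

```python
import json
import os
import subprocess
from typing import List


def moc_commands(target_json: str, source: str, moc: str = "moc", outdir: str = ".") -> List[List[str]]:
    """Derive the moc invocations for one source file from target.json.

    Assumed naming: a source listed under "moc_include" yields <stem>.moc
    (to be #included by that source), a header under "moc_compile" yields
    moc_<stem>.cpp (compiled as its own translation unit).
    """
    with open(target_json, encoding="utf-8") as f:
        entry = json.load(f).get("automoc", {}).get(source, {})
    cmds = []
    for path in entry.get("moc_include", []):
        stem = os.path.splitext(os.path.basename(path))[0]
        cmds.append([moc, "-o", os.path.join(outdir, stem + ".moc"), path])
    for path in entry.get("moc_compile", []):
        stem = os.path.splitext(os.path.basename(path))[0]
        cmds.append([moc, "-o", os.path.join(outdir, "moc_" + stem + ".cpp"), path])
    return cmds


def run_moc_wrapper(target_json: str, source: str, moc: str = "moc", outdir: str = ".") -> None:
    """Run every required moc invocation; fail the build step on error."""
    for cmd in moc_commands(target_json, source, moc, outdir):
        subprocess.run(cmd, check=True)
```

Separating "which commands" from "run them" keeps the decision logic testable without a Qt installation.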

The SCAN step

The SCAN step produces three files: the ninja dyndep file target.dd, a normal depfile target.d, and the scan results file target.json. As mentioned above, the scan results in the target.json are also important for the MOC WRAPPER, where the exact moc rule(s) have to be determined.
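The dyndep part of the SCAN output could be rendered roughly as follows. This is a sketch: the edge names follow the build.ninja example in the Rule generation section below (`<stem>.moc.cpp` for the MOC edge, `<source>.o` for the compile edge), and the input mapping is hypothetical. ninja requires every build statement that names a dyndep file to be mentioned in it, so sources without moc work still get entries.

```python
import os
from typing import Dict, List


def dyndep_text(moc_outputs: Dict[str, List[str]]) -> str:
    """Render a ninja dyndep file (format version 1).

    `moc_outputs` maps each source (e.g. "fileA.cpp") to the moc-generated
    files its compilation depends on. The MOC edge gains them as implicit
    outputs and the compile edge as implicit inputs. Paths needing ninja
    `$` escaping (spaces, colons) are not handled.
    """
    lines = ["ninja_dyndep_version = 1"]
    for src in sorted(moc_outputs):
        stem = os.path.splitext(src)[0]
        extra = " ".join(moc_outputs[src])
        if extra:
            lines.append(f"build {stem}.moc.cpp | {extra}: dyndep")
            lines.append(f"build {src}.o: dyndep | {extra}")
        else:
            lines.append(f"build {stem}.moc.cpp: dyndep")
            lines.append(f"build {src}.o: dyndep")
    return "\n".join(lines) + "\n"
```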

Although there is only one scanning step, each file is scanned independently with the same limitations as in proposal 1. After each file is scanned (or "compiled"), the results are "linked" together to produce the final result files. This would allow trivial parallelization of the scanning step (if necessary).
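The "compile then link" structure can be sketched as a map over independent per-file scans followed by a merge, all within one process. File contents are passed in memory for brevity, and the function names are hypothetical. A thread pool keeps everything in a single process (the point of the exercise on Windows), but under CPython's GIL the regex work itself does not run in parallel; mostly the file I/O overlaps.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Tuple

MOC_MACRO_RE = re.compile(r"\b(?:Q_OBJECT|Q_GADGET|Q_NAMESPACE)\b")


def scan_one(item: Tuple[str, str]) -> Tuple[str, dict]:
    """'Compile' step: scan one file, independently of all others."""
    name, text = item
    return name, ({"moc_include": [name]} if MOC_MACRO_RE.search(text) else {})


def link(results: Iterable[Tuple[str, dict]]) -> dict:
    """'Link' step: merge the per-file results into one target.json document."""
    return {"automoc": dict(sorted(results))}


def scan_target(sources: Dict[str, str], jobs: int = 4) -> dict:
    """Scan all sources of one target inside a single process."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return link(pool.map(scan_one, sources.items()))
```

Because `scan_one` has no shared state, the per-file results are identical whether the scans run sequentially or in parallel, and sorting in `link` keeps target.json byte-stable.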

This scanning step can also be extended to support other dyndep file generation (C++20 modules) at the same time.

<C++20 modules thoughts> I know that compiling/processing source files twice is not ideal, but this would require the least effort on the compiler side and should thus work with every compiler. Additionally, parsing the limited C++ syntax needed to support modules should be feasible inside meson, since the modules syntax is fairly restrictive (as far as I know). </C++20 modules thoughts>

Example target.json format:

{
  "automoc": {
    "fileA.cpp": {
      "moc_include": ["fileA.cpp"],
      "moc_compile": ["fileA.hpp"]
    },
    "fileB.cpp": {},
    "fileC.cpp": {}
  },
  "modules": {
    "fileA.cpp": {
      "provides": ["A"],
      "imports": ["vector", "list", "math"]
    },
    "fileB.cpp": {},
    "fileC.cpp": {}
  }
}

Unnecessary rescans of source files can be prevented by comparing the timestamps of target.dd and the source files.
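Inside the scanner, that comparison could be as simple as the sketch below (the function name is hypothetical). One caveat: it assumes target.dd is rewritten on every scan. If target.dd is only written when its content changes (restat-style), its mtime stays old and a separate stamp file would be needed as the reference instead.

```python
import os
from typing import List


def files_to_rescan(dyndep_path: str, sources: List[str]) -> List[str]:
    """Return the sources that changed since target.dd was last written.

    Without a target.dd everything must be scanned; otherwise only sources
    with a newer mtime are rescanned and cached results are reused for the
    rest (e.g. loaded from the previous target.json).
    """
    try:
        dd_mtime = os.stat(dyndep_path).st_mtime_ns
    except FileNotFoundError:
        return list(sources)
    return [s for s in sources if os.stat(s).st_mtime_ns > dd_mtime]
```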

Rule generation

# build.ninja
rule SCAN
 command = /path/to/meson/scan.py -d $DEPFILE -D $DYNDEP -s $RESULT -i $in
 deps    = gcc
 depfile = $DEPFILE
 restat  = 1

rule MOC
 command = /path/to/moc/wrapper -d $DEPFILE $in
 deps    = gcc
 depfile = $DEPFILE
 restat  = 1

build testLib.json testLib.dd: SCAN fileA.cpp fileB.cpp fileC.cpp
 DEPFILE = testLib.d
 DYNDEP = testLib.dd
 RESULT  = testLib.json

# Maybe phony or PHONY can be used instead of a dummy file
build fileA.moc.cpp: MOC fileA.cpp testLib.json || testLib.dd
 dyndep = testLib.dd
 DEPFILE = fileA.moc.cpp.d

build fileB.moc.cpp: MOC fileB.cpp testLib.json || testLib.dd
 dyndep = testLib.dd
 DEPFILE = fileB.moc.cpp.d

build fileC.moc.cpp: MOC fileC.cpp testLib.json || testLib.dd
 dyndep = testLib.dd
 DEPFILE = fileC.moc.cpp.d

build fileA.moc.cpp.o: cpp_COMPILER fileA.moc.cpp
build fileB.moc.cpp.o: cpp_COMPILER fileB.moc.cpp
build fileC.moc.cpp.o: cpp_COMPILER fileC.moc.cpp

build fileA.cpp.o: cpp_COMPILER fileA.cpp || testLib.dd
 dyndep = testLib.dd

build fileB.cpp.o: cpp_COMPILER fileB.cpp || testLib.dd
 dyndep = testLib.dd

build fileC.cpp.o: cpp_COMPILER fileC.cpp || testLib.dd
 dyndep = testLib.dd

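# testLib.dd
ninja_dyndep_version = 1

# fileA.cpp includes fileA.cpp.moc
build fileA.moc.cpp | fileA.cpp.moc: dyndep
build fileA.cpp.o: dyndep | fileA.cpp.moc
# every edge that names testLib.dd must be listed, even without extra deps
build fileB.moc.cpp: dyndep
build fileB.cpp.o: dyndep
build fileC.moc.cpp: dyndep
build fileC.cpp.o: dyndep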

mesonbuild / meson

Meson automoc #5730
