premake / premake-core


Proposal for C++20 module support with gmake generator (GCC) #1735

Closed: alexpanter closed this issue 3 years ago

alexpanter commented 3 years ago

Modules Proposal

A short while ago I posted an issue asking for a guide on how to use C++20 modules with premake projects.

From that inquiry I gathered that there have been no attempts yet to implement module support for the gmake generator. There is, however, work in progress on support for Visual Studio. Microsoft has been an early adopter of modules, and they have a presentation at CppCon this year about their module implementation for MSVC and Visual Studio, which seems feature complete though it lacks full support for module partitions. So naturally, it has been easier to implement module support in premake for VS projects.

Since there existed no current effort for doing the same with gmake, I decided to implement it myself. Thus the following proposal.

Description

This document describes a proposal for an implementation that enables C++20 module support for the GCC toolset and GNU Makefiles. We can assume that most users on Windows will be using MSVC, so Unix-like systems will be the primary target of this discussion.

Challenges

I have written about, and many people have been talking about, both the benefits and the challenges of modules. First of all, with header inclusion we have a separation between symbol declaration and symbol definition in source code, allowing us to compile translation units in arbitrary order no matter their inter-dependencies. We can include header files freely, and then rely on the linker to provide the implementation at a later stage.

But since a module is a binary representation of an abstract syntax tree, and we do not want to include header files in a module system, we must depend on precompiled binary module interfaces (BMIs). While we can certainly still use header files in a modularized build pipeline, doing so provides little benefit. E.g. if our library interface depends on standard library types such as std::vector or std::shared_ptr, then we still need to include those standard library headers in our headers, and in turn include those again in our module units.

So clearly, a build tool for modern C++20 development must allow the programmer to omit header inclusion while still providing a functional compilation.

Requirements

As always with C++, a core design goal is to provide the user with maximal freedom of expression. And this philosophy should be kept with a premake implementation. Microsoft has decided to standardize the file extension for module interface units as .ixx. Clang as well, has certain expectations. But, users should be allowed to specify files and project structure as they desire, and the build tool should aim to support their choices.

As for GCC, there is no expected file extension for module units, that is, compilation units which declare a module. Instead, when the flags -std=c++20 and -fmodules-ts are enabled, the compiler reads the module declaration, builds a list of dependencies, and checks for an existing BMI for each of those dependencies. If one of these BMIs cannot be found, the whole compilation fails. By default, GCC uses a module mapper which stores local BMIs in a gcm.cache/ directory at the invocation location.

For these reasons, a build tool supporting GCC with modules should read the module declarations of all referenced source files in a project, construct a dependency graph of all their dependencies, and then generate a sequence of build commands such that module units are built in the correct order. GCC does not differentiate between a module partition and a module interface unit - both are just C++ source files and treated with the same recognition. This is a fundamental difference vs. MSVC and Visual Studio which explicitly tags module files.

While we can still tag module files, it is not sufficient to simply build module partitions first, since they might depend on fully defined module units. Though I have not done any research on how the Visual Studio dev team has developed their module build system, I suspect they have done something similar.

Parsing a module file

My early-stage module parser performs the actions described above. Reading through cppreference, one may observe that module declarations must always be at the top of source files, which clearly supports the development of such tools. After all, we do not want to read through the entire file if we can retrieve what we need to know from the first ~10 lines. Consider this example:

module;                     // global module fragment

#include <memory>

export module mechanics;    // module declaration

import <iostream>;          // import declaration

export {                    // exported scope
    class Robot {
    public:
        void SayHi() { std::cout << "Hi, I am a robot!\n"; }
    };
}

// [...]

The global module fragment tells us to enable the preprocessor for the included headers placed immediately after it. When we read the module declaration, we know this file should be included in the module build pipeline. The import declaration enables translation units importing this module unit to use the Robot::SayHi() method without needing to #include or import iostream themselves. And finally, when we see an exported symbol (signified by the export scope), we can safely ignore the rest of the file (which may be hundreds or thousands of lines), since no module imports or module declarations may appear after this point.
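The early-exit scan described above could be sketched roughly as follows. This is not the actual parser from this proposal: scan_preamble and ModuleInfo are hypothetical names, and the line-based matching is a simplifying assumption (it ignores block comments, preprocessor conditionals, and declarations split across several lines).

```cpp
#include <cassert>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: what one source file declares and imports.
struct ModuleInfo {
    std::string name;                  // e.g. "mechanics" or "corporation:robotics_facility"
    std::vector<std::string> imports;  // e.g. "<iostream>", "mechanics", ":partition1"
};

// Reads only the module preamble: stops at the first line that is neither
// blank, a line comment, a preprocessor directive, nor a module/import declaration.
ModuleInfo scan_preamble(std::istream& in) {
    static const std::regex module_decl(
        R"(^\s*(?:export\s+)?module\s+([\w.]+)\s*(?::\s*([\w.]+))?\s*;)");
    static const std::regex import_decl(
        R"(^\s*(?:export\s+)?import\s+([^;]*[^;\s])\s*;)");
    static const std::regex skippable(R"(^\s*(?:$|//|#|module\s*;))");

    ModuleInfo info;
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        if (std::regex_search(line, m, module_decl)) {
            info.name = m[1].str();
            if (m[2].matched) info.name += ":" + m[2].str();  // partition name
        } else if (std::regex_search(line, m, import_decl)) {
            info.imports.push_back(m[1].str());
        } else if (!std::regex_search(line, skippable)) {
            break;  // first ordinary declaration: the preamble is over
        }
    }
    return info;
}
```

Fed the first example file, such a scan would stop at the export { line and never touch the body of the file.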

Now, imagine we have another source file (file names are not relevant here):

export module corporation : robotics_facility;

export import mechanics;

export {
    Robot* construct_robot(/* ... */) {
        // ...
    }
}

// [...]

If we have both files in our list of project source files, we always need to build the mechanics-module first, no matter the order in which they appear in the file list. Furthermore, if changes happen to the mechanics-module, we need to invalidate the depending module partition and build that again. Visual Studio will check these dependencies automatically, and with Makefile we can set target dependencies.
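The ordering constraint above amounts to a topological sort of the import graph. A minimal sketch, assuming the imports of every module unit are already known; ImportGraph and build_order are hypothetical names and not part of premake:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical input: module name -> names of the modules it imports.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Returns the modules in an order where every module appears after all of
// its imports (depth-first topological sort). Throws on an import cycle.
std::vector<std::string> build_order(const ImportGraph& graph) {
    std::vector<std::string> order;
    std::set<std::string> done;      // fully processed modules
    std::set<std::string> visiting;  // modules on the current DFS path

    // Recursive helper, written as a local struct to keep the sketch in one function.
    struct Visitor {
        const ImportGraph& graph;
        std::vector<std::string>& order;
        std::set<std::string>& done;
        std::set<std::string>& visiting;

        void visit(const std::string& name) {
            if (done.count(name)) return;
            if (!visiting.insert(name).second)
                throw std::runtime_error("import cycle involving " + name);
            auto it = graph.find(name);
            if (it != graph.end())
                for (const std::string& dep : it->second) visit(dep);
            visiting.erase(name);
            done.insert(name);
            order.push_back(name);  // all imports are already in `order`
        }
    };

    Visitor v{graph, order, done, visiting};
    for (const auto& entry : graph) v.visit(entry.first);
    return order;
}
```

For the two files above, the graph would map corporation:robotics_facility to mechanics and mechanics to <iostream>, so the header unit comes first and the partition last, regardless of the order in the file list.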

Creating build commands

With Makefiles, if we should create build targets for the above-mentioned two module units, they could look like this:

STD=-std=c++20
MODULE_FLAGS=-fmodules-ts
STD_HEADER=-xc++-system-header

iostream:
    $(CXX) $(STD) $(MODULE_FLAGS) $(STD_HEADER) $@

mechanics: iostream
    $(CXX) $(STD) $(MODULE_FLAGS) [...]

corporation-robotics_facility: mechanics
    $(CXX) $(STD) $(MODULE_FLAGS) [...]

Seeing that premake already has such a system for detecting source/header dependencies, I do not think building and parsing a module dependency graph will be necessary (whew).

Interface in premake.lua files

We need a new setting called cppmodules which can be set to "enabled" and which defaults to "disabled". When enabled, this should add the -fmodules-ts flag for GCC, the -fmodules flag for Clang (or similar), and <EnableModules>true</EnableModules> to the generated Visual Studio project files.

We already have the compileas with options for "Module" and "ModulePartition". These should be recognized, but ignored, by gmake2.

We also need further options for toolset when using gmake2 with GCC. At the time of writing, the earliest compiler supporting C++ modules is g++-11, and since people might have earlier versions installed as well, this setting should support "gcc10", "gcc11", "g++-10", "g++-11", etc.

Jarod42 commented 3 years ago

Note: I didn't play with C++20 modules at all (just read some articles).

[Makefile]

My concern, here, is that I'm not sure how you want to differentiate modules (iostream which is not user code, but should be compiled, user-module, 3rd party modules).

> Seeing that premake already has such a system for detecting source/header dependencies

If I am correct, Premake just delegates to gcc (It should be -MM flag or similar) to have the dependencies. I don't know if it provides similar stuff for modules (Seems not according to https://stackoverflow.com/questions/66542797/is-there-a-way-to-query-direct-module-dependencies-with-gcc).

> We already have the compileas with options for "Module" and "ModulePartition". These should be recognized, but ignored, by gmake2.

I would like to have a common (Premake) interface for C++20 modules, and not filtering by toolset.

What would a premake script look like for a modules project?

alexpanter commented 3 years ago

> My concern, here, is that I'm not sure how you want to differentiate modules (iostream which is not user code, but should be compiled, user-module, 3rd party modules).

As far as gcc goes, I have not been able to see the difference. We can compile headers from the standard library with -x c++-system-header, but it works just as well with -x c++-user-header. This does not make any difference in the compilation process, whether for a standard header or a user header, as long as the compiler's include path is properly set. Based on user code, we can certainly differentiate between import <header>; and import "header";, but angle brackets are sometimes used for local includes as well, so we can't assume anything. As far as my research goes, then, we have no way of knowing which iostream is referred to, other than maintaining a list of standard header names (but why would we do that?).
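To make the distinction concrete, here is a small sketch that classifies an import target purely by its spelling and maps header units to the two -x options mentioned above. ImportKind, classify_import and gcc_language_flag are hypothetical names; as noted, the spelling alone cannot prove where a header actually comes from.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the token that follows `import`.
enum class ImportKind { SystemHeaderUnit, UserHeaderUnit, Partition, NamedModule };

// Classifies purely by spelling; angle brackets do not prove that the
// header really belongs to the standard library.
ImportKind classify_import(const std::string& target) {
    if (target.size() >= 2 && target.front() == '<' && target.back() == '>')
        return ImportKind::SystemHeaderUnit;
    if (target.size() >= 2 && target.front() == '"' && target.back() == '"')
        return ImportKind::UserHeaderUnit;
    if (!target.empty() && target.front() == ':')
        return ImportKind::Partition;
    return ImportKind::NamedModule;
}

// Maps a header-unit kind to the GCC language option discussed in this thread;
// returns an empty string for named modules and partitions.
std::string gcc_language_flag(ImportKind kind) {
    switch (kind) {
        case ImportKind::SystemHeaderUnit: return "-x c++-system-header";
        case ImportKind::UserHeaderUnit:   return "-x c++-user-header";
        default:                           return "";
    }
}
```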

> What would a premake script look like for a modules project?

This is kind of improvised since I have no way of testing it, but I'll try my best:

workspace "ABC"

  architecture "x64"
  configurations
  {
    "Debug",
    "Release"
  }
  outputdir = "%{cfg.buildcfg}-%{cfg.system}-%{cfg.architecture}"

  project "XYZ"
    kind "SharedLib"
    language "C++"
    cppdialect "C++20"
    cppmodules "enabled"   -- might alternatively be "On"/"Off"

    targetdir ("../bin/" .. outputdir .. "/%{prj.name}")
    objdir ("../obj/" .. outputdir .. "/%{prj.name}")

    files
    {
      "src/**.cpp",
      "src/**.cppm", --optional
      "src/**.hpp"
    }
    includedirs
    {
      "./"
    }

    filter "files:**.cppm"
      compileas "Module"

    filter "files:**.cpp"
      compileas "ModulePartition"

    filter "toolset:gcc"
        toolset "gcc11"

This would expand to the following filetree (assuming gcc):

ABC/
    src/
        **.cpp
        **.cppm
        **.hpp
    gcm.cache/
        ,/
            mymodule.gcm
        usr/include/c++/11/iostream.gcm
    bin/
        Debug-linux-x86_64/
            XYZ/
                libXYZ.so
    obj/
        Debug-linux-x86_64/
            XYZ/
                mymodule.o
    vendor/
        // third-party libs...

From my immediate understanding, this setup should be enough for both Visual Studio and gmake2 (GCC/Makefile). I would like to note that many tools (e.g. build2) and guides use their own inventions for file endings, in an effort to differentiate between modules, module partitions, and regular translation units. But from what I have seen with gcc, there is no distinction in the compiler interface. And, I think people should be able to use e.g. .cpp only, if they so desire.

Note on compilation

I cannot speak for other build tools, but at least with GCC, unfortunately, all build commands must be run from the same directory (I suggest the root directory of the workspace). That way the module cache will be located in the same place as obj/ and bin/.

alexpanter commented 3 years ago

> If I am correct, Premake just delegates to gcc (It should be -MM flag or similar) to have the dependencies. I don't know if it provides similar stuff for modules (Seems not according to https://stackoverflow.com/questions/66542797/is-there-a-way-to-query-direct-module-dependencies-with-gcc).

@Jarod42 You are correct. This is what premake does.

But, I have just performed a bit more research. This is an example project:

// partition.cpp
export module partition;

import :partition1;
export import :partition2;
export import :partition3;

export void Hello1() { _Hello1(); }

After compiling each partition individually, I would build the primary module interface (above) like this:

g++-11 -std=c++20 -fmodules-ts -c -MMD partition.cpp

This generates a file partition.d with the following contents:

partition.o gcm.cache/partition.gcm: partition.cpp
partition.o gcm.cache/partition.gcm: partition:partition3.c++m \
 partition:partition2.c++m partition:partition1.c++m
partition.c++m: gcm.cache/partition.gcm
.PHONY: partition.c++m
gcm.cache/partition.gcm:| partition.o
CXX_IMPORTS += partition:partition3.c++m partition:partition2.c++m \
 partition:partition1.c++m

So, without any insight into how premake is structured, I would assume that we can use gcc's built-in functionality for detecting dependencies like this. More research is needed, though. E.g., I could not build the module interface before building the partitions.
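If the generated .d file is to be used for ordering, the imports have to be read back from it somehow. Below is a minimal sketch that pulls the module names off the CXX_IMPORTS += lines. It assumes the format is exactly what the single sample above shows; read_cxx_imports is a hypothetical helper, not something premake or gcc provides.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collects the module names listed on `CXX_IMPORTS +=` lines of a dependency
// file, joining backslash-continued lines first.
std::vector<std::string> read_cxx_imports(std::istream& in) {
    const std::string key = "CXX_IMPORTS +=";
    const std::string suffix = ".c++m";
    std::vector<std::string> imports;
    std::string line, logical;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\\') {
            line.pop_back();          // drop the continuation backslash
            logical += line + " ";
            continue;                 // keep accumulating the logical line
        }
        logical += line;
        if (logical.compare(0, key.size(), key) == 0) {
            std::istringstream tokens(logical.substr(key.size()));
            std::string token;
            while (tokens >> token) {
                // "partition:partition3.c++m" -> "partition:partition3"
                if (token.size() > suffix.size() &&
                    token.compare(token.size() - suffix.size(), suffix.size(), suffix) == 0)
                    token.erase(token.size() - suffix.size());
                imports.push_back(token);
            }
        }
        logical.clear();
    }
    return imports;
}
```

For the partition.d sample this would yield the three partitions in the order gcc listed them; the other rule lines are ignored.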

Jarod42 commented 3 years ago

> As far as gcc goes, I have not been able to see the difference. We can compile headers from std library with -x c++-system-header, but it works just as fine with -x c++-user-header.

Generally, the difference between a system include and a regular include is that warnings are turned off for system includes.

My second concern was about existing modules (might be (related to) what you call "module cache"). As with includedirs, should we have moduledirs and something similar to links?

> So, without any insight into how premake is structured, I would assume that we can use gcc's built-in functionality for detecting dependencies like this. Though, more research is needed. E.g., I could not build the module interface before building the partitions..

IMO, it is not up to Premake to parse the files, so indeed, investigating gcc's built-in functionality for build order and dependencies is the way to go.

BTW, with:

filter "files:**.cppm"
    compileas "Module"

filter "files:**.cpp"
    compileas "ModulePartition"

Premake can already make dependencies between partitions and modules (not fine-grained though).

alexpanter commented 3 years ago

> Premake can already make dependencies between partitions and modules (not fine-grained though).

But that will (likely?) not be sufficient. Module partitions (of the same module) may import each other arbitrarily as long as no cycles are present. And partitions may import other (non-partition) modules. Besides, module implementation units need to be built after the primary module interface unit. So perhaps we also need compileas "ModuleImplementation" ?

I have a feeling that the compileas option was added only to support Visual Studio and is completely irrelevant for other toolsets. I may be wrong, though (can't speak for Clang yet). I have written a lengthy reflection on compilation order here.

> Generally, difference between system include and regular include are that warnings are off on system include.

That sounds like a good approach! But if premake/gcc can already tell that #include <iostream> is a system header, then perhaps it can do the same for import <iostream>;. Though the latter should be treated as a project-local module, built only once and in a particular directory so that subsequent build commands can find it. Again, I am only speaking for gcc now; I may try with clang later.

> IMO, it is not to Premake to parse the files, so indeed, investigating in that direction of gcc's built-in functionality for build-order and dependencies is the way to go.

I agree with you completely. I merely made that tool out of curiosity, and because I didn't know that gcc could generate dependency files and track them.

> My second concern was about existing modules (might be (related to) what you call "module cache"). As includedirs, should we have moduledirs and something similars to links?

I am currently trying to hack a premake-generated Makefile to see if I can integrate module support into it somehow. I will likely know more in a couple of days. But, going on intuition, the include path and the module cache path are completely orthogonal concepts. The problem I have is that gcc creates a local module cache directory inside each project folder, whereas there should be one for the entire workspace, placed in the root directory. That way the module cache can be shared between projects that reference each other, and we don't need to build the iostream module for each project.

alexpanter commented 3 years ago

@Jarod42 Would it be possible to rely on symbolic links for the module cache? Because if so, adding module support will be really simple and straightforward. I can confirm, after testing, that the current setup with gcc dependency files works well enough with tracking module dependencies (bugs taken into account).

The generated Makefiles already have a setting like "SHELLTYPE":

SHELLTYPE := posix
ifeq (.exe,$(findstring .exe,$(ComSpec)))
    SHELLTYPE := msdos
endif

# [...]

ifeq (posix,$(SHELLTYPE))
    $(SILENT) rm -f  $(TARGET)
    $(SILENT) rm -rf $(GENERATED)
    $(SILENT) rm -rf $(OBJDIR)
else
    $(SILENT) if exist $(subst /,\\,$(TARGET)) del $(subst /,\\,$(TARGET))
    $(SILENT) if exist $(subst /,\\,$(GENERATED)) rmdir /s /q $(subst /,\\,$(GENERATED))
    $(SILENT) if exist $(subst /,\\,$(OBJDIR)) rmdir /s /q $(subst /,\\,$(OBJDIR))
endif

Now, I'm just guessing, but are symbolic links not generally supported on posix shells?

Options

So far, I have detected 3 possible options that we could explore:

1) Adding a symbolic link to a gcm.cache/ directory inside each project's directory. This is by far the simplest fix (I think - more testing will be done!). This repository shows how symbolic links can be used to circumvent the restrictions on the gcc module mapper (item 3).

2) Modifying the paths inside the generated Makefile, i.e. prepending cd .. && to each CXX build command. I have found this to work well enough when building a shared library, though linking to it from another project causes issues (might be solvable).

3) Rely on the GCC module mapper. This will require dependency on an external file containing dirpaths for referenced modules, and the compiler flag -fmodule-mapper=<file>. This is extremely buggy. I have tried a lot of different combinations but can't seem to get it to work, and documentation is too limited/early-stage.

Jarod42 commented 3 years ago

For other reading:

alexpanter commented 3 years ago

After having done some research, I have the following inconvenient conclusion:

The way that premake is currently using -MMD to create dependency files with GCC is not working properly with modules (likely due to bugs). That is, I can hack a premake-generated Makefile to compile and link, but it still cannot determine the correct build order.

This does not change, even when a custom module mapping file is provided to the compiler, as I suggested with option 3. in my previous post.

I don't know why. Some help would be appreciated. There are quite a lot of parameter combinations to consider.

Current possible option

If we use some tool (like the one I suggested at the beginning of this issue) to sort the order in which sources are added to the Makefile variables $(GENERATED) and $(OBJECTS), we can circumvent this limitation.

Besides that, it seems our only option is to wait for the GCC devs to let their tools mature.

hsandt commented 2 years ago

Hm, too bad this is closed, although I see why the initial proposal doesn't work at the moment. This will need reopening in another form at some point though, not only for gcc/g++ but also clang, which doesn't support one-line compilation and needs intermediate module compilation steps, also in the correct order.

Clang note: I've been working with clang + modules for a while, and while I think I can build a single module reliably (see https://mariusbancila.ro/blog/2020/05/15/modules-in-clang-11/), I cannot consistently build a chain of modules yet (a module depending on another one). It's okay for "pure" modules, but as soon as I include external headers like std, it may fail at the intermediate module compilation step due to either "no definition" or "redefinition" of an external symbol. Maybe it's just clang-specific, maybe it's something we'll also stumble upon while working on more complex projects with gcc...

Back to GCC: in the meantime, it would be nice to have low-level options so knowledgeable users can at least set up a project with modules. For instance, if I can deduce the correct dependency order on my own, I could pass the list of module files in order as a dedicated option, and main and non-module files as sources separately, in any order, and premake would put them together.

That said, it's not so different from using files {} and passing all modules in dependency order, then remaining source files, so I guess doing this + buildoptions "-fmodules-ts" will be enough for now... As long as the one-line compilation works indeed.

nickclark2016 commented 2 years ago

I think the current plan is to wait for full module implementation, that way we know exactly what we're getting into. My personal reservation with implementing a feature still not done in a compiler is that things like flags and such may change.

alexpanter commented 2 years ago

Compiler-support is sorely lacking. GCC still doesn't support external visibility from module partitions. And I doubt they have made any major strides towards enhancing their module mapping interface. Actually, it seems they just cancelled work on gcc11 in favor of gcc12, and they are actively working on implementing c++23 features instead.

I don't get it - modules are so much more interesting and worthwhile..

@hsandt I don't really know much about what/how clang does, but it seems (at least to me) that premake should work out of the box. That is, if compiler interfaces vary a lot, then premake should provide an agnostic Facade to them. I halted development on my module parser because I hate reinventing the wheel all the time (C does not have std::map), and because it was unclear if premake would really need such a tool. It could be that for clang it would make sense, while for GCC and MSVC it would not!

There are other tools such as CMake or Xmake, which actually provide module support. But they are still limited by the compiler. And to my knowledge: No major compiler vendor can provide complete module support right now.

So I kind of gave up and decided to hibernate a year or two.. 😕

hsandt commented 2 years ago

Yeah I heard that for CMake too. If I manage to make gcc and clang work at 100% with even a deep module dependency tree, I'll try to write a post with an example we can use as reference. Maybe gcc and clang will add/fix module dependency detection in the meantime to make our work easier!

hsandt commented 1 year ago

I found new instructions for 2023 on https://www.kitware.com/import-cmake-c20-modules/ (thanks to this post: https://github.com/microsoft/vscode-cpptools/issues/6302#issuecomment-1529093024) but I haven't tried them yet.

I wouldn't hold my breath, but it's possible that the suggested fork/experimental options make things easier.

alexpanter commented 1 year ago

@hsandt interesting post, thanks for sharing! I will try that with CMake at some point™. :)

I heard from a {fmt} library developer, that they have ported {fmt} to modules. Works with MSVC (for 2 years), and clang, but that GCC fails "spectacularly".

Perhaps GCC devs are waiting for the committee to address the issues explained in Remember the FORTRAN, and/or for the promised C++23 modularization of the standard library. I don't know.

But it's probably best to wait until all premake's compiler backends have full support, especially in regards to the module mapper.

xparq commented 1 year ago

> Remember the FORTRAN

@alexpanter, not denying the difficulties, that paper seems to have made a big deal out of generated sources with modules disrupting the dependency graph -- as if the same wasn't true for generated files with headers, both cases requiring a two-step process with a full deps. scan first, if I'm not mistaken.

Any good news, anyone, BTW?

rtosman commented 9 months ago

I was just googling to see if something like this existed yet for premake. Guess not :-)