llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.29k stars 11.68k forks source link

[C++20] [Modules] Writing PCM will be unexpectedly slow with additional import. #61447

Open ChuanqiXu9 opened 1 year ago

ChuanqiXu9 commented 1 year ago

We can find the reproducer at: https://discourse.llvm.org/t/modules-increased-build-times/68755/7

For short:

// test.cppm
module;

#include <iostream>

export module test;
import vulkan; // Vulcan is constructed big module.

will take 3~4s to generate the PCM.

And

module;
#include <iostream>
export module test.without.import;

will take about 0.5s to complete only.

Note that:

#include <iostream>
import vulkan;

will take 0.5s to complete too.

Currently I located in ASTWriter::WriteDecl. But I am not sure if this can be fixed simply.

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-modules

rnikander commented 1 year ago

@ChuanqiXu9 I may be hitting this. I've got some files taking around 45 seconds to compile and it's killing me. I wonder if there is a temporary workaround that I could apply to LLVM source locally, without the more well-thought-out fix that you will need to apply. I'm interested in compilers and hacking on Clang, so I could try it. Can you say any more about what you found in ASTWriter::WriteDecl that might be the problem?

koplas commented 1 year ago

It could help to upload a minimal reproducer and a flame graph. The most significant difference when importing a module is that clang::FindExternalVisibleDeclsByName gets called which seems to be slow with big modules.

@ChuanqiXu9 already linked a thread with further info: https://discourse.llvm.org/t/modules-increased-build-times/68755. I only have files that take 5 seconds longer to compile and my reproducer is far from minimal, so a better reproducer could help.

koplas commented 1 year ago

@rnikander if you want to hack on it, you could try to modify: https://github.com/llvm/llvm-project/blob/e58a49300e757ff61142f6abd227bd1437c1cf87/clang/lib/Serialization/ASTWriterDecl.cpp#L2490-L2532

ChuanqiXu9 commented 1 year ago

@rnikander if you want to hack on it, you could try to modify:

This might not be related to the problem.

@ChuanqiXu9 I may be hitting this. I've got some files taking around 45 seconds to compile and it's killing me. I wonder if there is a temporary workaround that I could apply to LLVM source locally, without the more well-thought-out fix that you will need to apply. I'm interested in compilers and hacking on Clang, so I could try it. Can you say any more about what you found in ASTWriter::WriteDecl that might be the problem?

I didn't find a quick solution/workaround for this. The direct cause for the problem is that for this code style:

// a.cppm
module;
#include <iostream>
export module;
import another; // another contains many decls <iostream>

When we merge the declarations from different modules, the declarations in this TU (a.cppm) will be the canonical declarations and the declarations from the imported modules will be the redeclarations. So we have to re-write the declarations in this TU. Also when we write declarations in this TU, we'll try to lookup the decls from the external sources (the imported modules), for example: https://github.com/llvm/llvm-project/blob/edc8b602f996a1fc68c28054c81e8fb671bb874b/clang/lib/Serialization/ASTWriter.cpp#L3960-L3965

This is what I found now. And as you see, it is pretty fundamental so I am not sure if there is a simple workaround.

ChuanqiXu9 commented 1 year ago

As a dirty (and absolutely incorrect) verifier, you can insert the following code to: https://github.com/llvm/llvm-project/blob/edc8b602f996a1fc68c28054c81e8fb671bb874b/clang/lib/Serialization/ASTWriter.cpp#L5578

for (auto *RD : D->redecls()) {
      if (RD == D)
        continue;

      if (RD->isFromASTFile()) {
        DeclIDs[D] = RD->getGlobalID();
        return DeclIDs[D];
      }
}

Note that I don't think this is the correct direction. I think we need to do something in the lookup process if possible.

ChuanqiXu9 commented 1 year ago

@rnikander and for the problem of slow compilation, I am wondering if some of the reason comes from the one phase compilation. For one phase compilation and two phase compilation, you can look this at: https://clang.llvm.org/docs/StandardCPlusPlusModules.html#how-to-produce-a-bmi and https://gitlab.kitware.com/cmake/cmake/-/issues/18355#note_1329192. And cmake choose to implement one phase model initially for the sake of compatibility. But two phase compilation may be faster locally. If the scale of your project isn't too large, you can try to implement the 2 phase compilation model by cmake scripts. Then we will have a better feeling for this.

rnikander commented 1 year ago

Okay, thank you. I'll dig around to try to understand better. I'd like to work on Clang but realistically I probably should focus on my projects. So I guess this process of merging declarations from modules, which you mentioned, has to occur even with the declarations in the another module that are private to it, like the #include <iostream> in it's global module fragment? I can't compile anything at the moment because main branch is segfaulting (another issue, someone already reported). I'll look again tomorrow.

ChuanqiXu9 commented 1 year ago

I'd like to work on Clang but realistically I probably should focus on my projects.

Never mind. This is what open source is : )

So I guess this process of merging declarations from modules, which you mentioned, has to occur even with the declarations in the another module that are private to it, like the #include in it's global module fragment?

The deserialization and visibility checking are two related but different process. The things in global module fragment may be reachable so we have to deserialize them now.

ChuanqiXu9 commented 1 year ago

I sent cf47e9f to mitigate the problem. The patch changes the writing BMI time of test.cppm from 5s to 3s in my environment. But 3s is still longer than I expected. So I'll keep this openning.

ChuanqiXu9 commented 1 year ago

Note for developers: the root cause for the issue is that when we reconcile the declcontext redundantly many times. See the last section of ASTReader::loadDeclUpdateRecords. We will reconcile the decl context completely instead of by need, which may be a major reason for this issue.

ChuanqiXu9 commented 5 months ago

With -fexperimental-modules-reduced-bmi, the compile time for https://github.com/koplas/clang-modules-test goes down to 2.685s from 3.486s my local machine. So it can mitigate the problem a little bit but still not ideal.

I feel the problem may be somewhat pretty hard to fix (or optimize) in clang. Possibly we can only teach users don't do this. e.g. https://clang.llvm.org/docs/StandardCPlusPlusModules.html#performance-tips

(Maybe we don't see so many issues if we're born within a world where modules came in day 1.)

CC @koplas

ChuanqiXu9 commented 5 months ago

Oh, I took another look at the example,

module;

#include <iostream>

export module test;
import vulkan;

export {
    void hello() {
        std::cout << "hello\n";
    }
}

Since we don't want to write the function body for non-inline function, we shouldn't have so many work in ASTWriter for sure. Although we can't solve the underlying issue, we can optimize the specific example.

(It means it will be slow too if the function is inline)

koplas commented 5 months ago

Thank you @ChuanqiXu9 for the progress that has been made.

My problem lies in the usage of third-party libraries, that are used like this:

module;
#include "big_header.hpp"
export module third_party;
export {
    using decl_1;
    using decl_2;
    ...
}

The export-extern-style would certainly fix the performance issue for me, but it would hard to port all third-party libraries to this style. Even when std modules and https://github.com/llvm/llvm-project/issues/80663 reduce the impact, it can be a major performance hurdle for new module users. To be able to import a lot of big third-party libraries without a big performance impact would be the biggest C++ modules selling point for me. I wouldn't mind if the ASTWriter is still slow for the inline function case.

ChuanqiXu9 commented 5 months ago

Yeah, this is more of the topic of how to transfer a header based library to a module based one. And the biggest challenge I saw is how to handle third party libraries. Then it is somewhat a annoying issue. We feel it will be easy if all the third party libraries provide the modules. Note that this definition is recursive. For example, if someday boost provides a modules, ideally, it will be best if it imports std instead of wrapping itself like std module. The std module is special since it is the root of dependencies. Another example is, now it may be fine to write a new module-native library only depends on std module.

But the bad problem is, how can we treat the third party libraries not offering modules? This is annoying since we're doing other people's job. Ideally, it should be the job of other libraries to provide such thing nicely. But now we need to worry about that.

Then it becomes more of an ecosystem problem.

koplas commented 5 months ago

A bit of a question of understanding. Consider this:

module;
#include <iostream>
export module test;

export {
    void hello() {}
}

The resulting pcm file compiled with reduced bmi is 3,4 Megabytes big. If I add import vulkan; it increases to 5,5 Megabytes. Shouldn't both files be relatively small, as they use nothing from the GMF?

Update: Without reduced bmi the respective file size is 4,5 and 7,4 Megabytes. The performance improvement seems to be linear to the file size.

ChuanqiXu9 commented 5 months ago

The resulting pcm file compiled with reduced bmi is 3,4 Megabytes big. If I add import vulkan; it increases to 5,5 Megabytes. Shouldn't both files be relatively small, as they use nothing from the GMF?

This is a problem of quality of implementation. The reduced BMI is still in the very early stage. We're still optimizing it. And this the problem I said why reduced BMI doesn't work. Since we don't write entities in non-inline function bodies, we simply shouldn't write any thing.

I just make a new patch to improve this. After this patch, with reduced BMI, the result of https://github.com/koplas/clang-modules-test goes to:

Building test.cppm

real    0m0.494s
user    0m0.450s
sys 0m0.037s
Building test-without-import.cppm

real    0m0.466s
user    0m0.430s
sys 0m0.033s

The number looks good and we're probably near the extreme. import <module-name> is not a free operation.

But I still want to emphasize that, it doesn't fix the fundamental issue. It will still be slow if we're writing a declaration which has a lot redeclarations in the imported modules. In another word, in the example of https://github.com/koplas/clang-modules-test, if every partition unit of module vulkan uses declarations from <iostream> extensively (not in a non-inline function body, which we just skipped), the writing performance should be pretty slow still. So I don't mind keep the issue opening.

Also from my testing, it may not not a big herotic optimization to real world examples, https://github.com/alibaba/async_simple/tree/CXX20Modules (I tried to make it modules native), I only observed 1%+ size reduction and rough 2%+ compile speed improvements.

koplas commented 5 months ago

I just tested it on my toy project, and the time spent in the frontend is now twice as large as the time spent in the ASTWriter. So I consider it a significant improvement, I guess a switch to std modules would reduce the remaining performance impact. Currently one file crashes with -fexperimental-modules-reduced-bmi:

/usr/local/bin/clang++  -I/home/paul/Documents/lime-engine/third_party/include -I/home/paul/Documents/lime-engine/third_party/glm -I/home/paul/Documents/lime-engine/third_party/glfw/include -stdlib=libc++ -Wall -ftime-trace -fno-rtti -mavx2 -fno-exceptions -fexperimental-modules-reduced-bmi -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -Wno-nullability-completeness -ferror-limit=0 -g -std=gnu++20 -fcolor-diagnostics -MD -MT CMakeFiles/lime-engine.dir/src/app.ixx.o -MF CMakeFiles/lime-engine.dir/src/app.ixx.o.d @CMakeFiles/lime-engine.dir/src/app.ixx.o.modmap -o CMakeFiles/lime-engine.dir/src/app.ixx.o -c /home/paul/Documents/lime-engine/src/app.ixx
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.  Program arguments: /usr/local/bin/clang++ -I/home/paul/Documents/lime-engine/third_party/include -I/home/paul/Documents/lime-engine/third_party/glm -I/home/paul/Documents/lime-engine/third_party/glfw/include -stdlib=libc++ -Wall -ftime-trace -fno-rtti -mavx2 -fno-exceptions -fexperimental-modules-reduced-bmi -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -Wno-nullability-completeness -ferror-limit=0 -g -std=gnu++20 -fcolor-diagnostics -MD -MT CMakeFiles/lime-engine.dir/src/app.ixx.o -MF CMakeFiles/lime-engine.dir/src/app.ixx.o.d @CMakeFiles/lime-engine.dir/src/app.ixx.o.modmap -o CMakeFiles/lime-engine.dir/src/app.ixx.o -c /home/paul/Documents/lime-engine/src/app.ixx
1.  <eof> parser at end of file
2.  Per-file LLVM IR generation
3.  /usr/local/bin/../include/c++/v1/new:268:29: Generating code for declaration 'std::__libcpp_operator_new'
 #0 0x00005599ecbb4496 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/bin/clang+++0x3025496)
 #1 0x00005599ecbb207e llvm::sys::RunSignalHandlers() (/usr/local/bin/clang+++0x302307e)
 #2 0x00005599ecb23450 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007faf8d65a770 (/usr/lib/libc.so.6+0x40770)
 #4 0x00005599ec6f1af4 llvm::Value::getContext() const (/usr/local/bin/clang+++0x2b62af4)
 #5 0x00005599ec68275c llvm::StoreInst::StoreInst(llvm::Value*, llvm::Value*, bool, llvm::Align, llvm::AtomicOrdering, unsigned char, llvm::Instruction*) (/usr/local/bin/clang+++0x2af375c)
 #6 0x00005599ec6826cf llvm::StoreInst::StoreInst(llvm::Value*, llvm::Value*, bool, llvm::Align, llvm::Instruction*) (/usr/local/bin/clang+++0x2af36cf)
 #7 0x00005599eaacd3be llvm::IRBuilderBase::CreateAlignedStore(llvm::Value*, llvm::Value*, llvm::MaybeAlign, bool) (/usr/local/bin/clang+++0xf3e3be)
 #8 0x00005599ece029d5 clang::CodeGen::CodeGenFunction::EmitReturnStmt(clang::ReturnStmt const&) (/usr/local/bin/clang+++0x32739d5)
 #9 0x00005599ece0b360 clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) (/usr/local/bin/clang+++0x327c360)
#10 0x00005599ece65d06 clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) (/usr/local/bin/clang+++0x32d6d06)
#11 0x00005599ece889af clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/usr/local/bin/clang+++0x32f99af)
#12 0x00005599ece817a1 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/usr/local/bin/clang+++0x32f27a1)
#13 0x00005599ece74dc3 clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5dc3)
#14 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#15 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#16 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#17 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#18 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#19 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#20 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#21 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#22 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#23 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#24 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#25 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#26 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#27 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#28 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#29 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#30 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#31 0x00005599ece7259b clang::CodeGen::CodeGenModule::Release() (/usr/local/bin/clang+++0x32e359b)
#32 0x00005599ed25fc18 (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit(clang::ASTContext&) ModuleBuilder.cpp:0:0
#33 0x00005599ed25700b clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/bin/clang+++0x36c800b)
#34 0x00005599ed53f7bc clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/bin/clang+++0x39b07bc)
#35 0x00005599eed04549 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/bin/clang+++0x5175549)
#36 0x00005599ed4fbf14 clang::FrontendAction::Execute() (/usr/local/bin/clang+++0x396cf14)
#37 0x00005599ed464854 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/bin/clang+++0x38d5854)
#38 0x00005599ed5c8d50 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/bin/clang+++0x3a39d50)
#39 0x00005599eaa81f21 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/bin/clang+++0xef2f21)
#40 0x00005599eaa7f1d0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#41 0x00005599ed2a6b29 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#42 0x00005599ecb231ec llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/usr/local/bin/clang+++0x2f941ec)
#43 0x00005599ed2a64c3 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/usr/local/bin/clang+++0x37174c3)
#44 0x00005599ed2646a2 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/usr/local/bin/clang+++0x36d56a2)
#45 0x00005599ed2648fe clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/usr/local/bin/clang+++0x36d58fe)
#46 0x00005599ed28402f clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/usr/local/bin/clang+++0x36f502f)
#47 0x00005599eaa7e857 clang_main(int, char**, llvm::ToolContext const&) (/usr/local/bin/clang+++0xeef857)
#48 0x00005599eaa8d486 main (/usr/local/bin/clang+++0xefe486)
#49 0x00007faf8d643cd0 (/usr/lib/libc.so.6+0x29cd0)
#50 0x00007faf8d643d8a __libc_start_main (/usr/lib/libc.so.6+0x29d8a)
#51 0x00005599eaa7cc75 _start (/usr/local/bin/clang+++0xeedc75)

The file looks like this:

module;

export module app;

import lime.Game;

export int app_main() {
    lime::Game game = {};
    return 0;
}

If I use lime::Game *game = new lime::Game{}; the crash doesn't occur.

ChuanqiXu9 commented 5 months ago

I just tested it on my toy project, and the time spent in the frontend is now twice as large as the time spent in the ASTWriter. So I consider it a significant improvement, I guess a switch to std modules would reduce the remaining performance impact. Currently one file crashes with -fexperimental-modules-reduced-bmi:

/usr/local/bin/clang++  -I/home/paul/Documents/lime-engine/third_party/include -I/home/paul/Documents/lime-engine/third_party/glm -I/home/paul/Documents/lime-engine/third_party/glfw/include -stdlib=libc++ -Wall -ftime-trace -fno-rtti -mavx2 -fno-exceptions -fexperimental-modules-reduced-bmi -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -Wno-nullability-completeness -ferror-limit=0 -g -std=gnu++20 -fcolor-diagnostics -MD -MT CMakeFiles/lime-engine.dir/src/app.ixx.o -MF CMakeFiles/lime-engine.dir/src/app.ixx.o.d @CMakeFiles/lime-engine.dir/src/app.ixx.o.modmap -o CMakeFiles/lime-engine.dir/src/app.ixx.o -c /home/paul/Documents/lime-engine/src/app.ixx
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.    Program arguments: /usr/local/bin/clang++ -I/home/paul/Documents/lime-engine/third_party/include -I/home/paul/Documents/lime-engine/third_party/glm -I/home/paul/Documents/lime-engine/third_party/glfw/include -stdlib=libc++ -Wall -ftime-trace -fno-rtti -mavx2 -fno-exceptions -fexperimental-modules-reduced-bmi -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -Wno-nullability-completeness -ferror-limit=0 -g -std=gnu++20 -fcolor-diagnostics -MD -MT CMakeFiles/lime-engine.dir/src/app.ixx.o -MF CMakeFiles/lime-engine.dir/src/app.ixx.o.d @CMakeFiles/lime-engine.dir/src/app.ixx.o.modmap -o CMakeFiles/lime-engine.dir/src/app.ixx.o -c /home/paul/Documents/lime-engine/src/app.ixx
1.    <eof> parser at end of file
2.    Per-file LLVM IR generation
3.    /usr/local/bin/../include/c++/v1/new:268:29: Generating code for declaration 'std::__libcpp_operator_new'
 #0 0x00005599ecbb4496 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/bin/clang+++0x3025496)
 #1 0x00005599ecbb207e llvm::sys::RunSignalHandlers() (/usr/local/bin/clang+++0x302307e)
 #2 0x00005599ecb23450 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007faf8d65a770 (/usr/lib/libc.so.6+0x40770)
 #4 0x00005599ec6f1af4 llvm::Value::getContext() const (/usr/local/bin/clang+++0x2b62af4)
 #5 0x00005599ec68275c llvm::StoreInst::StoreInst(llvm::Value*, llvm::Value*, bool, llvm::Align, llvm::AtomicOrdering, unsigned char, llvm::Instruction*) (/usr/local/bin/clang+++0x2af375c)
 #6 0x00005599ec6826cf llvm::StoreInst::StoreInst(llvm::Value*, llvm::Value*, bool, llvm::Align, llvm::Instruction*) (/usr/local/bin/clang+++0x2af36cf)
 #7 0x00005599eaacd3be llvm::IRBuilderBase::CreateAlignedStore(llvm::Value*, llvm::Value*, llvm::MaybeAlign, bool) (/usr/local/bin/clang+++0xf3e3be)
 #8 0x00005599ece029d5 clang::CodeGen::CodeGenFunction::EmitReturnStmt(clang::ReturnStmt const&) (/usr/local/bin/clang+++0x32739d5)
 #9 0x00005599ece0b360 clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) (/usr/local/bin/clang+++0x327c360)
#10 0x00005599ece65d06 clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) (/usr/local/bin/clang+++0x32d6d06)
#11 0x00005599ece889af clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/usr/local/bin/clang+++0x32f99af)
#12 0x00005599ece817a1 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/usr/local/bin/clang+++0x32f27a1)
#13 0x00005599ece74dc3 clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5dc3)
#14 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#15 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#16 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#17 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#18 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#19 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#20 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#21 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#22 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#23 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#24 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#25 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#26 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#27 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#28 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#29 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#30 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#31 0x00005599ece7259b clang::CodeGen::CodeGenModule::Release() (/usr/local/bin/clang+++0x32e359b)
#32 0x00005599ed25fc18 (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit(clang::ASTContext&) ModuleBuilder.cpp:0:0
#33 0x00005599ed25700b clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/bin/clang+++0x36c800b)
#34 0x00005599ed53f7bc clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/bin/clang+++0x39b07bc)
#35 0x00005599eed04549 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/bin/clang+++0x5175549)
#36 0x00005599ed4fbf14 clang::FrontendAction::Execute() (/usr/local/bin/clang+++0x396cf14)
#37 0x00005599ed464854 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/bin/clang+++0x38d5854)
#38 0x00005599ed5c8d50 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/bin/clang+++0x3a39d50)
#39 0x00005599eaa81f21 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/bin/clang+++0xef2f21)
#40 0x00005599eaa7f1d0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#41 0x00005599ed2a6b29 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#42 0x00005599ecb231ec llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/usr/local/bin/clang+++0x2f941ec)
#43 0x00005599ed2a64c3 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/usr/local/bin/clang+++0x37174c3)
#44 0x00005599ed2646a2 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/usr/local/bin/clang+++0x36d56a2)
#45 0x00005599ed2648fe clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/usr/local/bin/clang+++0x36d58fe)
#46 0x00005599ed28402f clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/usr/local/bin/clang+++0x36f502f)
#47 0x00005599eaa7e857 clang_main(int, char**, llvm::ToolContext const&) (/usr/local/bin/clang+++0xeef857)
#48 0x00005599eaa8d486 main (/usr/local/bin/clang+++0xefe486)
#49 0x00007faf8d643cd0 (/usr/lib/libc.so.6+0x29cd0)
#50 0x00007faf8d643d8a __libc_start_main (/usr/lib/libc.so.6+0x29d8a)
#51 0x00005599eaa7cc75 _start (/usr/local/bin/clang+++0xeedc75)

The file looks like this:

module;

export module app;

import lime.Game;

export int app_main() {
    lime::Game game = {};
    return 0;
}

If I use lime::Game *game = new lime::Game{}; the crash doesn't occur.

Thanks for testing this. Could you try to give a reproduced example for this crash? Then I can only actually fix it.


BTW, what's the overall end-to-end improvement this change brings to your project?

koplas commented 5 months ago

BTW, what's the overall end-to-end improvement this change brings to your project?

Using -fexperimental-modules-reduced-bmi reduced the compile time of a single file from 2.94s to 2.45s. The WriteAST reduced from 1.2s to 0.87s. The PCM file size decreased from 20 MB to 16.3 MB. The reduction is similar to other module files. A large amount of time is spent on the frontend with std headers, which could be reduced with std modules.

@rnikander Maybe you can provide some benchmarks, as you had some files that had far worse compile times.

Thanks for testing this. Could you try to give a reproduced example for this crash? Then I can only actually fix it.

I could reduce the lime.Game module to this:

module;

export module lime.Game;

import lime.Render;
import entt;

export namespace lime {
    class Game {
    private:
        entt::registry m_registry;
    };
}// namespace lime

However lime.Render and entt are quite large so a minimal reproducer would take some time to create.

koplas commented 5 months ago

I just tested std modules with reduced bmi. Here is the trace without std modules: image And here with std modules: image This looks like the performance I expected from modules. As I no longer see an exponential compile time increase the issue is solved for me. The crash also doesn't occur if I link with std modules.

ChuanqiXu9 commented 5 months ago

However lime.Render and entt are quite large so a minimal reproducer would take some time to create.

Thanks. It will be pretty helpful to improve the quality of the implementation.

kamrann commented 3 months ago

Just wanted to add another data point here after my first attempt at using modules. Briefly, I've modularized a library which is predominantly headers, and template-heavy. My initial approach has been to transform headers into interface partitions pretty much 1:1, with just a small number of distinct top-level modules. At this point I'm not using import std, so I get that it's far from ideal as there is heavy duplication of STL includes in GMFs.

I'm also seeing some very slow compilation times (for the partition TUs themselves, the few implementation units I have, and most of all for downstream consuming TUs), and it is heavily weighted to WriteAST as mentioned above.

The one thing I've noticed and am wondering if is on anyone's radar is memory consumption. From some rough eyeballing of processes, TUs for which clang++ caps out at around 400MB RAM usage in non-modular builds, are pushing up around 3GB in the modules build. One particular TU goes up to 6GB. I believe this is exacerbating the time spent in WriteAST in the case of parallel builds due to memory paging - occasionally I've seen really absurd times (10s of minutes, all falling within WriteAST) when the memory contention has been particularly bad.

Note that -fexperimental-modules-reduced-bmi definitely mitigates this - they're generally capping out around 1.8GB (vs 3GB), and the absurdly long compiles disappear. However this is still a number of times higher memory usage than without modules, and compilation times, while better than with the fat BMI, are still problematic.

ChuanqiXu9 commented 3 months ago

With using modules, it is expected some more memory usages. But if it is a problem depends on the numbers. So my comment in the other issue you opened, the key idea here is to reduce the redeclarations. So it is pretty important to import third party libraries by import std instead of including them everywhere. And -fexperimental-modules-reduced-bmi will be helpful too.