Turning the compiler inside out

Philius commented 1 year ago

Before hearing that ORCv2 prevented defining the same function more than once, I had an idea about how to get around the problem, for example in Kaleidoscope. Before creating the function, the process would fork, inheriting all the functions previously defined, sharing their memory. The function definition would be entered as usual, but if the user wanted to redefine it, the program would end the child process(es) that defined it and fork a new process with the new definition.

Thus only the change requires code generation; everything else is already in memory.

Then it struck me. Compilers work by reading files and creating an in-memory representation. Pre-compiled headers, even if memory mapped, need to page in data from disk.

But what if you stopped the compilation process after reading in all the headers but just before reading the source file, forked it, and in each fork read in a source file, generated the object code, and exited?

This turns the compiler into an interpreter.

llvmbot commented 1 year ago

@llvm/issue-subscribers-orcjit

lhames commented 1 year ago

Before creating the function, the process would fork, inheriting all the functions previously defined, sharing their memory. The function definition would be entered as usual, but if the user wanted to redefine it, the program would end the child process(es) that defined it and fork a new process with the new definition.

ORC doesn't allow you to define the same ORC symbol more than once, but you can definitely layer redefinition (in the usual sense that the term is used in REPLs) on top of ORC. The usual way is to maintain a generation number for each language level:

kaleidoscope> def foo() 1;     // Implicitly define foo.1
kaleidoscope> def x() foo();   // Implicitly bind to foo.1
kaleidoscope> def foo() 2;     // Implicitly define foo.2
kaleidoscope> def y() foo();   // Implicitly bind to foo.2
kaleidoscope> x();
Result = 1.0
kaleidoscope> y();
Result = 2.0
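
For illustration, the generation-number bookkeeping could be sketched roughly as below. This is only a sketch of the naming scheme, not ORC code: GenerationTable, define and resolve are hypothetical names, and a real REPL would pass the resulting strings to the JIT as symbol names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of ORC): tracks the latest generation of each
// source-level name and produces unique JIT-level symbol names from it.
class GenerationTable {
public:
  // Called when the user defines or redefines `name`. Returns the fresh
  // symbol to emit: "foo.1" the first time, "foo.2" the second, and so on.
  std::string define(const std::string &name) {
    assert(!name.empty() && "a definition needs a name");
    unsigned gen = ++latest_[name];
    return name + "." + std::to_string(gen);
  }

  // Called when a new function body references `name`. Binds to whichever
  // generation is current right now; returns "" if `name` was never defined.
  std::string resolve(const std::string &name) const {
    auto it = latest_.find(name);
    if (it == latest_.end())
      return "";
    return name + "." + std::to_string(it->second);
  }

private:
  std::unordered_map<std::string, unsigned> latest_;
};
```

With this, x() is compiled against resolve("foo") while foo.1 is current and keeps calling foo.1, whereas y() is compiled after the second define and picks up foo.2.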

You can also redirect control flow using an IndirectStubsManager. This lets you change the behavior of previous uses of a symbol if that's what you want:

kaleidoscope> def foo() 1;     // Implicitly define foo (stub) and foo.1 (body). Set foo to point at foo.1
kaleidoscope> def x() foo();   // Bind to foo
kaleidoscope> def foo() 2;     // Implicitly define foo.2. Update foo stub to point at foo.2
kaleidoscope> def y() foo();   // Bind to foo
kaleidoscope> x();
Result = 2.0
kaleidoscope> y();
Result = 2.0
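
The effect of the stub can be mimicked in plain C++ with a table of function pointers. To be clear, this does not use the IndirectStubsManager API; StubTable, foo_1, foo_2 and x are made-up names that only show why earlier callers see the new body once the stub is repointed.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using Body = double (*)();

// Hypothetical stand-in for a stub table: each source-level name owns one
// slot holding the address of its current body, and callers go via the slot.
class StubTable {
public:
  // Point `name` at `body`, creating the slot on first definition.
  void define(const std::string &name, Body body) { slots_[name] = body; }

  // Call through the stub. Precondition: `name` has been defined.
  double call(const std::string &name) const {
    auto it = slots_.find(name);
    assert(it != slots_.end() && "call through an undefined stub");
    return it->second();
  }

private:
  std::unordered_map<std::string, Body> slots_;
};

// Two generations of foo's body.
inline double foo_1() { return 1.0; }
inline double foo_2() { return 2.0; }

// x is bound to the stub "foo", not to a particular body.
inline double x(const StubTable &stubs) { return stubs.call("foo"); }
```

After define("foo", foo_1), x returns 1.0; after define("foo", foo_2), the same x returns 2.0 without being recompiled.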

The broader idea of using these APIs to turn static compilers into interpreters is a good one. You can often do it without forking if the compiler is available as a library: e.g. the Clang project provides the clang-repl tool, which uses ORC + Clang to provide a C++ interpreter.

Philius commented 1 year ago

The ability to turn a compiler into an interpreter is useful, but not my main point. The idea is to "land" just after a pre-compiled header is processed into memory.

for each source file do:
  fork off a new instance. This shares the precompiled header memory - no need to re-read it
  read in the source file
  generate object code
  write it to disk
  end forked program

This should speed up compiling a large library or program because the processes share the same precompiled header physical memory. I don't know how llvm handles precompiled headers. I guess if they can map it into memory at the same address in each compilation process then the operating system can share the mapped memory pages between the processes.
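
A minimal POSIX sketch of that loop follows. It makes several assumptions: PreloadedState stands in for whatever the compiler holds in memory after reading the precompiled header, compileOne is a hypothetical placeholder for "read source, generate object code, write it to disk", and nothing here is clang code. After fork() the child sees the parent's pages copy-on-write, so pages that are only read stay physically shared.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <string>
#include <vector>

// Stand-in for the in-memory state built from the precompiled header.
struct PreloadedState {
  std::vector<int> table;
};

// Hypothetical placeholder for the real per-file work. It only checks that
// the state inherited from the parent is visible; 0 means success.
inline int compileOne(const PreloadedState &state, const std::string &source) {
  assert(!source.empty() && "a job needs a source file name");
  return state.table.empty() ? 1 : 0;
}

// Forks one child per source file and returns how many children succeeded.
inline int compileAll(const PreloadedState &state,
                      const std::vector<std::string> &sources) {
  std::vector<pid_t> children;
  for (const std::string &src : sources) {
    pid_t pid = fork();
    if (pid == 0)
      _exit(compileOne(state, src)); // child: do the job, then end the fork
    if (pid > 0)
      children.push_back(pid);       // parent: remember the child, carry on
  }
  int succeeded = 0;
  for (pid_t pid : children) {
    int status = 0;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) == 0)
      ++succeeded;
  }
  return succeeded;
}
```

One caveat with this shape: fork() only duplicates the calling thread, so the parent should be single-threaded at the point where it forks.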

Philius commented 1 year ago

By "land" I mean be available for command input, like a service with an RPC interface.

You use the RPC API to tell the service what source file to compile to what object file. The controlling process can monitor memory and CPU usage to avoid thrashing.

It would also be possible to memory map a file containing the generated code from all the compilation steps, with each step writing to memory in parallel. No memory locking is needed, because each step is doled out a big chunk of virtual memory, large enough to store its object code.

Modern file systems support sparse files, which only reserve disk space for non-zero data, so doling out large chunks costs almost nothing.
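
As a rough sketch of doling out chunks from a sparse, memory-mapped file (this is not treedb's API; ChunkArena and its members are hypothetical, and the temporary file location is an assumption), the controlling process could hand out disjoint ranges like this, using POSIX mkstemp, ftruncate and mmap:

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>

// Hypothetical bump allocator over a shared file mapping. Only the
// controlling process calls allocate(); each worker then writes into its own
// chunk, so the workers need no locking among themselves.
class ChunkArena {
public:
  // Creates an unlinked temporary file of `capacity` bytes and maps it.
  // ftruncate() only sets the length; on file systems with sparse-file
  // support no blocks are reserved until pages are written.
  explicit ChunkArena(std::size_t capacity) : capacity_(capacity) {
    char path[] = "/tmp/chunk-arena-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
      return;
    unlink(path); // the file lives on until the mapping is released
    if (ftruncate(fd, static_cast<off_t>(capacity)) == 0) {
      void *p = mmap(nullptr, capacity, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, 0);
      if (p != MAP_FAILED)
        base_ = static_cast<char *>(p);
    }
    close(fd);
  }
  ~ChunkArena() {
    if (base_)
      munmap(base_, capacity_);
  }
  ChunkArena(const ChunkArena &) = delete;
  ChunkArena &operator=(const ChunkArena &) = delete;

  bool valid() const { return base_ != nullptr; }

  // Hands out the next `size` bytes, or nullptr when the arena is exhausted.
  char *allocate(std::size_t size) {
    assert(size > 0 && "chunks must be non-empty");
    if (!base_ || size > capacity_ - next_)
      return nullptr;
    char *chunk = base_ + next_;
    next_ += size;
    return chunk;
  }

private:
  std::size_t capacity_;
  std::size_t next_ = 0;
  char *base_ = nullptr;
};
```

Because the mapping is MAP_SHARED, a child forked after allocate() writes straight into the file-backed pages of its chunk and the parent can see the result.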

This would require a memory mapped memory allocator, and I've written one, called treedb.

With this, you could avoid or postpone the writing of object files and do the linking as just another RPC command, with the operating system paging disk sectors into memory as needed and as memory allows.

weliveindetail commented 1 year ago

The only reasonable use-case that I ever came up with was the Clang stage-1 build, because it's a single-use throw-away binary: https://www.youtube.com/watch?v=ZCnHxRhQmvs&t=262s However, even for that it's hard to beat the performance of a static toolchain with compiler caches.

Apart from that, is there anything we should act on in this ticket?

Philius commented 1 year ago

I'm talking about building an entire distribution with a precompiled header format that can be mapped directly into memory. You could group compile jobs by their dependencies so the operating system fetches in pages on an as-needed basis, instead of thrashing.

Consider the current overhead of de-serialising a pre-compiled header for each compilation, which leaves each compiler process with its own independent in-memory representation. My proposal would generally be more efficient in memory, time and energy.

Your video focuses on compiling llvm, jit and bitcode files. I'm on a learning curve and was just looking for a shortcut when I opened this ticket. I guess I'll just wade through it.

Philius commented 1 year ago

Maybe that sounded a bit defeatist, so let me rephrase it. For a C++ compilation, point me to the place in the llvm source code after the pre-compiled header is read in, but before the source file is opened. There I can create an RPC server that waits for the name of a source file and the name of the object file to be created. When it gets a job it forks itself, and the fork reads the source file and completes the job. The server, having handed the work over to the fork, is available to fork some more.

That would make a good first step.

The next step would be to switch the precompiled header allocator to use a disk backed memory mapped allocator like treedb, to persist the precompiled header so that it could be mapped into memory directly.

dwblaikie commented 1 year ago

I'm talking about building an entire distribution with a precompiled header format that can be mapped directly into memory.

Clang's pcm format already allows this, FWIW - well, it's designed to have on-disk hash tables that can be queried without loading much into memory, but it still has to build in-memory AST if the query succeeds and something needs to be loaded. (but it's all very lazy - so it should only load as much AST as is needed by the usage)

Philius commented 1 year ago

I've created a proof-of-concept project, compiler-server, that allows a client to specify source and object files to an RPC server. The server in the example forks itself and sleeps for one second.

That's the equivalent of processing another source file, while having the precompiled header loaded in memory shared between the processes.

So instead of main() as in the example, the server would wait for client calls after reading in the precompiled header but before loading the source file and generating the object file.

All memory that isn't modified is shared.

I just have to figure out where to insert this server. The other thing is to somehow ask that the precompiled header be read in completely instead of as needed.

Philius commented 1 year ago

I'm trying to figure out where in llvm/clang the input file specified on the command line is read. I did an instrumented build of clang++ with

cmake -S clang -B build-clang -G Ninja \
-DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -finstrument-functions -pg" \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DCLANG_BUILD_EXAMPLES=1 \
-DLLVM_PARALLEL_LINK_JOBS=1 \
-DLLVM_PARALLEL_COMPILE_JOBS=8 \
-DLLVM_INCLUDE_TESTS=OFF

in the hopes of using uftrace but it didn't work. I also tried

valgrind --tool=callgrind /usr/local/bin/clang++ -DQT_CORE_LIB -DQT_GUI_LIB -DQT_WIDGETS_LIB -I/v3c/Qt/build-clang-pch-Desktop_Qt_6_5_0_clang_64bit-Debug/clang-pch_autogen/include -isystem /v3c/Qt-install/6.5.0/gcc_64/include/QtCore -isystem /v3c/Qt-install/6.5.0/gcc_64/include -isystem /v3c/Qt-install/6.5.0/gcc_64/mkspecs/linux-g++ -isystem /v3c/Qt-install/6.5.0/gcc_64/include/QtWidgets -isystem /v3c/Qt-install/6.5.0/gcc_64/include/QtGui -DQT_QML_DEBUG -g -fPIC -std=gnu++17 -Winvalid-pch -Xclang -include-pch -Xclang /v3c/Qt/build-clang-pch-Desktop_Qt_6_5_0_clang_64bit-Debug/CMakeFiles/clang-pch.dir/cmake_pch.hxx.pch -Xclang -include -Xclang /v3c/Qt/build-clang-pch-Desktop_Qt_6_5_0_clang_64bit-Debug/CMakeFiles/clang-pch.dir/cmake_pch.hxx -MD -MT CMakeFiles/clang-pch.dir/main.cpp.o -MF CMakeFiles/clang-pch.dir/main.cpp.o.d -o CMakeFiles/clang-pch.dir/main.cpp.o -c /v3c/Qt/clang-pch/main.cpp

which worked but I can only step out of the open() call one level at a time, and I'm betting that will take too long.

I tried using kdbg and putting a breakpoint at llvm-project/llvm/lib/Support/MemoryBuffer.cpp line 526

ErrorOr<std::unique_ptr<MemoryBuffer>>
MemoryBuffer::getOpenFile(sys::fs::file_t FD, const Twine &Filename,
                          uint64_t FileSize, bool RequiresNullTerminator,
                          bool IsVolatile, std::optional<Align> Alignment)

but it looks like gdb isn't up to it.

I also tried Qt Creator's "Start and Debug External Application..." but once again, gdb seems to be the problem.

What's the process for debugging clang++?

Of course it would be quicker if someone just told me where to look.

Philius commented 1 year ago

I needed a vanilla debug build of clang to step through it.

Compiler reads in PCH
=====================
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/MemoryBuffer.cpp 527 stack   Frame #0
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/VirtualFileSystem.cpp    231 stack   Frame #1
/v3c/Qt/llvm/llvm-project/clang/lib/Basic/FileManager.cpp   555 stack   Frame #2
/v3c/Qt/llvm/llvm-project/clang/lib/Serialization/ModuleManager.cpp 212 stack   Frame #3
/v3c/Qt/llvm/llvm-project/clang/lib/Serialization/ASTReader.cpp 4584    stack   Frame #4
/v3c/Qt/llvm/llvm-project/clang/lib/Serialization/ASTReader.cpp 4308    stack   Frame #5
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp   668 stack   Frame #6
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp   624 stack   Frame #7
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/FrontendAction.cpp 980 stack   Frame #8 <= *** HERE ***
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp   1052    stack   Frame #9
/v3c/Qt/llvm/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp  272 stack   Frame #10
/v3c/Qt/llvm/llvm-project/clang/tools/driver/cc1_main.cpp   249 stack   Frame #11
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 366 stack   Frame #12
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 506 stack   Frame #13
/usr/local/include/llvm/ADT/STLFunctionalExtras.h   45  stack   Frame #14
/usr/local/include/llvm/ADT/STLFunctionalExtras.h   68  stack   Frame #15
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Job.cpp  440 stack   Frame #16
/usr/local/include/llvm/ADT/STLFunctionalExtras.h   45  stack   Frame #17
/v3c/Qt/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h   68  stack   Frame #18
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp 426 stack   Frame #19
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Job.cpp  440 stack   Frame #20
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Compilation.cpp  199 stack   Frame #21
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Compilation.cpp  253 stack   Frame #22
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Driver.cpp   1903    stack   Frame #23
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 542 stack   Frame #24
/v3c/Qt/llvm/llvm-project/build-clang-gcc-debug/tools/driver/clang-driver.cpp   15  stack   Frame #25

Compiler reads in source file
=============================
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/MemoryBuffer.cpp 527 stack   Frame #0
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/VirtualFileSystem.cpp    231 stack   Frame #1
/v3c/Qt/llvm/llvm-project/clang/lib/Basic/FileManager.cpp   555 stack   Frame #2
/v3c/Qt/llvm/llvm-project/clang/lib/Basic/SourceManager.cpp 117 stack   Frame #3
/v3c/Qt/llvm/llvm-project/clang/include/clang/Basic/SourceManager.h 1028    stack   Frame #4
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CGDebugInfo.cpp 357 stack   Frame #5
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CGDebugInfo.cpp 561 stack   Frame #6
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CGDebugInfo.cpp 76  stack   Frame #7
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CodeGenModule.cpp   403 stack   Frame #8
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/ModuleBuilder.cpp   166 stack   Frame #9
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CodeGenAction.cpp   217 stack   Frame #10
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp   193 stack   Frame #11
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/FrontendAction.cpp 999 stack   Frame #12 <= *** HERE ***
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp   1052    stack   Frame #13
/v3c/Qt/llvm/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp  272 stack   Frame #14
/v3c/Qt/llvm/llvm-project/clang/tools/driver/cc1_main.cpp   249 stack   Frame #15
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 366 stack   Frame #16
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 506 stack   Frame #17
/usr/local/include/llvm/ADT/STLFunctionalExtras.h   45  stack   Frame #18
/usr/local/include/llvm/ADT/STLFunctionalExtras.h   68  stack   Frame #19
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Job.cpp  440 stack   Frame #20
/usr/local/include/llvm/ADT/STLFunctionalExtras.h   45  stack   Frame #21
/v3c/Qt/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h   68  stack   Frame #22
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp 426 stack   Frame #23
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Job.cpp  440 stack   Frame #24
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Compilation.cpp  199 stack   Frame #25
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Compilation.cpp  253 stack   Frame #26
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Driver.cpp   1903    stack   Frame #27
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 542 stack   Frame #28
/v3c/Qt/llvm/llvm-project/build-clang-gcc-debug/tools/driver/clang-driver.cpp   15  stack   Frame #29

Now to figure out how to construct a new source file entry or reuse the existing one.

llvm / llvm-project

Turning the compiler inside out #63233