Philius opened this issue 1 year ago.
Before creating the function, the process would fork, inheriting all the functions previously defined, sharing their memory. The function definition would be entered as usual, but if the user wanted to redefine it, the program would end the child process(es) that defined it and fork a new process with the new definition.
ORC doesn't allow you to define the same ORC symbol more than once, but you can definitely layer redefinition (in the usual sense that the term is used in REPLs) on top of ORC. The usual way is to maintain a generation number for each language-level symbol:
kaleidoscope> def foo() 1; // Implicitly define foo.1
kaleidoscope> def x() foo(); // Implicitly bind to foo.1
kaleidoscope> def foo() 2; // Implicitly define foo.2
kaleidoscope> def y() foo(); // Implicitly bind to foo.2
kaleidoscope> x();
Result = 1.0
kaleidoscope> y();
Result = 2.0
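A minimal sketch of the bookkeeping behind the generation numbers, assuming the REPL keeps its own table from language-level names to mangled symbol names. `GenerationTable`, `define` and `resolve` are hypothetical names, not ORC API; each mangled name (`foo.1`, `foo.2`, ...) would be defined exactly once in ORC, which is what keeps this compatible with ORC's no-redefinition rule.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical REPL-side symbol table: every redefinition of a
// language-level name gets a fresh, never-reused ORC symbol name.
class GenerationTable {
public:
  // Called for "def foo() ...": allocates foo.1, then foo.2, and so on.
  std::string define(const std::string &name) {
    unsigned gen = ++generation_[name]; // first definition becomes 1
    return name + "." + std::to_string(gen);
  }

  // Called when a new body references `name`: binds to the latest
  // generation at the time the referencing function is compiled.
  // Returns an empty string if the name has never been defined.
  std::string resolve(const std::string &name) const {
    auto it = generation_.find(name);
    if (it == generation_.end())
      return "";
    return name + "." + std::to_string(it->second);
  }

private:
  std::map<std::string, unsigned> generation_;
};
```

`x` is compiled while `resolve("foo")` returns `foo.1`, so it keeps calling that body even after `foo.2` exists.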
You can also redirect control flow using an IndirectStubsManager. This lets you change the behavior of previous uses of a symbol, if that's what you want:
kaleidoscope> def foo() 1; // Implicitly define foo (stub) and foo.1 (body). Set foo to point at foo.1
kaleidoscope> def x() foo(); // Bind to foo
kaleidoscope> def foo() 2; // Implicitly define foo.2. Update foo stub to point at foo.2
kaleidoscope> def y() foo(); // Bind to foo
kaleidoscope> x();
Result = 2.0
kaleidoscope> y();
Result = 2.0
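The stub variant can be illustrated in plain C++ with a table of function pointers. This is only an analogy: in ORC the stub is generated code that jumps through a pointer managed by the IndirectStubsManager, whereas `stubs`, `call_stub`, `foo_1` and `foo_2` here are hypothetical stand-ins that say nothing about the real API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustration only, not the IndirectStubsManager interface: each "stub"
// is a mutable function pointer, and callers always go through it.
using Fn = double (*)();

double foo_1() { return 1.0; } // body of the first definition
double foo_2() { return 2.0; } // body of the redefinition

std::map<std::string, Fn> stubs; // stub name -> current body

double call_stub(const std::string &name) { return stubs.at(name)(); }

// x and y were both "compiled" to call through the foo stub,
// never directly to foo_1 or foo_2.
double x() { return call_stub("foo"); }
double y() { return call_stub("foo"); }
```

Because both `x` and `y` call through the `foo` entry, repointing it at `foo_2` changes the result of `x()` as well, matching the session above.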
The broader idea of using these APIs to turn static compilers into interpreters is a good one. You can often do it without forking if the compiler is available as a library: e.g. the Clang project provides the clang-repl tool, which uses ORC + Clang to provide a C++ interpreter.
The ability to turn a compiler into an interpreter is useful, but not my main point. The idea is to "land" just after a pre-compiled header is processed into memory.
for each source file do:
    fork off a new instance (this shares the precompiled header memory, so there is no need to re-read it)
    read in the source file
    generate object code
    write it to disk
    end the forked program
This should speed up compiling a large library or program because the processes share the same precompiled header physical memory. I don't know how LLVM handles precompiled headers. My guess is that if the header can be mapped into memory at the same address in each compilation process, the operating system can share the mapped pages between the processes.
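The per-file loop above can be sketched with POSIX `fork` and `waitpid`. This assumes a POSIX system, and `compile_one` is a hypothetical placeholder for the real work of reading the source, generating code and writing the object file; nothing here is Clang API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

using Job = std::pair<std::string, std::string>; // (source file, object file)

// Hypothetical placeholder: a real child would parse `source` on top of
// the inherited PCH state and write `object`. Fails on empty names.
int compile_one(const std::string &source, const std::string &object) {
  return (source.empty() || object.empty()) ? 1 : 0;
}

// Forks one child per job and returns how many jobs failed.
int compile_all(const std::vector<Job> &jobs) {
  std::vector<pid_t> children;
  int failures = 0;
  for (const Job &job : jobs) {
    pid_t pid = fork();
    if (pid == 0)
      _exit(compile_one(job.first, job.second)); // child: one job, then exit
    if (pid < 0) {
      ++failures; // could not fork: count the job as failed
      continue;
    }
    children.push_back(pid);
  }
  // Parent: wait for every child and count non-zero exits.
  for (pid_t pid : children) {
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
      ++failures;
  }
  return failures;
}
```

Each child starts with a copy-on-write view of everything the parent loaded before the loop, so in this scheme the PCH would be read once, in the parent.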
By "land" I mean become available for command input, like a service with an RPC interface.
You use the RPC API to tell the service which source file to compile into which object file. The controlling process can monitor memory and CPU usage to avoid thrashing.
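The service side can be approximated by a loop that reads job requests and forks one child per job, with a fixed cap on concurrent children as a crude stand-in for the memory and CPU monitoring. The one-line `<source> <object>` request format, the `serve` function and the `run_job` callback are all hypothetical; a real service would receive requests over its RPC transport rather than an `std::istream`.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical request format: one job per line, "<source> <object>".
// A fixed cap on live children stands in for real resource monitoring.
// Returns the number of jobs started.
int serve(std::istream &requests, int max_children,
          int (*run_job)(const std::string &, const std::string &)) {
  int running = 0, started = 0;
  std::string line;
  while (std::getline(requests, line)) {
    std::istringstream fields(line);
    std::string source, object;
    if (!(fields >> source >> object))
      continue; // skip malformed requests
    if (running == max_children && wait(nullptr) > 0)
      --running; // throttle: block until one child has finished
    pid_t pid = fork();
    if (pid == 0)
      _exit(run_job(source, object)); // child: do the job and exit
    if (pid > 0) {
      ++running;
      ++started;
    }
  }
  while (running > 0 && wait(nullptr) > 0)
    --running; // drain the remaining children before returning
  return started;
}
```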
It would also be possible to memory-map a single file that holds the generated code from all the compilation steps, each writing to memory in parallel without the need for memory locking, as each is doled out a big chunk of virtual memory, enough to store its object code.
Modern file systems support sparse files, which only allocate disk space for regions that have actually been written, so doling out large chunks costs almost nothing.
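Doling out chunks needs only one atomic add per claim, and sparseness can be checked by comparing a file's size with the space actually allocated for it. A minimal sketch, assuming POSIX and Linux-style `st_blocks`; `ChunkAllocator` and `sparse_allocated_bytes` are hypothetical names. For separate processes rather than threads, the counter would have to live in shared memory (for instance a header at the start of the mapped file), which this sketch does not show.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical chunk allocator: each writer claims a large, fixed-size
// region of one shared output file with a single atomic add, then writes
// into that region without taking any further locks.
class ChunkAllocator {
public:
  explicit ChunkAllocator(std::uint64_t chunk_size) : chunk_size_(chunk_size) {}

  // Returns the start offset of a region no other caller will receive.
  std::uint64_t claim() { return next_.fetch_add(chunk_size_); }

private:
  const std::uint64_t chunk_size_;
  std::atomic<std::uint64_t> next_{0};
};

// Extends a file to `size` bytes without writing data and returns how many
// bytes the file system actually allocated (-1 on error). With sparse-file
// support this is typically far below `size`.
long long sparse_allocated_bytes(const char *path, off_t size) {
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  struct stat st {};
  bool ok = ftruncate(fd, size) == 0 && fstat(fd, &st) == 0 &&
            st.st_size == size;
  close(fd);
  // st_blocks counts 512-byte units on Linux; POSIX leaves the unit
  // implementation-defined.
  return ok ? static_cast<long long>(st.st_blocks) * 512 : -1;
}
```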
This would require a memory-mapped allocator, and I've written one, called treedb.
With this, you could avoid or postpone the writing of object files and do the linking as just another RPC command, with the operating system paging disk sectors into memory as needed and as memory allows.
The only reasonable use case I ever came up with was the Clang stage-1 build, because it's a single-use, throw-away binary (https://www.youtube.com/watch?v=ZCnHxRhQmvs&t=262s). However, even for that it's hard to beat the performance of a static toolchain with compiler caches.
Apart from that, is there anything we should act on in this ticket?
I'm talking about building an entire distribution with a precompiled header format that can be mapped directly into memory. You could group compile jobs by their dependencies so the operating system fetches in pages on an as-needed basis, instead of thrashing.
Consider the current overhead of de-serialising a pre-compiled header for each compilation, which leaves each compiler process with its own independent in-memory representation. My proposal would generally be more efficient in memory, time and energy.
Your video focuses on compiling llvm, jit and bitcode files. I'm on a learning curve and was just looking for a shortcut when I opened this ticket. I guess I'll just wade through it.
Maybe that sounded a bit defeatist, so let me rephrase it. For a C++ compilation, point me to the place in the LLVM source code after the pre-compiled header is read in, but before the source file is opened. I can then create an RPC server right there that waits for the name of a source file and of the object file to be created. When it gets a job, it forks itself, reads the source file and completes the job. The server, having handed the work over to the fork, is available to fork some more.
That would make a good first step.
The next step would be to switch the precompiled header allocator to use a disk backed memory mapped allocator like treedb, to persist the precompiled header so that it could be mapped into memory directly.
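The key design choice for a persisted, directly mappable structure is to link data by file offset instead of by pointer, so it stays valid wherever a later process maps the file. A minimal sketch, assuming POSIX `mmap`; the `Arena` type is a hypothetical illustration, not treedb's interface or anything in Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical persistent bump allocator over a memory-mapped file. The
// first 8 bytes hold the next free offset; stored data is addressed by
// file offset, never by raw pointer.
struct Arena {
  int fd = -1;
  void *base = MAP_FAILED;
  std::size_t capacity = 0;

  bool open_file(const char *path, std::size_t bytes) {
    fd = ::open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, static_cast<off_t>(bytes)) != 0)
      return false;
    base = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
      return false;
    capacity = bytes;
    if (next() == 0)
      next() = sizeof(std::uint64_t); // fresh file: skip past the header
    return true;
  }

  // The persisted "next free offset" stored in the file header.
  std::uint64_t &next() { return *static_cast<std::uint64_t *>(base); }

  // Copies `len` bytes into the arena and returns their offset (0 if full).
  std::uint64_t store(const void *data, std::size_t len) {
    std::uint64_t off = next();
    if (off + len > capacity)
      return 0;
    std::memcpy(static_cast<char *>(base) + off, data, len);
    next() = off + len;
    return off;
  }

  const char *at(std::uint64_t off) const {
    return static_cast<const char *>(base) + off;
  }

  void close_file() {
    if (base != MAP_FAILED)
      munmap(base, capacity); // MAP_SHARED: changes remain in the file
    if (fd >= 0)
      ::close(fd);
    base = MAP_FAILED;
    fd = -1;
  }
};
```

Mapping at a fixed address, as suggested earlier in the thread, would let raw pointers survive instead, at the cost of reserving that address range in every process.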
I'm talking about building an entire distribution with a precompiled header format that can be mapped directly into memory.
Clang's pcm format already allows this, FWIW - well, it's designed to have on-disk hash tables that can be queried without loading much into memory, but it still has to build in-memory AST if the query succeeds and something needs to be loaded. (but it's all very lazy - so it should only load as much AST as is needed by the usage)
I've created a proof-of-concept project, compiler-server, that allows a client to specify source and object files to an RPC server. The server in the example forks itself and sleeps for one second.
That's the equivalent of processing another source file while having the precompiled header loaded in memory shared between the processes.
So instead of main() as in the example, the server would wait for client calls after reading in the precompiled header but before loading the source file and generating the object file.
All memory that isn't modified is shared.
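Copy-on-write after `fork` is easy to observe directly. In this sketch (assuming POSIX; `shared_state` and `child_sees_then_modifies` are hypothetical names), the child sees the value the parent loaded, and its write does not affect the parent.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stands in for state loaded before forking (e.g. a parsed PCH).
int shared_state = 42;

// The child reads the inherited value, then modifies its own copy.
// Returns the value the child reported via its exit code, or -1.
int child_sees_then_modifies() {
  pid_t pid = fork();
  if (pid == 0) {
    int seen = shared_state; // reading leaves the page shared
    shared_state = 0;        // writing gives the child a private copy
    _exit(seen);
  }
  if (pid < 0)
    return -1;
  int status = 0;
  if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

Sharing is tracked per page, so any write to a page that also holds PCH data, such as a reference count or an allocator's bookkeeping, gives the child a private copy of that whole page.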
I just have to figure out where to insert this server. The other thing is to somehow ask that the precompiled header be read in completely instead of as needed.
I'm trying to figure out where in llvm/clang the input file specified on the command line is read. I did an instrumented build of clang++ with
cmake -S clang -B build-clang -G Ninja \
-DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -finstrument-functions -pg" \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DCLANG_BUILD_EXAMPLES=1 \
-DLLVM_PARALLEL_LINK_JOBS=1 \
-DLLVM_PARALLEL_COMPILE_JOBS=8 \
-DLLVM_INCLUDE_TESTS=OFF
in the hopes of using uftrace but it didn't work. I also tried
valgrind --tool=callgrind /usr/local/bin/clang++ -DQT_CORE_LIB -DQT_GUI_LIB -DQT_WIDGETS_LIB -I/v3c/Qt/build-clang-pch-Desktop_Qt_6_5_0_clang_64bit-Debug/clang-pch_autogen/include -isystem /v3c/Qt-install/6.5.0/gcc_64/include/QtCore -isystem /v3c/Qt-install/6.5.0/gcc_64/include -isystem /v3c/Qt-install/6.5.0/gcc_64/mkspecs/linux-g++ -isystem /v3c/Qt-install/6.5.0/gcc_64/include/QtWidgets -isystem /v3c/Qt-install/6.5.0/gcc_64/include/QtGui -DQT_QML_DEBUG -g -fPIC -std=gnu++17 -Winvalid-pch -Xclang -include-pch -Xclang /v3c/Qt/build-clang-pch-Desktop_Qt_6_5_0_clang_64bit-Debug/CMakeFiles/clang-pch.dir/cmake_pch.hxx.pch -Xclang -include -Xclang /v3c/Qt/build-clang-pch-Desktop_Qt_6_5_0_clang_64bit-Debug/CMakeFiles/clang-pch.dir/cmake_pch.hxx -MD -MT CMakeFiles/clang-pch.dir/main.cpp.o -MF CMakeFiles/clang-pch.dir/main.cpp.o.d -o CMakeFiles/clang-pch.dir/main.cpp.o -c /v3c/Qt/clang-pch/main.cpp
which worked, but I can only step out of the open() call one level at a time, and I'm betting that will take too long.
I tried using kdbg and putting a breakpoint at llvm-project/llvm/lib/Support/MemoryBuffer.cpp line 526
ErrorOr<std::unique_ptr<MemoryBuffer>>
MemoryBuffer::getOpenFile(sys::fs::file_t FD, const Twine &Filename,
uint64_t FileSize, bool RequiresNullTerminator,
bool IsVolatile, std::optional<Align> Alignment)
but it looks like gdb isn't up to it.
I also tried Qt Creator's "Start and Debug External Application...", but once again, gdb seems to be the problem.
What's the process for debugging clang++?
Of course it would be quicker if someone just told me where to look.
I needed a vanilla debug build of clang to step through it.
Compiler reads in PCH
=====================
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/MemoryBuffer.cpp 527 stack Frame #0
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/VirtualFileSystem.cpp 231 stack Frame #1
/v3c/Qt/llvm/llvm-project/clang/lib/Basic/FileManager.cpp 555 stack Frame #2
/v3c/Qt/llvm/llvm-project/clang/lib/Serialization/ModuleManager.cpp 212 stack Frame #3
/v3c/Qt/llvm/llvm-project/clang/lib/Serialization/ASTReader.cpp 4584 stack Frame #4
/v3c/Qt/llvm/llvm-project/clang/lib/Serialization/ASTReader.cpp 4308 stack Frame #5
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp 668 stack Frame #6
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp 624 stack Frame #7
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/FrontendAction.cpp 980 stack Frame #8 <= *** HERE ***
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp 1052 stack Frame #9
/v3c/Qt/llvm/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp 272 stack Frame #10
/v3c/Qt/llvm/llvm-project/clang/tools/driver/cc1_main.cpp 249 stack Frame #11
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 366 stack Frame #12
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 506 stack Frame #13
/usr/local/include/llvm/ADT/STLFunctionalExtras.h 45 stack Frame #14
/usr/local/include/llvm/ADT/STLFunctionalExtras.h 68 stack Frame #15
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Job.cpp 440 stack Frame #16
/usr/local/include/llvm/ADT/STLFunctionalExtras.h 45 stack Frame #17
/v3c/Qt/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h 68 stack Frame #18
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp 426 stack Frame #19
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Job.cpp 440 stack Frame #20
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Compilation.cpp 199 stack Frame #21
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Compilation.cpp 253 stack Frame #22
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Driver.cpp 1903 stack Frame #23
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 542 stack Frame #24
/v3c/Qt/llvm/llvm-project/build-clang-gcc-debug/tools/driver/clang-driver.cpp 15 stack Frame #25
Compiler reads in source file
=============================
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/MemoryBuffer.cpp 527 stack Frame #0
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/VirtualFileSystem.cpp 231 stack Frame #1
/v3c/Qt/llvm/llvm-project/clang/lib/Basic/FileManager.cpp 555 stack Frame #2
/v3c/Qt/llvm/llvm-project/clang/lib/Basic/SourceManager.cpp 117 stack Frame #3
/v3c/Qt/llvm/llvm-project/clang/include/clang/Basic/SourceManager.h 1028 stack Frame #4
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CGDebugInfo.cpp 357 stack Frame #5
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CGDebugInfo.cpp 561 stack Frame #6
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CGDebugInfo.cpp 76 stack Frame #7
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CodeGenModule.cpp 403 stack Frame #8
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/ModuleBuilder.cpp 166 stack Frame #9
/v3c/Qt/llvm/llvm-project/clang/lib/CodeGen/CodeGenAction.cpp 217 stack Frame #10
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp 193 stack Frame #11
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/FrontendAction.cpp 999 stack Frame #12 <= *** HERE ***
/v3c/Qt/llvm/llvm-project/clang/lib/Frontend/CompilerInstance.cpp 1052 stack Frame #13
/v3c/Qt/llvm/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp 272 stack Frame #14
/v3c/Qt/llvm/llvm-project/clang/tools/driver/cc1_main.cpp 249 stack Frame #15
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 366 stack Frame #16
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 506 stack Frame #17
/usr/local/include/llvm/ADT/STLFunctionalExtras.h 45 stack Frame #18
/usr/local/include/llvm/ADT/STLFunctionalExtras.h 68 stack Frame #19
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Job.cpp 440 stack Frame #20
/usr/local/include/llvm/ADT/STLFunctionalExtras.h 45 stack Frame #21
/v3c/Qt/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h 68 stack Frame #22
/v3c/Qt/llvm/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp 426 stack Frame #23
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Job.cpp 440 stack Frame #24
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Compilation.cpp 199 stack Frame #25
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Compilation.cpp 253 stack Frame #26
/v3c/Qt/llvm/llvm-project/clang/lib/Driver/Driver.cpp 1903 stack Frame #27
/v3c/Qt/llvm/llvm-project/clang/tools/driver/driver.cpp 542 stack Frame #28
/v3c/Qt/llvm/llvm-project/build-clang-gcc-debug/tools/driver/clang-driver.cpp 15 stack Frame #29
Now to figure out how to construct a new source file entry or reuse the existing one.
Before hearing that ORCv2 prevented defining the same function more than once, I had an idea about how to get around the problem, for example in Kaleidoscope. Before creating the function, the process would fork, inheriting all the functions previously defined, sharing their memory. The function definition would be entered as usual, but if the user wanted to redefine it, the program would end the child process(es) that defined it and fork a new process with the new definition.
Thus only the change requires code generation, all the rest is already in memory.
Then it struck me. Compilers work by reading files and creating an in-memory representation. Pre-compiled headers, even if memory mapped, need to page in data from disk.
But what if you stopped the compilation process after reading in all the headers but just before reading the source file, forked it, and in each fork read in a source file, generated the object code, and exited?
This turns the compiler into an interpreter.