Closed bollu closed 9 months ago
The, hopefully, last remaining error is quite puzzling: this works
"D:/a/lean4/lean4/build/stage1/leanc.sh" D:/a/lean4/lean4/build/stage1/shell/CMakeFiles/shell.dir/lean.cpp.obj D:/a/lean4/lean4/build/stage1/shell/CMakeFiles/shell.dir/manifest.rc.obj -lleanshared -O3 -DNDEBUG -o D:/a/lean4/lean4/build/stage1/bin/lean.exe
but this fails:
D:/a/lean4/lean4/build/stage1/leanc.sh -o "D:/a/lean4/lean4/build/stage1/bin/leanc.exe" D:/a/lean4/lean4/build/stage1/temp/Leanc.o -O3 -DNDEBUG -lleanshared -v
"D:/a/lean4/lean4/build/stage1/bin/ld.lld" --sysroot=D:/a/lean4/lean4/build/stage1 -m i386pep -Bdynamic -o D:/a/lean4/lean4/build/stage1/bin/leanc.exe D:/a/lean4/lean4/build/stage1/lib/crt2.o D:/a/lean4/lean4/build/stage1/lib/crtbegin.o -LD:/a/lean4/lean4/build/stage1/lib -LD:/a/lean4/lean4/build/stage1/lib/lean -LD:/a/lean4/lean4/build/stage1/x86_64-w64-mingw32/lib -LD:/a/lean4/lean4/build/stage1/x86_64-w64-mingw32/mingw/lib -LD:/a/lean4/lean4/build/stage1/lib -LD:/a/lean4/lean4/build/stage1/x86_64-w64-mingw32/sys-root/mingw/lib -LD:/a/lean4/lean4/build/stage1/lib/clang/15.0.1/lib/windows D:/a/lean4/lean4/build/stage1/temp/Leanc.o -lleanshared -Bstatic -lgmp -lunwind -Bdynamic -lgmp -lucrtbase --stack 104857600 -lm -lbcrypt -lmingw32 D:/a/lean4/lean4/build/stage1/lib/clang/15.0.1/lib/windows/libclang_rt.builtins-x86_64.a -lmoldname -lmingwex -lpthread -ladvapi32 -lshell32 -luser32 -lkernel32 -lmingw32 D:/a/lean4/lean4/build/stage1/lib/clang/15.0.1/lib/windows/libclang_rt.builtins-x86_64.a -lmoldname -lmingwex -lkernel32 D:/a/lean4/lean4/build/stage1/lib/crtend.o
ld.lld: error: undefined symbol: l_Lean_Compiler_FFI_getCFlags
>>> referenced by D:/a/lean4/lean4/build/stage1/temp/Leanc.o:(l_main___lambda__4)
...but the command lines at least should be identical on master
!bench
Due to new commits on master
, we have a new test failure because of new warnings:
[06:57 (00.000005)] lean-test-stage1> > /nix/store/h1sl8hi6wp2i435d06cwpz9y3dhmdiil-lean-bin-tools/include/lean/lean.h:1942:79: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]
[06:57 (00.000006)] lean-test-stage1> > static inline lean_object* lean_expr_checker_get_max_ctor_fields(lean_object *) {
[06:57 (00.000005)] lean-test-stage1> > ^
...
We could work on filtering out these warnings but we should probably instead make sure the header is valid C99(?) /cc @bollu
Let's please also use the correct lean_object *
typedef
s in these functions
Here are the benchmark results for commit 74fe86fb9079d1c0046700146c4a202ba55156bb. There were significant changes against commit 367b38701fb9cca80328624a016c3b3acfd5e2cd:
Benchmark Metric Change
============================================================
+ binarytrees instructions -1.1% (-1289.9 σ)
+ binarytrees.st instructions -1.1% (-3141.3 σ)
- liasolver maxrss 3.6% (180.1 σ)
- qsort maxrss 3.6% (180.1 σ)
- qsort task-clock 1.4% (10.4 σ)
- qsort wall-clock 1.4% (10.1 σ)
- rbmap task-clock 7.9% (14.7 σ)
- rbmap wall-clock 7.8% (14.7 σ)
+ rbmap_fbip branches -1.7% (-3896.7 σ)
- tests/bench/ interpreted instructions 2.9% (6167.1 σ)
- unionfind task-clock 4.1% (12.5 σ)
- unionfind wall-clock 4.1% (12.4 σ)
Interesting, this is the diff with LLVM disabled. But since especially rbmap
should not use any of the outlined functions, it is hard to see any causal relationship with this PR. I will turn LLVM back on for now and do a run with it.
!bench
Here are the benchmark results for commit da47e71de1a0d744cb4c21ab115a179c6f6196cf. There were significant changes against commit 367b38701fb9cca80328624a016c3b3acfd5e2cd:
Benchmark Metric Change
============================================================
+ binarytrees instructions -1.1% (-756.7 σ)
+ binarytrees.st instructions -1.1% (-296.2 σ)
- liasolver maxrss 3.6% (178.1 σ)
- qsort maxrss 3.6% (185.1 σ)
- rbmap task-clock 7.6% (26.7 σ)
- rbmap wall-clock 7.6% (27.1 σ)
+ rbmap_fbip branches -1.7% (-1542.6 σ)
- tests/bench/ interpreted instructions 2.9% (9562.3 σ)
My bad, the benchmarking setup does not actually use LLVM yet...
Nice, looks like it's only missing visibility modifiers on initializers (https://github.com/leanprover/lean4/actions/runs/5642004685/job/15281100057?pr=2340). @bollu Could you fix that and then put the Emit changes in a separate PR so we can do an update-stage0 on master in between?
@Kha I assume I need an update-stage0 for the windows build? I’m not sure what’s up with the Linux release lake failure.
Hmm, the test might be flaky but twice in a row and never before is a bit concerning. Anyways, if Windows passes, please go ahead with the Emit PR
I cannot reproduce the failures on my local setup :(
@Kha I asked @hargoniX to rerun the linux tests on their machine, where it did succeed. Could we rerun the linux release CI? I would be troubled if leanlaketest_116
fails once more.
@Kha I squash-merged this branch against the latest upstream/master
. I resolved the stage0
changes by using the versions of the files in upstream/master
. This build fails on aarch64
: https://github.com/bollu/lean4/actions/runs/5660012113/job/15334806801.
I guess we need a second stage0
update with our CMake
changes to propagate the CMake
changes which correctly setup --target
?
(To be clear, I would like to understand how to structure this PR so we can start reviewing it!)
I squash-merged this branch against the latest
upstream/master
I would prefer independent parts to stay as separate commits; it is fine to squash away incremental work internal to the PR. You can manually copy src/CMakeLists.txt
to stage0/
which I believe should fix CI, it is only the standard library that we always copy with the script.
I don't have a good enough mental model of the CMake build to successfully disambiguate all the changes we've made :D I can split this up into three unrelated commits: (1) nix changes, (2) outlining, (3) "build system changes" [all the changes to CMake
plus the backporting into stage0
]. Hope this works for you!
Maybe the easiest is to see if merging master (while kipping the stage0 from master), plus copying src/CMakeLists.txt to stage0 already reduces the size of this PR quite a bit?
@Kha can we have a bench to make sure we do not regress performance?
I am hoping that https://github.com/leanprover/lean4/pull/2358 will alleviate the flakiness of the Linux release build.
!bench
Here are the benchmark results for commit c8a169a4df683b6f47bd0e10d26b15f3944b9a16. There were significant changes against commit 53477089fe255280d7a816371ec315c6e2e94220:
Benchmark Metric Change
================================================================
+ binarytrees instructions -1.1% (-55.4 σ)
+ binarytrees.st instructions -1.1% (-3280.0 σ)
+ rbmap_fbip branches -1.7% (-3162.0 σ)
+ rbmap_fbip task-clock -6.8% (-10.1 σ)
+ rbmap_fbip wall-clock -6.8% (-10.1 σ)
+ stdlib tactic execution -1.6% (-13.7 σ)
+ stdlib task-clock -1.4% (-17.1 σ)
- tests/bench/ interpreted instructions 2.9% (5962.0 σ)
- unionfind task-clock 3.9% (78.2 σ)
- unionfind wall-clock 3.8% (75.1 σ)
Oh jeez, the larger timeout actually fixed it... that points to a significant performance regression compared to the C backend then though
@Kha / @bollu
Oh jeez, the larger timeout actually fixed it... that points to a significant performance regression compared to the C backend then though
Hopefully, the timeout only really needed a 1 second bump and, if so, the performance impact is not so extreme (and there is a good chance that 3 seconds may have already been just barely enough).
Also, for the return values of the extern functions. you may wish to use lean_obj_res
rather than lean_object*
to follow the usual style.
Before we merge, we will have to switch back to the C backend anyway. I would suggest creating a copy of the "Linux release" CI job and set LLVM=ON for it and only it for now. We can disable the test for that specific CI job then and consider it part of future performance investigations of the LLVM backend.
But also we should fix that test to not require a timeout :)
@Kha I would like to cover the following triples:
MacOS aarch64
).Windows
). Let us haggle on which of these we should have a LLVM=ON
build copy 😄
We already have too many CI jobs that we run on each PR. If you want to see the backend be tested for each configuration, let's work on making it the default :) .
That's a good incentive!
We already have too many CI jobs that we run on each PR. If you want to see the backend be tested for each configuration, let's work on making it the default :) .
@Kha that sounds good to me. May I know what would be necessary to flip on LLVM by default across all backends? From the milestone list (https://github.com/leanprover/lean4/milestone/3) , I presume that what is missing is:
(1) pulling in the patch that correctly initializes size_t
. We can test this on a non 64-bit target in a VM after pulling in https://github.com/leanprover/lean4/pull/2090. This should also fix https://github.com/leanprover/lean4/issues/2085.
(2) making the LLVM backend have performance parity across all tests. I would like to know how to run the benchmark suite locally, and how I could gather a list of tests where we see a performance drop? From the current !bench
, I can see that tests/bench/
and unionfind
need to be investigated:
Here are the benchmark results for commit c8a169a. There were significant changes against commit 5347708:
Benchmark Metric Change ================================================================ - tests/bench/ interpreted instructions 2.9% (5962.0 σ) - unionfind task-clock 3.9% (78.2 σ) - unionfind wall-clock 3.8% (75.1 σ)
Is that an exhaustive list of milestones before we can flip on LLVM by default?
May I know what would be necessary to flip on LLVM by default across all backends?
Ideally it should be strictly better than the C backend! I.e. no known regressions and at least some notable benefits. One minor benefit would be that we shouldn't have to distribute libclang anymore, which is about 7% of the unpacked toolchain. Unless, that is, the ecosystem started depending on there being a C compiler, which I haven't checked.
I would like to know how to run the benchmark suite locally, and how I could gather a list of tests where we see a performance drop?
Yes, running the Speedcenter suite locally with and without LLVM is the way to go for now I believe. Please tell me if there is anything missing from https://github.com/leanprover/lean4/blob/master/tests/bench/README.md. Note that, as I said, !bench
is not yet set up to use LLVM at all.
Is that an exhaustive list of milestones before we can flip on LLVM by default?
I added everything that directly came to mind to the milestone
The thing that is making the CI fail on aarch64 is fixed in the current shell.cpp
, I guess we need an update-stage0 for this?
The weak
workaround has yet to be reinstated. While doing that, we should document what it does and why it is necessary. I would also welcome opinions on whether there are alternative approaches to it, it is quite a departure from the C backend after all and the repercussions are not clear to me.
edit: for the sake of completeness, my original message on this topic:
We removed the
static
from these functions in lean.h.bc, but in order to sucessfully link together resulting object files, we must not expose the functions as global symbols (i.e. make sure they are eliminated or make them hidden or weak)/nix/store/fx1h3mmzl2hjci3nn15n26r57mc3cs3z-binutils-2.40/bin/ld: /home/sebastian/lean/lean/build/release/stage2/lib/lean/libLean.a(Widget.o): in function `lean_set_st_header': Lean.Widget:(.text+0x4b0): multiple definition of `lean_set_st_header'; /home/sebastian/lean/lean/build/release/stage2/lib/lean/libInit.a(EState.o):Init.Control.EState:(.text+0xdf0): first defined here
We removed the
static
from these functions in lean.h.bc, but in order to sucessfully link together resulting object files, we must not expose the functions as global symbols (i.e. make sure they are eliminated or make them hidden or weak)/nix/store/fx1h3mmzl2hjci3nn15n26r57mc3cs3z-binutils-2.40/bin/ld: /home/sebastian/lean/lean/build/release/stage2/lib/lean/libLean.a(Widget.o): in function `lean_set_st_header': Lean.Widget:(.text+0x4b0): multiple definition of `lean_set_st_header'; /home/sebastian/lean/lean/build/release/stage2/lib/lean/libInit.a(EState.o):Init.Control.EState:(.text+0xdf0): first defined here
I guess we want to set the function linkage type to
internal
then:
https://llvm.org/docs/LangRef.html#linkage-types
Similar to private, but the value shows as a local symbol (STB_LOCAL in the case of ELF) in the object file. This corresponds to the notion of the ‘static’ keyword in C.
These are the relevant functions: https://llvm.org/doxygen/GlobalValue_8h_source.html#l00532
Sounds good. At which step of the pipeline should we do that, after loading the clang-generated bitcode but before linking it with the Lean backend output? I assume the overhead of redoing this (in memory) for every Lean file should be negligible.
Sounds good. At which step of the pipeline should we do that, after loading the clang-generated bitcode but before linking it with the Lean backend output? I assume the overhead of redoing this (in memory) for every Lean file should be negligible.
I am not sure I have enough context to answer this. Why did we remove the static in the first place?
The unfortunate problem is that clang will not generate any bitcode at all if we give it lean.h as is with static inline
s. So we remove the static
for generating lean.h.bc and now just have to revert that change later somewhere on the bitcode level.
The unfortunate problem is that clang will not generate any bitcode at all if we give it lean.h as is with
static inline
s. So we remove thestatic
for generating lean.h.bc and now just have to revert that change later somewhere on the bitcode level.
I see. As a workaround, we could run 'opt -internalize' on the resulting .bc file, assuming all functions should be internalized.
That sounds handy, but the docs seem to imply it doesn't do anything if there is no main
? https://llvm.org/docs/Passes.html#internalize-internalize-global-symbols
That sounds handy, but the docs seem to imply it doesn't do anything if there is no
main
? https://llvm.org/docs/Passes.html#internalize-internalize-global-symbols
I teste it and it works even if there is no main function.
Awesome, let me try that right now
Heh, perhaps not unexpectedly that already breaks linking in the header with the backend output. So we would have to selectively internalize these symbols after this linking step.
There is a clang option named -femit-all-decls
that should force clang to also emit static inline
functions. Is this maybe what we want?
That's also good to know but it probably won't solve the issue of LLVMLinkModules2
ignoring internal symbols, no?
I thought about this a little more. LLVMLinkModules2
does not expose functionality to internalize definitions while linking and the high-level C++ interface used in llvm-linker also requires a callback that does the actual internalization:
L.linkInModule(
std::move(M), ApplicableFlags, [](Module &M, const StringSet<> &GVS) {
internalizeModule(M, [&GVS](const GlobalValue &GV) {
return !GV.hasName() || (GVS.count(GV.getName()) == 0);
});
});
Hence, our best bet would probably after calling LLVMLinkModule2 to iterate over the resulting module and call void LLVMSetLinkage(LLVMValueRef Global, LLVMLinkage Linkage);
with LLVMExternalLinkage
on any function that is also contained in the llvm.h.bc file?
@bollu is on it, so let's see if that works.
Here is the lean.h.bc
file: https://gist.github.com/bollu/5700b2b693bfe1de2addaa2957699f6d
I must admit, I find the failure quite puzzling. My best guess is that I'm using the API wrong, so I'm going to write a small test C++/python script to check that we are indeed using the API correctly to grab functions in the module.
Hey @Siddharth, I feel that we first need to revert @Kha's patch that internalizes the functions before linking. We meanwhile learned that this will prevent any of them to be linked into the module. Second, .str
is private so likely won't be linked. Did you try to just drop the exception and continue even if some symbols are not found?
This PR changes the build system and outlines uses of
extern inline
to enable LLVM stage2 builds. Draft PR till this branch actually succeeds.