rui314 / mold

Mold: A Modern Linker 🦠
MIT License

Consider doing bootstrap (PGO bootstrap) #250

Open marxin opened 2 years ago

marxin commented 2 years ago

Similar to GCC, mold can easily bootstrap (link mold using already built mold). Plus you can squeeze some extra performance from PGO (-fprofile-generate and -fprofile-use), where linking of mold can be used as a training run. Note PGO plays very well with LTO. What do you think?
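A minimal sketch of what such a GCC-based PGO bootstrap could look like. It assumes the Makefile honors `EXTRA_CXXFLAGS` and `EXTRA_LDFLAGS` and produces a `mold` binary in the current directory; the `pgo/` layout and the `run` and `pgo_bootstrap_gcc` helpers are hypothetical names used only for illustration. `-fprofile-correction` is included because profiles from multi-threaded programs can contain inconsistent counters.

```shell
#!/bin/bash
set -e

# Print each command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

pgo_bootstrap_gcc() {
  local prof="$PWD/pgo/gcda" stage1="$PWD/pgo/stage1"

  # Stage 1: build an instrumented mold with GCC.
  run make clean
  run make -j CXX=g++ EXTRA_CXXFLAGS="-O2 -fprofile-generate=$prof" \
    EXTRA_LDFLAGS="-fprofile-generate=$prof"
  run mkdir -p "$stage1"
  # GCC picks up a linker named "ld" from a directory passed with -B.
  run mv mold "$stage1/ld"

  # Training run: relink mold with the instrumented linker.
  # Assumption: -no-quick-exit is needed so exit handlers write the profile.
  run make clean
  run make -j CXX=g++ EXTRA_LDFLAGS="-B$stage1 -Wl,-no-quick-exit"

  # Stage 2: rebuild with LTO and the collected profile.
  run make clean
  run make -j CXX=g++ \
    EXTRA_CXXFLAGS="-O2 -flto -fprofile-use=$prof -fprofile-correction" \
    EXTRA_LDFLAGS="-flto -fprofile-use=$prof -fprofile-correction"
}
```

Calling `DRY_RUN=1 pgo_bootstrap_gcc` prints the planned commands without touching the build tree, which is a cheap way to check the flags before committing to three full builds.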

rui314 commented 2 years ago

That's an interesting idea. I'm also genuinely interested in how much PGO can improve our linker's performance. I'll experiment with it a bit and update this bug later. Thanks!

rui314 commented 2 years ago

I wrote this shell script to link mold with PGO, using mold itself as training data. For some reason, the resulting PGO-enabled mold is slower than non-PGO build by ~10% when building Chrome. This is odd...

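#!/bin/bash
set -e

mkdir -p pgo

# Stage 1: build an instrumented mold and keep it as pgo/mold-stage1.
make clean
make -j EXTRA_CXXFLAGS='-fprofile-instr-generate -O2'
mv mold pgo/mold-stage1

# Rebuild the object files without instrumentation, then delete the linked
# binary so that the next make invocation only has to relink.
make clean
make -j
rm mold

# Training run: relink mold with the instrumented linker. The backticks stay
# literal inside the single quotes and are expanded by the shell that make
# spawns for the link step. -no-quick-exit is presumably there so that the
# profile runtime gets a chance to write its data when the linker exits.
LLVM_PROFILE_FILE=pgo/a.profraw \
  make -j EXTRA_LDFLAGS='-fuse-ld=`pwd`/pgo/mold-stage1 -Wl,-no-quick-exit'

# Convert the raw profile into the indexed format that the compiler reads.
llvm-profdata merge -output=pgo/a.profdata pgo/a.profraw
make clean

# Stage 2: rebuild with LTO and the collected profile.
make -j EXTRA_CXXFLAGS='-flto -O2 -fprofile-instr-use=pgo/a.profdata' \
  EXTRA_LDFLAGS='-flto -O2 -fprofile-instr-use=pgo/a.profdata'

marxin commented 2 years ago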

Can you please also test a GCC-built mold?

ptr1337 commented 2 years ago

> Can you please also test a GCC-built mold?

I did build it, on Arch. But in general, most gcc-git builds seem completely broken for me after installation. Maybe it's an upstream bug or a bug in the PKGBUILD used. I've already tried around 3-5 different PKGBUILDs for gcc-git, and I get the same error every time. Even a kernel build fails within the first few seconds.

I will build it as an external compiler rather than as the host compiler and see.

Is it possible to provide backport patches?

Regards.

marxin commented 2 years ago

> I did build it, on Arch. But in general, most gcc-git builds seem completely broken for me after installation. Maybe it's an upstream bug or a bug in the PKGBUILD used. I've already tried around 3-5 different PKGBUILDs for gcc-git, and I get the same error every time. Even a kernel build fails within the first few seconds.

You can use the latest stable release, 11.2.0. Yes, the Linux kernel build is broken with the current master due to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101941

But that should not block building mold.

ptr1337 commented 2 years ago

@marxin

I mean "backporting" the patches that are implemented in the https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/mold-lto-plugin tree. Or are only your two commits related to mold?

Currently I'm running GCC with the mold patch from 12.0. I just mean the new implementations you provided for gcc-git.

--- got your last patch into GCC 11, everything is good!

marxin commented 2 years ago

> I mean "backporting" the patches that are implemented in the https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/mold-lto-plugin tree. Or are only your two commits related to mold?

It's only a single patch. Well, I would recommend building GCC from source for LTO plug-in integration testing.

ptr1337 commented 2 years ago

> > I mean "backporting" the patches that are implemented in the https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/mold-lto-plugin tree. Or are only your two commits related to mold?
>
> It's only a single patch. Well, I would recommend building GCC from source for LTO plug-in integration testing.

Yes, I'm just going to quickly compile my host compiler with the two patches, and then build the one from the mold LTO plugin tree as an external compiler.

ptr1337 commented 2 years ago

I just came across this thread and thought about giving BOLT a try to optimize mold. I don't know how much this could help performance, but it could be worth a try. What do you think?

rui314 commented 2 years ago

I don't know if we can observe a noticeable difference, but it's worth a try.

ptr1337 commented 2 years ago

I tried to use the instrumentation mode of llvm-bolt, without success so far, since it seems the build does not directly call the "mold" binary that I instrumented.

So currently I simply get no profiles for it. I will test it on an Intel machine. But how do I check which binary is faster? Do you have a benchmark or something similar?

rui314 commented 2 years ago

I think if you are going to use the profile of linking mold itself as training data, the obvious benchmark to test its profile-guided optimization is to link mold itself. It may overfit though.

If mold is too small to use as training data, you should use something larger (e.g. LLVM), and you can link the same program again with a profile-guided-optimized mold to see whether PGO works or not.
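One way to compare two linker builds along these lines is to time the same link command several times with each binary. The sketch below is an illustration only: `mean_ms` is a hypothetical helper, and it relies on GNU `date` for nanosecond timestamps.

```shell
#!/bin/bash
# Run a command N times and print the mean wall-clock time in milliseconds.
# Usage: mean_ms RUNS COMMAND [ARGS...]
mean_ms() {
  local runs=$1; shift
  local total=0 i start end
  for i in $(seq "$runs"); do
    start=$(date +%s%N)               # nanoseconds since the epoch (GNU date)
    "$@" >/dev/null 2>&1 || return 1  # a failed link should not be counted
    end=$(date +%s%N)
    total=$((total + end - start))
  done
  echo $((total / runs / 1000000))
}
```

A comparison would then call it once per linker with the identical link command, for example `mean_ms 5 ./link.sh /path/to/mold-baseline` and `mean_ms 5 ./link.sh /path/to/mold-pgo`, where `link.sh` is a hypothetical wrapper around the real link step. Discarding the first run helps keep page-cache warm-up out of the numbers.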

ptr1337 commented 2 years ago

No, llvm-bolt has a function that instruments the binary (it grows a lot in size). The binary needs to be built with relocations.

If you then run a workload with the instrumented binary, you'll get a profile that llvm-bolt can use to optimize the binary.
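The instrumentation workflow described here could be sketched roughly as follows. The `run` and `bolt_instrumented` helpers and the file names are hypothetical; the `llvm-bolt` options are the ones shown in the BOLT README. It assumes the input binary was linked with relocations kept (for example `-Wl,--emit-relocs`) and that the training command really invokes the `.inst` binary, which is the step that went wrong above.

```shell
#!/bin/bash
set -e

# Print each command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Usage: bolt_instrumented BINARY PROFILE TRAINING_COMMAND [ARGS...]
bolt_instrumented() {
  local mold=$1 fdata=$2
  shift 2  # the remaining arguments are the training command

  # 1. Write an instrumented copy of the binary next to the original.
  run llvm-bolt "$mold" -instrument -instrumentation-file="$fdata" \
    -o "$mold.inst"

  # 2. Run the workload; it has to execute "$mold.inst" to produce a profile.
  run "$@"

  # 3. Optimize the original binary using the collected profile.
  run llvm-bolt "$mold" -data="$fdata" -o "$mold.bolt" \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions -split-all-cold -dyno-stats
}
```

If no profile file appears after step 2, one thing to check (an assumption, not something confirmed in this thread) is whether the linker exits through the normal exit path, since the PGO script earlier passes `-no-quick-exit` for that reason.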

rui314 commented 2 years ago

Sorry, what was your problem? I have no prior experience with PGO or BOLT, so I may not be able to help much.

ptr1337 commented 2 years ago

> Sorry, what was your problem? I have no prior experience with PGO or BOLT, so I may not be able to help much.

Everything is good. I will check whether it works with a sampled profile on an Intel machine when I have access to one. That should probably work.
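For the sampling route, the BOLT README suggests recording branch samples with `perf` and converting them with `perf2bolt`. The sketch below follows that recipe; `run`, `sample_profile`, and the file names are hypothetical, and `-j any,u` assumes hardware branch records such as Intel LBR are available.

```shell
#!/bin/bash
set -e

# Print each command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Usage: sample_profile BINARY PROFILE LINK_COMMAND [ARGS...]
sample_profile() {
  local mold=$1 fdata=$2
  shift 2  # the remaining arguments are the link command to sample

  # Record user-space cycles with branch stacks while the linker runs.
  run perf record -e cycles:u -j any,u -o perf.data -- "$@"

  # Convert the perf samples into BOLT's profile format.
  run perf2bolt -p perf.data -o "$fdata" "$mold"
}
```

The resulting profile file is then passed to `llvm-bolt` through its `-data=` option to produce the optimized binary.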

zamazan4ik commented 1 year ago

> I wrote this shell script to link mold with PGO, using mold itself as training data. For some reason, the resulting PGO-enabled mold is slower than non-PGO build by ~10% when building Chrome. This is odd...

@rui314 were you able to find the reason why mold with PGO was slower than mold without PGO?

rui314 commented 1 year ago

I have no idea. Can you reproduce the result? If so, we want to ask PGO developers why.

zamazan4ik commented 1 year ago

> I have no idea. Can you reproduce the result? If so, we want to ask PGO developers why.

I can try. The only question is: could I reproduce it on my local Apple MacBook M1 (ARM-based), since you removed macOS support and moved it to sold?

rui314 commented 1 year ago

macOS support is experimental anyway, so testing PGO with it doesn't make much sense at this moment.

zamazan4ik commented 1 year ago

Tested PGO vs. non-PGO mold on linking Clang with ThinLTO: no difference between the versions. I also tested BOLTed mold vs. regular mold: still no measurable effect on linking Clang. Maybe there will be a measurable result on larger projects like Chromium or ClickHouse...