rui314 / mold

Mold: A Modern Linker 🦠

Profile-guided dead code elimination #958

Open rui314 opened 1 year ago

rui314 commented 1 year ago

This is another crazy optimization idea that traditional linkers don't provide.

Applications tend to contain lots of functions that will never be executed at runtime, even though they look reachable to static analysis. Reachability-based garbage collection therefore cannot eliminate such functions from the output.

But what if the test code coverage is 100%? Then any function that wasn't executed during testing will (by definition) never be executed in production, so it is safe to remove from the output. We could replace each such function with a small stub that just reports "unreachable code executed" and exits.
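A minimal sketch of what such a stub might look like (illustrative only; the function name and message are assumptions, not anything mold actually emits):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stub the linker could substitute for a function that was
 * never executed during profiling: report the violation and exit instead
 * of running code that was presumed dead. */
void __dead_code_stub(void) {
    fprintf(stderr, "fatal: unreachable code executed\n");
    exit(1);
}
```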

This optimization has the potential to eliminate lots of code from an application.

This is also an unsafe optimization, as it is hard to achieve 100% test coverage. I believe there are a few ways to mitigate the risk:

Application size matters even in today's computing environment. For example, I heard that on mobile, application installation success rate directly correlates with binary size. So any idea that can significantly reduce binary size has big potential.

Edit: Another way to stay on the safe side is to apply this optimization only to statically linked libraries. The rationale is that users don't write dead code in their own applications, but the libraries the app links against tend to bring in lots of unnecessary code.

yerke commented 1 year ago

What are the advantages of doing this optimization at the link stage vs. the compile stage?

rui314 commented 1 year ago

If you do it at the compile stage, you need to rebuild all object files for a particular configuration, which isn't always easy. For example, if the dynamic analysis tool finds that some function in libz.a is not used in production, you generally don't want to rebuild libz.a just for that, I guess.

mbj commented 1 year ago

This is also an unsafe optimization, as it is hard to achieve 100% test coverage.

What exactly would you measure to use that 100% as a target? Whether all statements in the source language have a semantic effect?

Alcaro commented 1 year ago

If 100% of statements in the source language are reached, then 100% of the code is live and there's nothing dead to optimize out.

If changed to "every function has either 0% or 100% coverage of source language statements", your definition means something, but you can get false negatives from arrays of function pointers where only some indices are called. And it's difficult to get 100% coverage of assert() calls unless you include a bunch of intentionally-buggy callers.

Weakening it to "functions that are not referenced by anything can be deleted" would be safe, but that also exists already; the linker won't extract unused object files from a .a, and if you want greater granularity, you can use -ffunction-sections -Wl,--gc-sections. And I'm not sure whether the linker can remove unused functions from a .o without -ffunction-sections.
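For reference, a small example of that existing reachability-based approach (standard GCC/Clang and GNU ld flags; the file names are made up):

```c
/* dead.c -- bar() is never referenced.
 *
 * Build with per-function sections so the linker can drop it:
 *   cc -ffunction-sections -c dead.c
 *   cc -Wl,--gc-sections dead.o -o dead
 *
 * Adding -Wl,--print-gc-sections makes the linker report the removed
 * .text.bar section.
 */
#include <stdio.h>

void bar(void) { puts("never called"); }  /* unreferenced: ends up in .text.bar and is GC'd */

int main(void) {
    puts("hello");                        /* reachable from the entry point, so kept */
    return 0;
}
```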

Certainly an interesting idea, though.

kripken commented 1 year ago

Our experience in the WebAssembly space may be relevant here. We've investigated this type of idea because wasm binary size is very important on the web.

Our current solution is wasm-split (credit: @tlively) which is also integrated in the Emscripten toolchain. Basically, users run their application in a "profiling" mode that notes which code is reached. We then split the binary into two parts: the first that is loaded immediately, and the second which contains code we think is unused, and is only downloaded and run on demand.

We chose that design because in practice it is too risky to fail entirely if the code we thought was unused ends up used. At least, that is the case for the large and complex applications we are focused on. That is, we ended up not trusting code coverage completely, but there are still substantial benefits to splitting out the unused code: the main application starts up more quickly, and if the second module is never used, we save memory.

I don't know enough about your goals here to know if such splitting in mold could make sense. In principle, though, you can dlopen the second module on demand, or something like that. That would not save binary size on disk, but it would save it in memory, at least.
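A rough sketch of that dlopen-on-demand idea for native code (assuming the split produced a hypothetical libcold.so containing the presumed-unused functions; all names are illustrative):

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Load the secondary ("cold") module lazily, the first time any
 * presumed-unused function is actually needed. */
static void *load_cold_module(void) {
    static void *handle;
    if (!handle) {
        handle = dlopen("libcold.so", RTLD_NOW | RTLD_GLOBAL);
        if (!handle) {
            fprintf(stderr, "failed to load cold module: %s\n", dlerror());
            exit(1);
        }
    }
    return handle;
}

/* Stub left in the main binary in place of a cold function. */
void rarely_used_feature(void) {
    void (*real)(void) =
        (void (*)(void))dlsym(load_cold_module(), "rarely_used_feature_impl");
    if (real)
        real();
}
```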

rui314 commented 1 year ago

Separating unreachable code into a separate .so is definitely doable, and that's probably what one should do to implement this feature.

The other, even wilder idea is to just ignore errors, which is often referred to as "failure-oblivious computing". With this approach, we'd replace an unreachable function with one that simply returns without reporting an error or doing anything to recover. That's not a correct way to fix the problem, but it at least keeps the program going. It has been found that a surprisingly large number of errors can simply be ignored, for some classes of errors [1].

[1] https://www.usenix.org/legacy/event/osdi04/tech/full_papers/rinard/rinard.pdf
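A minimal sketch of that failure-oblivious variant (the assumption here is that the stub returns a zeroed value and keeps going rather than aborting; the function name is made up):

```c
/* Failure-oblivious stub: instead of aborting when presumed-dead code is
 * reached, silently return a neutral value and let the program continue.
 * Whether 0 is a safe default obviously depends on the call site. */
int presumed_dead_function(int arg) {
    (void)arg;   /* ignore the input entirely */
    return 0;    /* "it didn't fail" -- in the failure-oblivious spirit */
}
```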

DanxiongLei commented 1 month ago

@rui314 @kripken I am also looking for a solution to perform module split in C++. Is there any progress here?

rui314 commented 1 month ago

No one is actively working on this item as far as I know, but what kind of module split are you looking for?

DanxiongLei commented 3 weeks ago

We develop games using Unity and Unreal, and through our experiments with module splitting on WASM we found that at least half of the code is never executed. Even if some of it does get executed, we can handle that by delivering the sub-wasm dynamically on demand. We have validated this in a production environment for a year. Now we hope to do the same for native .so files, expecting to cut the code size in half.

rui314 commented 3 weeks ago

How did you split the wasm file into multiple sub-wasm files? We might be able to do the same.

DanxiongLei commented 3 weeks ago

We performed module splitting based on @kripken's Emscripten Module Splitting. The principle is profile-guided optimization (PGO). Specifically, we take a WASM file as input and instrument each function. After running the instrumented WASM, we record which functions are used and which are not, collecting a profile. Based on this profile, we split the original WASM into main.wasm and sub.wasm.
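For native code, the profiling step could be approximated with per-function instrumentation along these lines (an illustrative sketch of the idea, not the actual wasm-split mechanism; all names and the function-ID scheme are assumptions):

```c
#include <stdio.h>

#define NUM_FUNCS 1024  /* assumed upper bound on instrumented functions */

static unsigned char executed[NUM_FUNCS];  /* one flag per instrumented function */

/* The instrumentation pass would insert a call like this at every
 * function entry, passing that function's numeric ID. */
static inline void profile_hit(int func_id) {
    executed[func_id] = 1;
}

/* At exit, dump the profile; the splitter then moves every function
 * whose flag is still 0 into the secondary module. */
void profile_dump(const char *path) {
    FILE *f = fopen(path, "w");
    if (!f)
        return;
    for (int i = 0; i < NUM_FUNCS; i++)
        if (!executed[i])
            fprintf(f, "cold %d\n", i);
    fclose(f);
}
```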

DanxiongLei commented 2 weeks ago

What do you think of this pattern? Is it feasible in C++?

rui314 commented 2 weeks ago

I think it is technically doable, but someone needs to write a wasm-split-equivalent tool for the native code.