llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

Other

28.3k stars 11.68k forks source link

"-enable-machine-outliner" creates symbols with non-unique names #38874

Open b078d2e7-9c2e-48c7-93df-05e63984c3e0 opened 5 years ago

b078d2e7-9c2e-48c7-93df-05e63984c3e0 commented 5 years ago


Bugzilla Link	39526
Version	trunk
OS	All
CC	@francisvm,@ornata,@MatzeB,@zygoloid,@rnk,@yroux

Extended Description

Notice when compiling Chrome for Arm64 that there are a lot of symbols called "OUTLINED_FUNCTION_1", "OUTLINED_FUNCTION_2", ...

Chrome uses an orderfile to optimize the layout of functions within the binary, but these outlined symbols are ambiguous when listed in an orderfile.

It would be great if the outlined functions could be given unique names (and stable) names. E.g. using a hash of some sort (maybe of the source path).

b078d2e7-9c2e-48c7-93df-05e63984c3e0 commented 5 years ago

Some further findings on this:

Outlined functions do not get instrumented when using -finstrument-function-entry-bare, which Chrome uses to generate an orderfile.

Rather than including outlined functions in instrumentation, it would probably be even better if they were named $ORIGINAL_FUNCTION.outlined, $ORIGINAL_FUNCTION.outlined_1, etc, and then handled by -section-ordering-file to go beside any parent function listed in the file.

llvmbot commented 5 years ago

I noticed that in

https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/ModuleSummaryIndex.h

there's this code:

/// Convenience method for creating a promoted global name /// for the given value name of a local, and its original module's ID. static std::string getGlobalNameForLocal(StringRef Name, ModuleHash ModHash) { SmallString<256> NewName(Name); NewName += ".llvm."; NewName += utostr((uint64_t(ModHash[0]) << 32) | ModHash[1]); // Take the first 64 bits return NewName.str(); }

Would this be something that might be helpful (re. adding hash)?

ornata commented 5 years ago

I think that we can at least improve the stability of the outlined functions by

(1) Generating the names when we actually create the functions rather than early on

and

(2) Sorting the outlined functions by contents

This gets us away from relying on the order of processing outlining candidates for names, at least. Even if we used the numbering-based scheme here, I think we'd be in a better place than before. (Although I'd personally like something more human-readable and debuggable at the end of the day.)

yroux commented 5 years ago

To avoid the processing order issue and keep debug experience in mind, maybe we can include the sorted list of callers in the name, like:

Outlined.bar.foo.1: ...

Having a description of the content (like the list of mnemonics) sounds nice to have, but the names can indeed become gigantic

ornata commented 5 years ago

I think that adding some sort of context is a reasonable solution. However, we'd have to make sure that the order that we process the functions in doesn't change.

It would definitely be nice to actually have unique names for the outlined functions though. If we did that, then we could make the functions linkonceodr and leverage linkers that support deduplication based off function contents.

Another option, which ought to be able to give us a canonical name, might be to name the functions based off of their contents somehow? E.g

foo: some arithmetic stuff bar: the same arithmetic stuff outlined.(something uniquely describing the stuff): the arithmetic stuff

I'm not sure how that could be done without giant names though.

rnk commented 5 years ago

Maybe we can try to reduce the likelihood of non-uniqueness and improve the debug experience by naming the outlined function after the first function it was outlined from. For example, if 'foo' and 'bar' have common code, emit assembly like:

foo: ... bar: ... foo.outlined.1: ... common code

It seems a little more user friendly than a hash, and the '.' separator will allow it to pass through demanglers like c++filt.