llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.14k stars 12.02k forks source link

[LTO] Private symbol order reversed during IR linking with resolution-based API #31788

Open llvmbot opened 7 years ago

llvmbot commented 7 years ago
Bugzilla Link 32441
Version trunk
OS Linux
Attachments Test case
Reporter LLVM Bugzilla Contributor
CC @joker-eph,@pcc

Extended Description

I'm seeing an issue with customer code that relies on the order of private symbols (strings), used as initializers for global variables, being preserved during LTO. The old LTO API does this, the resolution-based API reverses them. I'm not sure what the C standard says about this, but it just seems arbitrary to do this.

I've tracked down the root of the issue to ModuleSymbolTable::addModule, so Peter I'd appreciate your comments on this.

In the following, I'm referring to the trivial test case attached to this bug. It defines two globals, bar1 and bar2, which are initialized with strings (.str and .str.1). There is a function foo which uses them in the order bar1 and then bar2 (this is important).

Let's first run this through the old API: llvm-lto privateorder.bc -exported-symbol=bar1 -exported-symbol=bar2 -exported-symbol=foo -save-merged-module -o privateorder.o

In ModuleLinker::run(), the 'kept' globals are added in the order variables-functions-aliases (here: bar1, bar2, foo). The IRMover materializes them in this order from its worklist, and all is good. The merged module (privateorder.o.merged.bc) has @.str and @.str1 in the correct order.

Now if I run this through the new API things are different in privateorder.o.0.0.preopt.bc: llvm-lto2 privateorder.bc -r privateorder.bc,bar1,px -r privateorder.bc,bar2,px -r privateorder.bc,foo,px -r privateorder.bc,puts, -o privateorder.o -save-temps

ModuleSymbolTable::addModule() builds the list of symbols in the module by iterating over them in the order of functions-variables-aliases, which in this case leads to the Keep list being [foo,bar1,bar2] in LTO::addModule. The IRMover now materializes foo first, which triggers its dependences bar1 and bar2 to be scheduled for materialization by the Mapper, which in turn triggers the strings (which will be processed in reverse by the Mapper). When the IRMover reaches bar1 and bar2 in its worklist, they and their initializers are already materialized.

Now, the easy fix appears to be to change the order in ModuleSymbolTable::addModule() from functions-variables-aliases to variables-functions-aliases. I then see the exact same materialization order as with the old API.

My question for you all is: Would this be safe/correct? Was there a particular reason why a different iteration order was chosen for the ModuleSymbolTable? I feel that the convention across LLVM tends to be variables then functions, rather than the other way around.

Thanks, Tobias

llvmbot commented 7 years ago

If you want to write an optimization that reorders data, it would be awesome. It should probably go on the linker where we see a bigger part of the code and data.

We would love to see that too ;) It's definitely on our agenda after we're done with function placement. And yes, 100% agree that (both in fact) make most sense in the linker (with PGO input).

llvmbot commented 7 years ago

Facebook published https://research.fb.com/publications/optimizing-function-placement-for-large- scale-data-center-applications-2/ about a linker-script solution for optimizing function placement based on PGO data for instance (their tool is open-source).

Yep, Guilherme's paper is great. It doesn't address data, though.

But the point that this is an optimization remains. If you wan to write an optimization that reorders data, it would be awesome. It should probably go on the linker where we see a bigger part of the code and data.

llvmbot commented 7 years ago

Addressing the cache problem is in my opinion not to be addressed by preserving the source order in any way (which is anyway not really the case with LTO), but implementing an optimization that would be cache aware.

There is no need to intentionally pessimize it for the LTO case vs. the non-LTO case (and the previous LTO API), though. It happens even at LTO -O0.

Facebook published https://research.fb.com/publications/optimizing-function-placement-for-large- scale-data-center-applications-2/ about a linker-script solution for optimizing function placement based on PGO data for instance (their tool is open-source).

Yep, Guilherme's paper is great. It doesn't address data, though.

joker-eph commented 7 years ago

Addressing the cache problem is in my opinion not to be addressed by preserving the source order in any way (which is anyway not really the case with LTO), but implementing an optimization that would be cache aware.

Facebook published https://research.fb.com/publications/optimizing-function-placement-for-large-scale-data-center-applications-2/ about a linker-script solution for optimizing function placement based on PGO data for instance (their tool is open-source).

llvmbot commented 7 years ago

OK, I thought you were mentioning about the initialization at runtime of the code sample you showed. There is no correctness issue right? We're not miscompiling anything?

Exactly. No correctness issue for standards-compliant code, as far as I can see. It will have a performance impact due to the cache effects. That's really the bigger problem.

joker-eph commented 7 years ago

OK, I thought you were mentioning about the initialization at runtime of the code sample you showed. There is no correctness issue right? We're not miscompiling anything?

llvmbot commented 7 years ago

Note how it's just completely jumbled up because materialization is triggered by the order that variables get used in the function (with the first level of indirection and down getting reversed).

I don't understand what you mean here?

The IRMover materializes the function foo first, which triggers its dependences to be scheduled for materialization, which in turn triggers their dependences. It all happens inside the FlushingValueMapper. In essence, the mapper is now primarily determining the order variables get materialized as opposed to the IRMover's worklist (which is based on module order).

llvmbot commented 7 years ago

Sure. I agree with you all that this is clearly unspecified by the standards. I also agree that - ideally - they should just change their code. This was certainly my first reaction, too. We all know though that, in practice, that's something that can't always be done quickly or easily; anyway, that's besides the point.

No, that is precisely the point. We all get internal customers that were depending on arbitrary decisions. The correct thing to do is to help the customer avoid remove that dependency, not ask llvm to freeze an arbitrary decision.

joker-eph commented 7 years ago

Note how it's just completely jumbled up because materialization is triggered by the order that variables get used in the function (with the first level of indirection and down getting reversed).

I don't understand what you mean here?

llvmbot commented 7 years ago

Sure. I agree with you all that this is clearly unspecified by the standards. I also agree that - ideally - they should just change their code. This was certainly my first reaction, too. We all know though that, in practice, that's something that can't always be done quickly or easily; anyway, that's besides the point.

I've since done some more digging and it turns out this doesn't only happen to string literals. It happens to global variables as well.

Take this code:

int var1 = 0; int var2 = 0; int var3 = &var1; int var4 = &var2;

int foo() { return var3 + var4; }

In non-LTO mode, and with the old API, you'll get the following ordering: [var1, var2, var3, var4, foo]. The new API will give you: [var3, var4, var2, var1, foo]. Note how it's just completely jumbled up because materialization is triggered by the order that variables get used in the function (with the first level of indirection and down getting reversed). Again, of course, the standard makes no guarantee but I wouldn't be surprised if there's even more code out there that relies on global variables (rather than string literals) appearing in program order.

Most importantly, however, this will have cache effects. By reversing variables and string literals, we destroy the locality that was previously there. It's not great for strings, it'll be even worse for variables.

Sure, we don't want to be locked down to having to guarantee program order. But why change it when there is no reason to?

I'll push a patch to phab so we can do more testing. I don't see any lit tests breaking.

joker-eph commented 7 years ago

If it's arbitrary, I don't see why we need to break previous behavior intentionally with no conceivable gain when the fix is literally trivial.

I'm not against changing the order here to match the previous one, but I'm very cautious in general about being locked-down on things we don't want to be locked-down on just because clients start relying on implementation details.

So, just like anything not defined, in general, I'd go to my internal (or external) client and told them to not rely on this (irrespective if we change back or not the order upstream).

llvmbot commented 7 years ago

Any reason why you chose the ordering functions-variables-aliases over variables-functions-aliases in ModuleSymbolTable though? If it's arbitrary, I don't see why we need to break previous behavior intentionally with no conceivable gain when the fix is literally trivial. If it was done for a good reason, it would of course make sense to discuss alternative options.

If something is arbitrary, changing it is not breakage.

llvmbot commented 7 years ago

Any reason why you chose the ordering functions-variables-aliases over variables-functions-aliases in ModuleSymbolTable though? If it's arbitrary, I don't see why we need to break previous behavior intentionally with no conceivable gain when the fix is literally trivial. If it was done for a good reason, it would of course make sense to discuss alternative options.

pcc commented 7 years ago

I agree with Rafael. If you are seeing programs that truly rely on the ordering of string literals, you most likely will need to fix those programs or (as a last resort) create a dialect of C for your target that provides guarantees about string ordering. I think it should be possible to implement such a dialect by changing only the compiler (basically you would merge all string literals into a single GlobalVariable and GEP into it).

llvmbot commented 7 years ago

I don't think the order of private symbol has semantic value, so this is not a bug IMHO.