ldc-developers / ldc

The LLVM-based D Compiler.
http://wiki.dlang.org/LDC

pragma(inline, true) should suppress symbol emission #2968

Open TurkeyMan opened 5 years ago

TurkeyMan commented 5 years ago

I noticed some versions of GDC don't inline properly, and DMD doesn't do anything. Can we please confirm that inline in LDC works, then close this issue?

A very simple test; given:

lib.d
-----
module lib;
pragma(inline, true) int fun() { return 10; }

main.d
------
import lib;
int main() { return fun(); }

We can test on latest code:

ldc2 -output-s lib.d && cat lib.s <- expect empty binary
ldc2 -output-s main.d && cat main.s <- expect binary contains main() and lib.fun() (assuming unoptimised; no inline performed)

(Note: I'm not sure what LDC's cmdline arg for -S is, but you get the idea)

JohanEngelen commented 5 years ago

pragma(inline, true) is not a guarantee that the symbol is not emitted in the module it is in. LDC does not cross-module inline, so emission is needed. Also, the spec doesn't prescribe it so there is no obligation to do so. We do our best to actually inline the function though, but it's best effort and not a guarantee either. If you want the symbol to not be emitted in the module it is in, you'd have to make it a template.
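
To illustrate the template workaround mentioned above, here is a minimal sketch based on the `fun` example from the issue. The name `funT` is made up for illustration. The empty template parameter list `()` is what turns it into a template, so code for it should only be generated in compilation units that instantiate it.

```d
// lib.d (sketch; `funT` is a hypothetical name, not from the issue)

// Plain function: per the comment above, LDC still emits this symbol
// into lib's object file even if nothing in lib calls it.
pragma(inline, true) int fun() { return 10; }

// Function template with an empty parameter list: it is only
// instantiated (and codegen'd) where it is actually used.
pragma(inline, true) int funT()() { return 10; }
```

Call sites look the same for both: `fun()` and `funT()`.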

dnadlinger commented 5 years ago

(Note: I'm not sure what LDC's cmdline arg for -S is, but you get the idea)

It's -output-s.

expect binary contains main() and lib.fun() (assuming unoptimised; no inline performed)

You'd expect this to be always inlined. Unlike C++ where inline is mostly a visibility modifier (in the object file sense) and a mild optimizer hint, pragma(inline, true) in D is supposed to be the equivalent of __forceinline. Whether it should affect object file emission or not is probably not even documented.

Edit: Whoops, didn't see your comment, Johan.

TurkeyMan commented 5 years ago

pragma(inline, true) is not a guarantee that the symbol is not emitted in the module it is in.

It should also not be eagerly emitted; only emit if there is a local reference. I expect an inline function to be emitted when (and where) it's referenced... and if not that semantic, then I don't even know what inline is for?

LDC does not cross-module inline

Is that true? :/ ... is there a plan to fix this?

Also, the spec doesn't prescribe it so there is no obligation to do so.

Hmm, the spec appears to be broken. When I initially discussed inline with Walter, what he described is nothing like the spec. It's upsetting, because I had to work for months to convince him that inlining needed to exist, and gave him the design which he bought into, and then it appeared, and it turns out, it's nothing like we discussed?

We do our best to actually inline the function though

That's good, but it's also not very interesting. Actually definitely inlining the function is not a particularly meaningful semantic. If inline has no semantic meaning, then it's hardly worthy of the pragma that specifies it. inline is about strict control...

you'd have to make it a template.

Can you just mark inline functions with the same bit of magic that causes that behaviour in templates? The whole point of inline is to behave like that and NOT be a template... otherwise it's just a template.

TurkeyMan commented 5 years ago

You'd expect this to be always inlined.

Not necessarily. I'd expect the code to be in the caller's object one way or another; and ideally inline if possible. Obviously the code auto fp = &fun; can't inline... a copy of the function must be emitted in this case. But it must be emitted locally, otherwise link error!

pragma(inline, true) in D is supposed to be the equivalent of __forceinline

It's not like __forceinline though... C++ absolutely implements the semantics I describe. Inline functions appear in the caller's module, and they do NOT appear in their 'home' module if they're not referenced.

Whether it should affect object file emission or not is probably not even documented.

It certainly should be documented... but I guess I just also figured it was common sense. I have no idea what the point of inline is if it doesn't behave the way I say. It should not be the source of link errors under any circumstances... that's self-defeating.
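
The `auto fp = &fun;` case raised above can be sketched as follows. The helper names `callDirect` and `callViaPointer` are hypothetical and only serve to contrast the two kinds of reference; whether and where the compiler emits `fun` depends on the compiler version, as discussed in this thread.

```d
// Sketch only; `callDirect` and `callViaPointer` are hypothetical names.
pragma(inline, true) int fun() { return 10; }

// A direct call: a candidate for inlining, after which no call to
// `fun` needs to remain at this site.
int callDirect() { return fun(); }

// Taking the address needs an actual function symbol to point at,
// so a body for `fun` has to exist in some object file.
int callViaPointer()
{
    auto fp = &fun;
    return fp();
}
```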

dnadlinger commented 5 years ago

C++ absolutely implements the semantics I describe.

That's the entire point – C++'s and D's semantics as designed are different!

JohanEngelen commented 5 years ago

C++ does what you want because C's inline is not about inlining, it is about linking (necessary with header files). In D, "inline" is about inlining.

TurkeyMan commented 5 years ago

They're not though... if by 'designed' you mean what's written on a spec. But it was me who designed this thing... and it's nothing like intended. I don't know who this is for. Nobody that I'm aware of ever asked for it to be this way.

Anyway, I don't really care about DMD, but it's critical that LDC be correct, since that's the most important compiler in the ecosystem.

TurkeyMan commented 5 years ago

C++ does what you want because C's inline is not about inlining, it is about linking (necessary with header files). In D, "inline" is about inlining.

Ummm, no. inline is code for "it's the caller's business what to do with this function", and a nudge to prefer inlining is a nice bit of sugar. The useful semantic of inlining is as I describe... otherwise it's nothing more than a minor nudge to the optimiser, and that's absolutely not worthy of a pragma directive, and Walter definitely agreed; he argued that inline as a nudge was worthless, and I only convinced him by insisting that it had strong semantics associated, and it be worthy of a pragma... :/

kinke commented 5 years ago

I've always seen this as a pure inlining hint, and I don't see how an inline-pragma would intuitively satisfy Manu's expected semantics ('emit into each referencing other module, either directly inlined or as function').

Not even templates are emitted into each referencing module; the compiler goes to great lengths to emit templates (edit: ideally) only once (one object file), which is well-known to be buggy and may cause linking issues with separate compilation.

TurkeyMan commented 5 years ago

[...] great lengths [...]

You're saying templates are broken too? And we go to great lengths to break it?

TurkeyMan commented 5 years ago

Fortunately, we don't need to go to those great lengths here...

TurkeyMan commented 5 years ago

The linker's really good at its job... there's that 'choose any' attribute that you put on a function, and voilà. If it's in multiple objects, the linker just chooses any.

kinke commented 5 years ago

Define broken - AFAIK, Kenji came up with this template culling scheme, and probably was proud to save the compiler and linker from redundant work. Iain said he thinks it'd make more sense to emit them into each referencing object file (not too long ago, somewhere in the forum).

TurkeyMan commented 5 years ago

Well, 'broken' if linking fails because an object that instantiated a template didn't actually emit the symbol it referenced... I can't see how that's anything other than objectively broken.

... but I don't care about templates today. I'm just upset inlining doesn't work ;)

kinke commented 5 years ago

If linking fails, of course, and there's a multitude of bugzilla entries about that. Trying to instantiate/codegen templates only once most likely isn't 'broken' per se; combined with missing cross-module inlining, it may account for suboptimal performance though, and that's why there's LTO to remedy that and similar stuff.

TurkeyMan commented 5 years ago

This inline stuff isn't about performance; it's about link errors.

kinke commented 5 years ago

Care to share some rationale? So you want imported pragma(inline, true) functions to be emitted into each referencing object file - apparently to prevent link errors. So you apparently plan not to compile & link the module containing those functions, why is that?

TurkeyMan commented 5 years ago

[...] instantiate/codegen templates only once [...]

When you say "only once", do you mean once ever? Or only once per compilation unit? I mean, the idea of D compilers is that if you want to avoid redoing compiler work, then pass groups of modules to the compiler together, and it will emit them all to one .o file. I would expect the rules I describe to relate to the compilation unit (ie, compiler invocation/.o file), and not to 'modules'. D modules don't mean anything to the linker.

kinke commented 5 years ago

'Once ever', that's why I wrote 'one object file'. E.g., a user module/object may not contain a template instantiation if the compiler thinks it can prove that a matching instantiation already happened in some Phobos module, and that it'll thus be available in some Phobos object when linking.

TurkeyMan commented 5 years ago

So you apparently plan not to compile & link the module containing those functions

Yes, this is literally the entire point of pragma(inline) as I see it. It should be possible to put an inline function in a .di file.

And in addition, it should NOT be where it was declared, unless something called it locally which caused a copy to be emitted locally.

why is that?

Because I don't want code in .o files that shouldn't be there (reduce bloat). I don't expect link errors to inline functions. Inline means nothing if it's possible to cause a link error.

One nice side-effect is that inline is a great way to fight against cascading dependencies; eg, consider an inline function that references some other extern... there is no reference to that extern (and no possible linker issues) until you CALL that inline function. Then the code is generated and emitted at the call site, so the code that owns the dependency is the calling code, not the code that defined the call. This is the proper place for the linkage dependency to exist, and interacts with complex build environments.

My project is full of DLL's and very carefully handled code placement and linkage. It's important to have control over this stuff; there are complex interactions with build environments, link environments, etc, that need mechanisms to control. D is a systems programming language, and this is a pragma we're talking about.
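
The cascading-dependency point above can be sketched like this. The wrapper name `absPlusOne` is hypothetical, and the C library's `abs` merely stands in for "some other extern"; the comments describe the semantics requested in this comment, not what any particular compiler version guarantees.

```d
// wrappers.d (sketch; `absPlusOne` is a hypothetical name)
import core.stdc.stdlib : abs; // stands in for "some other extern"

// The only reference to the external `abs` lives in this body.
// Under the semantics requested in this thread, that reference would
// end up only in object files whose code calls `absPlusOne`.
pragma(inline, true) int absPlusOne(int x) { return abs(x) + 1; }
```

A module that imports `wrappers` but never calls `absPlusOne` would then carry no link-time dependency on the library providing `abs`.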

TurkeyMan commented 5 years ago

and that it'll thus be available in some Phobos object when linking.

Talking inline, this is a violation of expectations... but talking about templates; how can the compiler determine if a template instantiation is present in libPhobos? How can it not be conservative and emit the symbols where they're needed, just in case? Does it have knowledge of the contents of libPhobos?

kinke commented 5 years ago

Walter or Martin Nowak once said something along these lines: 'Kenji implemented it, and nobody but him understands it.' It has no special treatment for Phobos or whatever, but handles things by looking at the module dependencies via imports. So it depends on what you import and what those modules import etc., and IIRC is also sensitive to the source file order in separate-compilation ( -c or -lib) command lines.

So your inline semantics ('(only) codegen into each referencing compilation unit') would match the intuitive/simple template emission strategy.

TurkeyMan commented 5 years ago

Exactly. And this is the only definition for inline which I find useful... and if not that, then I need the thing I want anyway. I don't know what to call it, and/or it's a terrible smell to wrap everything in a template when it's not; we have enough big ugly mangled symbols everywhere.

This concept is firmly established in other languages, it's called inline, and I thought that's what it means here in D too, especially since I was the one that worked with Walter to have it introduced ;)

dnadlinger commented 5 years ago

And this is the only definition for inline which I find useful...

Really? https://forum.dlang.org/thread/mailman.835.1332024849.4860.digitalmars-d@puremagic.com

This concept is firmly established in other languages, it's called inline

Language, maybe, singular – and even there it's one of the most un-teachable aspects. You might have just been immersed in the same ecosystem for long enough not to even notice.

and I thought that's what it means here in D too, especially since I was the one that worked with Walter to have it introduced

You might just be discovering how malleable memory is, then – have a look at the original DIP: https://wiki.dlang.org/DIP56. It is explicitly documented to be an optimizer flag in D: "Sometimes generating better code requires runtime profile information. But being a static compiler, not a JIT, the compiler could use such hints from the programmer. […] This adds a pragma 'inline' […] which influences the inlining of the function it appears in. [… N]o argument means the default behavior, as indicated in the command line. ". There isn't even a mention of symbol visibility/emission in that document.

Similarly, the spec states "The default inline behavior is typically selectable with a compiler switch such as -inline."

It's pretty clear what the intention here is – not least because "inlining" has a well-defined meaning across compiler people, and control of symbol emission is not it.

This is not to say that having control about the latter is a useless thing to have. I'm just not sure why/what for you are arguing here. C++ (resp. its compilers) has both inline and __forceinline/…. The former has a famously bad name. The latter is what is equivalent to pragma(inline, true) in D. Certainly, the latter should be made to work across modules, and making the symbols inline-only/non-exported will help with binary size, link times, and so on – you will find various posts of me saying that here. But that

This inline stuff isn't about performance; it's about link errors.

just isn't true for pragma(inline, true) in D.

dnadlinger commented 5 years ago

How can it not be conservative and emit the symbols where they're needed, just in case? Does it have knowledge of the contents of libPhobos?

While doing the necessary amount of semantic analysis of Phobos to use it in your code, the compiler will have a list of symbols it knows to have been emitted already when compiling the library. For example, if you had a function Container!int bar(); in some library module std.foo, the symbols for Container!int would have already been emitted into foo.o. Thus, when you import std.foo; from your code and also happen to use Container!int, the compiler doesn't need to emit the symbols again. Of course, if Container!int was used in some function bodies inside std.foo that are not analysed when importing the module, the code would still be emitted twice.

That's the theory, anyway – whether this is actually worth the effort or not is a separate question. It does save quite a bit of time in real-world settings, though, especially with separate compilation of typical template-heavy (or rather, non-template-averse) application code.

None of this is specific to Phobos, by the way, or even to libraries. The concept exists strictly on the module level, and builds on D's requirement that all imported modules must be linked in – whether that's in the form of object files or static/dynamic libraries isn't important.
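
The `Container!int bar();` scenario described above can be sketched as a small module. The file name `foo.d`, the fields of `Container`, and the function `baz` are assumptions added for illustration; only `Container!int` and `bar` come from the comment.

```d
// foo.d (sketch; stands in for the hypothetical library module std.foo)
struct Container(T)
{
    T[] items;
    size_t length() const { return items.length; }
}

// Container!int appears in a signature, so importers can see that this
// module instantiates it; per the theory above, they could skip re-emitting it.
Container!int bar() { return Container!int([1, 2, 3]); }

// Here Container!double is only used inside a function body. An importer
// that does not analyse this body cannot know about the instantiation,
// so its own use of Container!double would be emitted again.
size_t baz()
{
    auto c = Container!double([1.5, 2.5]);
    return c.length;
}
```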

TurkeyMan commented 5 years ago

Really? [...]

I mean, sure. In that particular context it's obvious that my only concern was perf, and no other matters influenced that particular discussion.

have a look at the original DIP

Hmm, I don't recall ever seeing a DIP about inlining at the time. I recall it just appearing.

C++ (resp. its compilers) has both inline and __forceinline

In terms of the semantics I'm discussing here, they're identical though. The only difference is the latter is a stronger nudge to the optimiser. In fact, I think the semantics comparing inline and __forceinline are: inline == the thing I'm talking about here. __forceinline == the thing I'm talking about here, plus a strong nudge to the optimiser to insist it be inline if possible.

But __forceinline is not without the actual inline semantics.

The latter is what is equivalent to pragma(inline, true) in D

No, not really. Maybe pragma(inline, true) includes the strong nudge part of __forceinline, but it still misses the actual inline part, which is kinda fundamental.

Either way, the inline semantic I'm talking about here is an important feature of any inline request. I guess I never imagined a world where inline didn't have these properties, so I'm surprised.

just isn't true for pragma(inline, true) in D.

Okay, so I guess I need to challenge that definition then. It's missing a substantial detail.

TurkeyMan commented 5 years ago

That's the theory, anyway – whether this is actually worth the effort or not is a separate question. It does save quite a bit of time in real-world settings, though, especially with separate compilation of typical template-heavy (or rather, non-template-averse) application code.

Okay, that's interesting for templates... but for inline, it's work that shouldn't happen. Inline functions are not templates, and don't involve heavy instantiation cost by definition. It will be parsed by each compilation unit either way, and not emitting the code (unless it's referenced) is logically less work, and should be faster.

TurkeyMan commented 5 years ago

and builds on D's requirement that all imported modules must be linked in

But, there are .di files...?

dnadlinger commented 5 years ago

But, there are .di files...?

.di is just a convention for the name of files which have been stripped of function bodies where possible. Those implementations still exist elsewhere. This is unrelated to symbol emission.

dnadlinger commented 5 years ago

It will be parsed by each compilation unit either way, and not emitting the code (unless it's referenced) is logically less work, and should be faster.

Agreed – and by the way, activating cross-module inlining for pragma (inline, true) in LDC will also fix the long-standing issue with druntime intrinsics not being inlined (which defeats their point). We actually have had an implementation of cross-module inlining for a long while now, but the frontend is buggy enough that enabling it by default wasn't viable.

(We are talking about things like mangled symbol names not being stable between different compiler invocations here. The DMD inliner is so limited that many of these bugs just happen not to be exposed upstream. pragma (inline, true) would allow us to always enable it for the unproblematic, high-payoff cases.)

TurkeyMan commented 5 years ago

This is unrelated to symbol emission.

If there is an inline function in a .di file, the expectation is that .di files never produce a .o.

TurkeyMan commented 5 years ago

activating cross-module inlining for pragma (inline, true) in LDC will also fix the long-standing issue with druntime intrinsics not being inlined

See, that's exactly the sort of thing that I expect it to be working for! It should be a non-issue :)

dnadlinger commented 5 years ago

If there is an inline function in a .di file, the expectation is that .di files never produce a .o

Whether the file is called .di or .d doesn't make a difference for this. It is true that in practice, one can get away with not building files which contain only declarations (or pragma (inline, true) function bodies, once the implementation has been improved). You will need knowledge of compiler internals (i.e. a layer other than the abstract language spec) to know when this is possible or not, though (cf. __initZ symbols for structs, etc.).
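
As a rough sketch of the situation described above, consider a hand-written interface file that mixes a bodiless declaration with a `pragma(inline, true)` function. The file name and both function names are hypothetical, and the comments assume that cross-module inlining for the pragma is in place.

```d
// lib.di (sketch of a hand-written interface file; names are hypothetical)

// Declaration only: the body lives in a separately built lib.d,
// so callers need lib's object file at link time.
int compute(int x);

// Body kept in the interface file: a caller can inline this without
// needing any object file, provided the compiler inlines across modules.
pragma(inline, true) int twice(int x) { return 2 * x; }
```

As the comment notes, compiler-generated symbols (for example for structs) can still require the module to be built, so this is not a general rule.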

dnadlinger commented 5 years ago

It should be a non-issue :)

Agreed on that front. Sadly, our previous attempt (before the pragma was a thing) was stifled by the aforementioned frontend bugs.

As per the edited issue title, the only thing that actually needs to be improved beyond selectively enabling cross-module inlining again is to stop emitting those functions in compilation units where they are not used.

TurkeyMan commented 5 years ago

Okay, we're done here, and I'm very excited! Thanks!

kinke commented 5 years ago

one can get away with not building files which contain only declarations

That's dangerous; even a single import statement can break that down, e.g., https://github.com/ldc-developers/ldc/issues/2881#issuecomment-447657304. (edit: bad example, both files are separately compiled in a single cmdline)

dnadlinger commented 5 years ago

Okay, we're done here, and I'm very excited! Thanks!

Someone still needs to do the actual work. :P

kinke commented 1 year ago

IMO solved by #3650 - a function literal with pragma(inline, true) is inlined into every referencing object file without keeping the lambda as object-file symbol; doesn't even need -O for this guaranteed cross-module inlining. E.g., https://github.com/ldc-developers/ldc/issues/3214#issuecomment-939349404.
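
A minimal sketch of the function-literal form mentioned above, assuming a recent LDC. The names `add` and `sum3` are hypothetical, and this is not a reproduction of the example in the linked comment.

```d
// Sketch only; `add` and `sum3` are hypothetical names.
// The pragma is written as a statement inside the literal's body,
// where it applies to the enclosing function literal.
alias add = (int a, int b) { pragma(inline, true); return a + b; };

int sum3(int a, int b, int c) { return add(add(a, b), c); }
```

Per the comment above, a module importing `add` would get the literal inlined at each use, without the lambda remaining as a symbol in the object file.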