Open alexcrichton opened 9 years ago
So basically, with the MSVC toolchain you are supposed to know whether you are going to be linking with a static lib or a DLL version of the upstream library when compiling your crate.
Here's some ideas that come to my mind:
#[link_dll] extern crate foo
).dllimport
dllimport
:
_imp__foo = &foo
symbols into static libraries (linking succeeds, but adds the overhead of indirect access).
dllimport attribute to the symbols that need it, then re-emit the object file.
I agree that strategies like #[link_dll] probably won't work out so hot; whatever we do probably needs to be baked into the compiler as much as possible instead of requiring annotations.
I'm personally a bit up in the air about how to tackle this. I only know of one absolute failure mode today (#26591) and otherwise the drawbacks today are lots of linker warnings and some extra indirection (not so bad). In that sense it doesn't seem incredibly urgent to tackle this, but it's certainly a wart!
Here are some links with useful information:
Using dllexport from static libs: https://support.microsoft.com/en-us/kb/141459
dllimport/dllexport details: https://msdn.microsoft.com/en-us/library/aa271769%28v=vs.60%29.aspx
Looks like you can use a module definition file to get the current strategy to at least work in all cases, although it wouldn't remove the unnecessary indirection.
In general it seems impossible to get best performance in all situations, unless you either have two sets of binaries for dynamic vs static linking (as microsoft does with the standard library), or always build from source, and use compiler flags to fine-tune its behaviour. Only cargo really exists at a high enough level to be able to do that automatically.
Is there any danger of actual runtime unsafety here, eg. the msdn article implies that getting this wrong can result in code which uses the address of the constant getting the address of the import table instead?
@Diggsey that MSDN blog post is actually different than #26591 I believe; handling dllexport, to the best of my knowledge, is done 100% correctly now that #27416 has landed. The bug you referenced, #26591, has to do with applying dllimport to all imported statics, and if they're not actually imported via a DLL then the linker won't include the object file. It's kinda the same problem, but not precisely.
Is there any danger of actual runtime unsafety here, eg. the msdn article implies that getting this wrong can result in code which uses the address of the constant getting the address of the import table instead?
I'm unaware of any impending danger, but which MSDN article were you referencing here? I'm under the impression that the linker fixes up references to functions automatically, and it apparently also fixes up dllimported statics to not use dllimport if not necessary.
@alexcrichton When attempting to link to some statics in the CRT directly from Rust, the code would link fine but the generated code was wrong. It would follow the pointer to the static, but instead of accessing the static it then treated the static as another pointer and would segfault, since the static wasn't supposed to be a pointer. I'm not sure whether this was the fault of dllimport or shoddy LLVM code generation or even rustc itself. We'll probably need some make tests covering all the possibilities.
The MS linker's behavior of requiring an object file to be chosen for linking before it starts auto-inserting __imp__ stubs for dllimport'ed symbols is pretty strange. I could not find any mention of this on MSDN (which, hopefully, would explain the rationale), but I can confirm that it indeed works this way.
Manually inserting an __imp__foo = &foo symbol into the static library, as I proposed in option 3 above, fixed the case of a data-only static lib, and seems to have no ill effects for DLLs, so maybe that's the way to go?
Here's my test case:
Static library:
__declspec(dllexport) int foo = 1;
int* _imp__foo = &foo;
__declspec(dllexport) int bar() { return 2; }
void* _imp__bar = &bar;
Main executable:
#include <stdio.h>
__declspec(dllimport) int foo;
__declspec(dllimport) int bar();
int main()
{
    int f = foo;
    int b = bar();
    printf("%d %d\n", f, b);
    return 0;
}
If you comment out both _imp__foo and _imp__bar, you'll end up with an "unresolved external symbol" link error. Commenting out only one of them makes linking succeed with a warning.
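For readers without an MSVC toolchain at hand, here is a minimal portable sketch in Rust of what the extra _imp__ symbols amount to. It is only a model: the names IMP_FOO, IMP_BAR, read_direct and read_via_imp are made up for illustration, nothing here touches the real linker, and an optimizer is free to see through the indirection in this toy.

```rust
// Portable model of the `_imp__foo = &foo` trick from the C test case.
// Nothing here is Windows-specific; all names are illustrative only.

// The "real" definitions that would live in the static library.
static FOO: i32 = 1;

fn bar() -> i32 {
    2
}

// Stand-ins for the `_imp__foo` / `_imp__bar` slots: each one holds the
// address of the real symbol, much like an import-table entry would.
static IMP_FOO: &i32 = &FOO;
static IMP_BAR: fn() -> i32 = bar;

// What a consumer compiled without dllimport does: direct access.
fn read_direct() -> (i32, i32) {
    (FOO, bar())
}

// What a consumer compiled with dllimport does: load the slot first,
// then follow it (one extra level of indirection per use).
fn read_via_imp() -> (i32, i32) {
    (*IMP_FOO, IMP_BAR())
}

fn main() {
    println!("direct:  {:?}", read_direct());
    println!("via imp: {:?}", read_via_imp());
}
```

Both paths observe the same values; the only difference is the extra load, which is the overhead being debated in this thread.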
@vadimcn That doesn't quite solve the issue if I need to link to statics from a library that I don't control, like a system library.
@retep998, Yeah, this only solves the problem for Rust crate writers.
For FFI we still might need some sort of #[dllimport]/#[no_dllimport] attribute (depending on which is the default).
Actually, I wonder if we could use the #[link(kind="...")] attribute as a cue. At first sight, it tells the compiler exactly what it needs to know: whether the library is static or a DLL.
@vadimcn, @retep998 to solve that issue (needed to bootstrap with LLVM) the compiler now has a #[linked_from] attribute for extern blocks like so:
#[linked_from = "foo"]
extern {
// ...
}
This instructs the compiler that all the items in the block specified come from the native library foo, so dllexport and dllimport can be applied accordingly. The way that foo is linked in is determined by either -l flags or #[link] annotations elsewhere.
Note that #[linked_from] is unstable right now as we'll probably want to think about the design a little more, but it should in theory provide the ability for the compiler to apply dllimport to all native library imports appropriately.
@alexcrichton: I am confused about the purpose of #[linked_from=...]. How is it different from #[link(name=...)]?
There's no connection between #[link], -l, and a set of symbols. As a result we don't actually know where symbols in an extern block came from (e.g. what native library they were linked from). The #[linked_from] attribute serves to provide that connection.
There's no connection between #[link], -l, and a set of symbols...
This seems a bit bizarre. #[link] attributes can be placed only on extern {} blocks, so it would seem that they are associated with each other. Why did we need a new attribute?
True, but they're frequently not attached to the functions in question. It's pretty common to have a #[link] on an empty extern block which is actually affecting something somewhere else.
Many of these could probably be fixed with the advent of #[cfg_attr], but it doesn't solve the Cargo problem where most native libraries come from a -l flag to the compiler, where we definitely don't have a connection from an arbitrary -l flag to an extern block and a set of symbols.
Here's my attempt at fixing the problem with data dllimports: vadimcn/rust@d5d7ac52dad980c041ab20b32c68871de0728402
TODO: do this for windows targets only
Should I also check that the symbol is in reachable?
@vadimcn isn't that basically just applying dllexport to all items? How would ensuring that __imp_foo exists help with dllimport?
@alexcrichton I think the idea is that if you dllimport a static but the static is statically linked, and not coming from a DLL, the linker will look for __imp_foo and find it so that it works. Since making __imp_foo exist would only help when dynamically linking Rust libraries, shouldn't we already have all the metadata when we link a Rust library to determine whether dllimport is needed, thus making that __imp_foo thing unnecessary? Really the big issue is telling Rust whether a symbol from a native library needs dllimport or not. When you link a native library on Windows it could have static statics or it could have dynamic statics, and there's no way for Rust to know which it is except through user-added annotations.
Hm yeah I can see how that would solve our linkage problems (#26591), but I wouldn't consider it as closing this issue. There'd still be an extra level of indirection in many cases which we otherwise shouldn't have.
We also unfortunately don't have the information to determine whether to apply dllimport currently. That attribute can only be applied at code-generation time, and when we're generating an object file we don't actually know how it's going to end up getting linked. For example an rlib may be later used to link against an upstream dylib or an upstream rlib. This basically means that at code generation time we don't know the right set of attributes to emit.
For native libraries it'll certainly require user annotations, but that's what I was hoping to possibly stabilize the #[linked_from] attribute with at some point.
@alexcrichton: I think dllexport only works when creating an actual dll. The __imp__ stubs go into the import library, which does not exist in the case of a Rust static library (.rlib).
As you mention above, the determination of whether to apply dllimport must be made at code generation time. So if we want Rust crate linking to Just Work, we are going to have to accept some overhead (unless we use LTO bitcode to do some just-before-linking code generation, as I proposed in option 4. But you didn't seem to like it too much).
For data, there isn't much choice, since marking dllimport is the only case that works for both static and dynamic linking. Fortunately, public data is not common in Rust crates.
For code, we can choose between:
- not applying dllimport, and having an extra jmp when linking to a dll, or,
- applying dllimport, and suffering from indirect calls when linking statically (actually, the linker is supposed to be smart enough to re-write these as direct calls + some nop padding, but I haven't seen the MS linker actually do that).
For native libs, I think we should be able to use information from #[link(kind="...")]?
Ah interesting! So the foo.dll doesn't actually have __imp_foo symbols, just the code in foo.lib? That... would make sense!
I've toyed around with a few ideas to handle our dllimport problem, and we could in theory just start imposing more restrictions on consumers of Rust libraries to solve this. Whenever an object file is generated the compiler would need to make a decision about whether it's linking statically or dynamically to upstream dependencies. The compiler knows what formats are available, and the only ambiguous case is when both are available. Once a decision is made, the decision is encoded into the metadata to ensure that future linkage against the upstream library always remains the same.
Thinking this through though, in the past I've convinced myself that we'll run into snags. I can't quite recall them at this time, however. In theory though this would enable us to actually properly apply dllimport in all cases.
Also yeah, for native libraries we always precisely know how they're being linked (statically, dynamically, framework, etc), so this isn't a problem in their case. We just need to know what symbols come from what library and that's what #[linked_from] is serving as.
Work on MSVC toolchain integration is ongoing, with full support (on 64-bit) shipping in the 1.4 beta today.
If the MSVC version is going to be stable/official in 1.4, this issue should probably be nominated!
Why can't we just work around the static problem?
We can, as I've shown above. It's just gonna cost us some performance in the case of static linking.
For me the issue is with system libraries where there simply is no work around at the moment except to not use that library at all.
We can, as I've shown above. It's just gonna cost us some performance in the case of static linking.
A performance trade-off that hurts perf in the case of static linking in favor of improving things for dynamic linking seems like a terrible trade-off to me.
Why not "just" defer all code generation for these cases to link time, so that you know whether the thing is in a DLL or a static library when you generate the code? i.e. make opt builds LTO-only. For non-opt builds, you could do whatever performance-hurting thing is convenient.
@retep998
Won't an extern "C-dllimport" handle the native function case? Anyway, you can extern the __imp of a native dllimport function as a static function pointer.
extern {
    #[link_name="__imp__foo"]
    static foo: extern "C" fn() -> u32;
}
@vadimcn
The default on Linux is the equivalent of never using dllimport. If we can take the overhead there, we can take it on Windows. A #[dllimport] tag would be nice for when it is measured to be important, however.
@arielb1 For functions it doesn't really matter. Whether we specify dllimport or not, functions will always work, albeit with a slight penalty if we get it wrong.
It is statics which are the real problem, because if you forget to specify dllimport it just doesn't work, and if you unnecessarily specify dllimport then if the library only contains statics it also won't work. For statics there are two use cases at the moment:
- dllimport indiscriminately, it breaks. Yes, I ran into this situation several times myself; it is entirely possible to run into even if your library never exposes any statics as part of its API.
- uuid.lib. It is full of statics, and no functions. If I try to link to any of them, Rust applies dllimport indiscriminately and it breaks.
Please don't leave it as it is and apply dllimport everywhere (or apply it nowhere), and then leave it to the linker to fix things up (and force it to do so by e.g. exporting a stub function from each dependency and importing it).
Would it be possible to
- not apply dllimport by default, make it an attribute; and
- have cargo figure out what dependencies will ultimately be linked dynamically and then tell rustc this e.g. --dynamicextern dependencyname=C:\path\to\dep
which would mean that
cargo should figure it out for them?
It should be OK to add in a check where dllimport is currently applied.
This is different from the commit a0efd3a3d99a98e3399a4f07abe6a67cf0660335 that was reverted in that this depends on cargo, or the thing invoking rustc, to tell it what each extern will finally be linked as (or to be specific, tell rustc if any crates are going to be dynamically linked), instead of depending on some heuristics to guess.
Sorry for the long response in advance, but this is a tricky topic! The tl;dr; is that I have some ideas of how to solve this, but they haven't been fully fleshed out yet. I don't think the fix for this will be easy.
@retep998
For me the issue is with system libraries where there simply is no work around at the moment
From what @vadimcn has done, shouldn't it be possible to reference the __imp symbols manually? You'd have to manually encode the extra layer of indirection, but it should be possible, right?
Also, out of curiosity, does the #[linked_from] attribute work for you? We could look into stabilizing it this cycle if it looks like it's worth it.
@briansmith
Why not "just" defer all code generation for these cases to link time
Unfortunately this doesn't fit too well into the model the compiler has today, and while possible I'd prefer to not bend too far backwards just to fix small corner cases like this.
@arielb1
Won't an extern "C-dllimport" handle the native function case?
Unfortunately this suffers a similar problem as mentioned above: as the author of a library you don't always know where the symbols are coming from. This decision is often made by whoever is building the library, in which case encoding this kind of ABI information into the source can be cumbersome and require duplicate definitions (one with dllimport, one without).
Additionally, also as mentioned above, for external libraries the unstable #[linked_from] attribute should suffice; the only remaining problem is dealing with Rust crates.
The default on Linux is the equivalent of never using dllimport.
This is actually because dllimport only has meaning on Windows. The linker and dynamic linker work differently on Unix, where the compiler never has to make a decision about where a symbol is coming from; those two just magically make everything work "as fast as possible" in all cases.
In that sense, we're not actually taking a hit on Unix today, and I agree with @briansmith that we shouldn't cater to the dynamic linking case of Rust crates because that basically never happens.
@angelsl
Please don't leave it as it is
I agree! That's what this issue is about :)
- not apply dllimport by default, make it an attribute; and
Making dllimport an attribute suffers from a number of ergonomic concerns (discussed elsewhere on this issue), so I'd prefer to avoid a manual opt-in for dllimport.
- have cargo figure out what dependencies will ultimately be linked dynamically and then tell rustc this e.g.
The compiler doesn't really need much more information than what it has today actually, so there's probably not much that needs to be done on the Cargo side of things.
Here's my thinking on solving this issue:
- dllimport on symbols imported from native libraries is handled via #[linked_from]
- dllimport on symbols imported from Rust libraries is the sticky point.
One aspect of the compiler today is that it produces one object file which can then be "linked" into multiple output formats. This consequently means that the same object file is used to produce both an rlib and a dylib if that is requested. The compiler also understands that upstream Rust libraries can be available in one of two forms (dylib or rlib). For each output type of the compiler it must calculate what format of upstream library is being used to produce the output.
So, given this background, the compiler must now answer the question: for any particular symbol it references from an upstream crate, does it apply dllimport or not? Calling the upstream crate A, here's my thinking of how to answer this question:
- dllimport.
- dllimport.
At this point there's only one possible situation for the compiler: we're emitting an rlib and only an rlib. Generating a binary, dylib, or staticlib will force linkage one way or another, so we would have made our decision via one of the metrics above first.
I think it's safe to say that the compiler can assume that the set of output artifacts will not change over time for the current compilation, so now we're faced with a refinement of our original question: for a compilation that is generating only an rlib, should symbols referencing A be tagged with dllimport or not?
- dllimport is applied.
- dllimport is not applied. In theory some subset of upstream dependencies (including A) could be later assembled into a dylib and then that goes into the final link step (hence we'd need dllimport but we forgot it), but I think it's safe to say this doesn't happen.
- -C prefer-dynamic, emitting dllimport if it's passed or not if it's not.
Alright, so once we've gotten this far we've answered the question of whether to apply dllimport or not. This information is then also encoded into the metadata to prevent link errors from happening later; specifically, if an rlib is generated to link statically against an upstream rlib, it must always be linked statically against that rlib instead of having some intermediate dylib assembled at some point. Note that this could generate some weird errors in some situations, but this should generally be solved by compiling everything as rlibs or everything as dylibs.
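As a rough illustration of the "decide once, record it in metadata, reject a later mismatch" idea, here is a small self-contained Rust sketch. It is not rustc code: the types Available and Linkage and the functions decide and check_link are hypothetical, and the rule that only the "both formats available" case is ambiguous, with -C prefer-dynamic as the tie-breaker, is a simplified reading of the discussion that ignores which output types are being produced.

```rust
// Hypothetical model of the decision discussed above; none of these names
// exist in rustc.

/// Formats in which an upstream Rust crate is available.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Available {
    RlibOnly,
    DylibOnly,
    Both,
}

/// How the current crate ends up linking to the upstream crate.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Linkage {
    Static,  // references are emitted without dllimport
    Dynamic, // references are emitted with dllimport
}

/// Decide once, at code-generation time.
/// Assumption: only the `Both` case is ambiguous, and `-C prefer-dynamic`
/// is used as the tie-breaker.
fn decide(available: Available, prefer_dynamic: bool) -> Linkage {
    match available {
        Available::RlibOnly => Linkage::Static,
        Available::DylibOnly => Linkage::Dynamic,
        Available::Both if prefer_dynamic => Linkage::Dynamic,
        Available::Both => Linkage::Static,
    }
}

/// The decision is recorded in metadata; a later link step that disagrees
/// is rejected instead of silently producing a broken binary.
fn check_link(recorded: Linkage, actual: Linkage) -> Result<(), String> {
    if recorded == actual {
        Ok(())
    } else {
        Err(format!(
            "compiled for {:?} linkage but linked as {:?}",
            recorded, actual
        ))
    }
}

fn main() {
    let recorded = decide(Available::Both, false);
    println!("decision: {:?}", recorded);
    println!("static link:  {:?}", check_link(recorded, Linkage::Static));
    println!("dynamic link: {:?}", check_link(recorded, Linkage::Dynamic));
}
```

The check_link error is the kind of "weird error" mentioned above: it would show up when the same rlib is later asked to link against a differently packaged upstream crate.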
The problem with the thinking above is that it unfortunately doesn't work! The crucial part where it falls down is at the very end where the compiler has an upstream library available in two formats and no other constraints (e.g. generating an rlib) so it arbitrarily selects one output type. Consider a situation like this:
- Executable A depends on B and plugin C
- Crate C also depends on B.
Given this setup, when you run cargo build inside project A it will only build B once, making some decisions about linkage. The executable A, however, should be statically linked so B can't have dllimport pointing at the standard library. The plugin C, however, must link the standard library dynamically so B must have dllimport pointing at the standard library.
Basically, when building B, the compiler made a choice which was then later incorrect, but unfortunately the compiler can't make a choice here as whichever it chooses is incorrect.
I think that this may be one of the only failure scenarios, I at least haven't been able to come up with any others just yet. There are various options for solving this (e.g. just generate two object files and postprocess later), but in the interest of keeping the MSVC-specific logic to a minimum I'd prefer to not take routes like that. Unfortunately it may be inevitable, however.
Regardless, though, this is my thinking on this issue today! I've so far thought that this issue is relatively rare in practice (e.g. a lib with only statics doesn't seem that common) but I have a feeling it may start cropping up more often.
You know more about this than me, so I won't argue with you about the need to support non-LTO release mode. My point is really that if it only worked in debug mode and LTO mode, that would be OK with me, and probably a lot of other people who also don't use the non-LTO release mode.
The important thing to me is that in the case of static linking in LTO builds, there should be no overhead; i.e. Rust's zero-cost abstraction principle should apply here too. Obviously, for debug builds, extra overhead is not such a big deal, since that's the point of the debug builds.
The moment you want to mix static and dynamic linking you're in for a whole lot of trouble — not just on Windows/MSVC. ... You might get two different copies of singletons, for example. I also remember someone running into trouble with jemalloc because of this — the solution was to link std dynamically for both the plugin consumer and the plugin so we don't have multiple copies of std.
@briansmith I want to be able to use release mode without LTO due to LTO being annoyingly slow in Rust at the moment. Maybe if Rust got something like the incremental LTCG in recent versions of msvc, then it would be a possibility, but not at the moment.
@alexcrichton If #[linked_from] can do what I need in stable Rust, then that's good enough for me. Just don't make it stable until I can confirm it actually works.
@alexcrichton
The problem with the thinking above is that it unfortunately doesn't work! The crucial part where it falls down is at the very end where the compiler has an upstream library available in two formats and no other constraints (e.g. generating an rlib) so it arbitrarily selects one output type. Consider a situation like this:
- Executable A depends on B and plugin C
- Crate C also depends on B.
Given this setup, when you run cargo build inside project A it will only build B once, making some decisions about linkage. The executable A, however, should be statically linked so B can't have dllimport pointing at the standard library. The plugin C, however, must link the standard library dynamically so B must have dllimport pointing at the standard library.
Basically, when building B, the compiler made a choice which was then later incorrect, but unfortunately the compiler can't make a choice here as whichever it chooses is incorrect.
About plugins.
The ideal scenario would be to have A, B and C all link dynamically — the moment you have dynamic plugins, trying to mix statically linked libraries in will just give you hell, again, with multiple copies of libraries (including static fields, and especially singletons).
I remember someone running into a segfault when they had a dylib that had std statically linked that is loaded by a main executable that also has std statically linked. One of them passed a string over to the other and then it was dropped, and since the two have their own allocators of course the free segfaulted.
There are a few choices if dynamically linking std for everything isn't an option.
- Link std statically to the main executable and make all plugins link to that. But then we'd have to stuff the whole std into the main executable because we don't know what from std plugins might be using. And all the plugins' dependencies would have to be made to link this way too. Bad.
- Link std statically to the main executable and have plugins link std dynamically (this is what you were describing). But then we have two copies of std again. Bad. You don't run into free issues on Windows/MSVC though because MSVC uses the system allocator.
So really the only way to safely have dynamically loaded plugins is to link everything, well, dynamically.
But OK, to answer the problem, if we really have the scenario you describe, based on what you said above:
- Executable A depends on B (statically linked to std) and plugin C
- Crate C also depends on B (dynamically linked to std).
Cargo should treat the static and dynamic Bs as separate and compile them twice. I don't think there is really any other option.
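A minimal sketch of what "treat the static and dynamic Bs as separate" could look like, assuming a build tool that keys each compilation on the pair (crate name, how std is linked). This is not how Cargo is implemented; Root, StdLinkage and build_units are hypothetical names, and for brevity A and C are both modelled as roots that each pull in B.

```rust
use std::collections::BTreeSet;

// Hypothetical model; not Cargo's real data structures.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum StdLinkage {
    Static,
    Dynamic,
}

/// A top-level artifact and the crates it pulls in.
struct Root {
    name: &'static str,
    std_linkage: StdLinkage,
    deps: &'static [&'static str],
}

/// One compilation: a crate built for one particular way of linking std.
type Unit = (&'static str, StdLinkage);

/// Each dependency inherits the std linkage of the root that needs it, so a
/// crate shared by a static root and a dynamic root is compiled twice.
fn build_units(roots: &[Root]) -> BTreeSet<Unit> {
    let mut units = BTreeSet::new();
    for root in roots {
        units.insert((root.name, root.std_linkage));
        for dep in root.deps {
            units.insert((*dep, root.std_linkage));
        }
    }
    units
}

fn main() {
    let roots = [
        Root { name: "A", std_linkage: StdLinkage::Static, deps: &["B"] },
        Root { name: "C", std_linkage: StdLinkage::Dynamic, deps: &["B"] },
    ];
    for (name, std_linkage) in build_units(&roots) {
        println!("compile {} with std linked {:?}", name, std_linkage);
    }
}
```

With this keying B shows up as two units, one per std linkage, which is exactly the double compilation being proposed (and the cost being objected to).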
(Ideally, the compiler should also barf when it is told to link A to a dynamic B and vice versa, but of course it won't know that the compiled B rlib is dynamically linked to std.)
But again I really don't like the idea of linking std to A statically and then to C dynamically, or even having two copies of B linked to both A and C. Bad bad bad bad bad.
- not apply dllimport by default, make it an attribute; and
Making dllimport an attribute suffers from a number of ergonomic concerns (discussed elsewhere on this issue), so I'd prefer to avoid a manual opt-in for dllimport.
A dllimport attribute is essentially the same as the linked_from thing we have now, just more explicit.
- have cargo figure out what dependencies will ultimately be linked dynamically and then tell rustc this e.g.
Combining dllimport with this will make ergonomics a non-issue, because crates that need to be dllimported will be. Things like Win32 will have to be annotated anyway.
But my solution has the same problem anyway, that if different dependencies depend on a common dependency in different ways (one dynamic, one static) then they need to be compiled twice anyway.
Although to be honest, I don't see why we should have, in the same crate, some things linked dynamically and some statically.
If the dynamically-linked things were real plugins, they should be in a separate crate. If they are not real plugins (as in you cannot drop in a different version, etc) then why dynamically link them?
The compiler doesn't really need much more information than what it has today actually, so there's probably not much that needs to be done on the Cargo side of things.
Fair enough.
@alexcrichton (and others): are you sure this is something worth optimizing?
If indirect call/variable access is in a tight loop, the compiler will hoist address load out of the loop. And if it isn't in a loop...
I suggest we implement the option that always works (i.e. inject _imp__<symbol> stubs into static libraries) and call it a day. Ok, maybe we can also open an I-wishlist issue to implement 100% correct dllimport placement logic during LTO builds.
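To picture the hoisting argument, here is a small portable Rust sketch. It is only a model under stated assumptions: IMP_TABLE is a made-up stand-in for an __imp_ slot, and sum_hoisted writes out by hand the transformation an optimizer would be expected to perform on sum_naive.

```rust
// Illustrative model only; the names are hypothetical.
static TABLE: [u32; 4] = [1, 2, 3, 4];

// Stand-in for an `__imp_TABLE` slot holding the address of the real data.
static IMP_TABLE: &[u32; 4] = &TABLE;

/// Conceptually reloads the slot on every iteration before indexing.
fn sum_naive(rounds: usize) -> u32 {
    let mut total = 0;
    for i in 0..rounds {
        total += IMP_TABLE[i % 4];
    }
    total
}

/// The slot is loaded once, outside the loop; the loop body then only
/// performs the access it would have performed without the indirection.
fn sum_hoisted(rounds: usize) -> u32 {
    let table = IMP_TABLE;
    let mut total = 0;
    for i in 0..rounds {
        total += table[i % 4];
    }
    total
}

fn main() {
    println!("naive:   {}", sum_naive(8));
    println!("hoisted: {}", sum_hoisted(8));
}
```

The two functions compute the same result; the open question in this thread is how much the remaining, non-hoistable indirections cost in practice, which would need measuring.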
Oh, forgot about linking to extern libs. Yes, that's something we should fix.
@vadimcn
If we're going to support MSVC, might as well do it right.
@alexcrichton
This is actually because dllimport only has meaning on Windows. The linker and dynamic linker work differently on Unix where the compiler never has to make a decision about where a symbol is coming from, those two just magically make everything work "as fast as possible" in all cases.
That's not entirely accurate. dllimport is a compiler attribute. Accesses to non-dllimport constants are direct accesses, while accesses to dllimport constants are indirect accesses, via the equivalent of a GOT. Calls to non-dllimport functions are direct calls, which are resolved either to the real function (if it is in the same DLL) or the equivalent of a PLT stub, and calls to dllimport functions are indirect calls.
Cross-library constants are required to be dllimport, because you can't have a PLT stub for a constant. In Linux, all constants are "dllimport" (accessed via GOT) because of LD_PRELOAD.
Anyway, I think inserting a dummy function reference to make the linker create the "GOT", or even creating it ourselves, would be the right fix.
@arielb1
Anyway, I think inserting a dummy function reference to make the linker create the "GOT", or even creating it ourselves, would be the right fix.
This solution tries to ignore how MSVC works and relies on LINK to fix up our incorrect imports (which it doesn't at all have to do!).
Please don't work around the problem like this, we really should just decorate the correct imports correctly.
Windows itself works similarly to Linux; they both have some sort of import table in the executable header which is linked at runtime by some dynamic linker (the kernel in Windows, and some library defined in the ELF header in Linux).
It's just MSVC that insists on linking through an import library and __imp_ and all .. :sob:
@angelsl
In Linux, all statics are linked "by dllimport". Linux linkers (that's binutils' ld, not libc's ld-linux) always automatically generate the local equivalent of __imp_ - there are no import libraries at all!
Here's a summary of the current problems with linking uuid.lib in uuid-sys. It is a library full of static statics, aka there is no associated DLL. If dllimport is specified for those statics then link.exe fails to resolve the symbols and it breaks.
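The failure modes reported so far can be collected into a small lookup, sketched below in Rust. This is only a model of the behaviour described in this thread (in particular the C test case earlier, where link.exe synthesizes the __imp_ symbol only if the defining object file is already part of the link); it is not derived from linker documentation, and Symbol, Provider, Outcome and link_outcome are hypothetical names.

```rust
// Hypothetical model; encodes only behaviour reported in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Symbol {
    Function,
    Static,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    /// The symbol really lives in a DLL (import library present).
    Dll,
    /// The symbol lives in a static library. The flag says whether the
    /// object file defining it is pulled into the link by some other,
    /// non-dllimport reference.
    StaticLib { object_already_linked: bool },
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Ok,
    OkWithOverhead, // links, but with an extra jmp or indirect access
    Broken,         // link error or wrong code
}

fn link_outcome(symbol: Symbol, dllimport: bool, provider: Provider) -> Outcome {
    match (symbol, dllimport, provider) {
        // Annotation matches reality.
        (_, true, Provider::Dll) => Outcome::Ok,
        (_, false, Provider::StaticLib { .. }) => Outcome::Ok,
        // Function from a DLL without dllimport: goes through a stub.
        (Symbol::Function, false, Provider::Dll) => Outcome::OkWithOverhead,
        // Static from a DLL without dllimport: reported as not working.
        (Symbol::Static, false, Provider::Dll) => Outcome::Broken,
        // dllimport against a static library: only works if the object
        // file is already in the link (the uuid.lib case is `false`).
        (_, true, Provider::StaticLib { object_already_linked: true }) => {
            Outcome::OkWithOverhead
        }
        (_, true, Provider::StaticLib { object_already_linked: false }) => {
            Outcome::Broken
        }
    }
}

fn main() {
    let uuid_lib = Provider::StaticLib { object_already_linked: false };
    println!("{:?}", link_outcome(Symbol::Static, true, uuid_lib));
    println!("{:?}", link_outcome(Symbol::Function, false, Provider::Dll));
}
```

The uuid.lib situation corresponds to a static, marked dllimport, coming from a static library whose object files nothing else references, which is the Broken row.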
One proposed solution is to use #[linked_from = "uuid"] and specify static=uuid when linking it, but lo and behold, apparently specifying the kind as static causes rustc to attempt to shove the .lib into the .rlib, which is 1. completely wrong and not what I want and 2. fails because I apparently didn't tell rustc where uuid.lib is, even though it is rustc's job to find where uuid.lib is when it invokes the linker, due to that library being provided by the Windows SDK. Basically static=foo and dylib=foo are being used to control whether to bundle foo.lib or just pass the name along to the linker at link time, and are thus entirely orthogonal and unrelated to this discussion.
The only thing rustc needs to know about a symbol is whether to apply dllimport to that symbol or not, and a list of .lib files which it should pass on to the linker (link.exe). The linker in turn figures out which symbol goes to which .lib file; Rust does not need to know about that.
So what I need is an attribute to specify whether a symbol is dllimport or not. That's it. That will solve the native library problem entirely. There is no way Rust can figure it out automatically. I know whether to emit dllimport so let me pass that information on to Rust so Rust can pass it on to the linker!
Okay, did some digging and investigating and found this.
We don't apply dllimport to extern statics. What we actually do is apply dllimport to statics when we define them in one Rust library and use them from another Rust library. So by virtue of other crates referencing statics defined in uuid-sys, and thus going across a boundary from one Rust library to another, Rust emits dllimport. If I define and use an extern static all within the same crate, then no Rust library boundary is crossed, and no dllimport is applied. This is very weird behavior and is probably a bug.
@arielb1
I don't believe Linux goes through the GOT if you statically link things together. That would be quite terrible. (And worse if it's Rust specifically that does that!)
My issue with emitting a dummy function call so the linker fixes it all up for us is that
dllimport everything even when we are linking everything dynamically. But from what you've been saying it seems like Rust does the equivalent of that on Linux (I hope not. I'll take a look myself.)
But even if Linux does it, that doesn't mean we should do it on Windows because again, that's not how it's supposed to be done. If Rust wants to support MSVC, it should support MSVC and not try to work around how it works.
@angelsl
Ah I think I wasn't clear enough in my scenario, the compiler certainly has quite a bit of logic (and I completely agree) that the same library should never appear twice in a process (e.g. statically in two separate modules). The "plugin" I mentioned was a compiler plugin, not a plugin to the executable being built. For example C and B needed to dynamically link to std because they're going to be dynamically opened up by the compiler, but A needs to statically link to std because that's how executables are linked by default. This means that B needs to be both statically and dynamically linked to std with the same object file, which is obviously problematic.
I'm not sure if the best solution is to compile B twice. It's certainly a solution but it would involve pretty invasive changes to Cargo and has a high risk of not being backwards compatible in one way or another. If we could find a solution that penalized dynamic linking perhaps but not static linking, I'd be perfectly fine with that because plugins aren't super-duper-stable today anyway.
@vadimcn
are you sure this is something worth optimizing?
An excellent question! One to which I do not have a concrete answer :). On one side of things I agree with @briansmith and @angelsl that we should really be doing the "right thing" here rather than working around issues. On the other hand, however, if this is a real pain point then it's certainly the fastest thing to accept your patch for now. Either way, I do think it'd certainly be nice to get data on what kind of numbers we're talking about here.
I think the way to fix the extern libs issue is to basically start leveraging #[linked_from] and then stop blanket applying dllimport everywhere as a result (only do it on a targeted basis for #[linked_from]).
@retep998
So what I need is an attribute to specify whether a symbol is dllimport or not. That's it. That will solve the native library problem entirely. There is no way Rust can figure it out automatically.
Unfortunately while this may work for your particular situation it doesn't solve the problem in general. Libraries mostly do not know where their native symbols are coming from (e.g. from a dylib or statically) because build scripts can be overridden and swapped out. For example Cargo depends on libssh2 but if it were packaged in a Windows package manager it'd want to link dynamically to libssh2 instead of statically, meaning there's no way for libssh2 itself to encode whether dllimport is applied or not.
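For comparison, this is roughly how C libraries usually deal with the same question: the consumer picks static or dynamic at compile time through a macro. A minimal sketch, where `MYLIB_STATIC`, `MYLIB_API`, `MYLIB_IMPORT_MODE` and `mylib_version` are all hypothetical names, not anything from libssh2 or Cargo:

```c
#include <assert.h>
#include <string.h>

/* The consumer has to decide up front how the library will be linked.
   MYLIB_STATIC is a hypothetical switch set by the build system. */
#if defined(_WIN32) && !defined(MYLIB_STATIC)
#  define MYLIB_API __declspec(dllimport)
#  define MYLIB_IMPORT_MODE "dllimport"
#else
#  define MYLIB_API
#  define MYLIB_IMPORT_MODE "plain"
#endif

/* Declaration only; the definition would live in the upstream library. */
MYLIB_API int mylib_version(void);

/* Reports which form of the declaration this translation unit got. */
const char *mylib_import_mode(void) {
    return MYLIB_IMPORT_MODE;
}

/* The mode is fixed when this file is compiled, not when it is linked. */
void check_import_mode(void) {
    const char *mode = mylib_import_mode();
    assert(strcmp(mode, "dllimport") == 0 || strcmp(mode, "plain") == 0);
}
```

The point of the sketch is that the choice is baked into the object file, which is exactly the information a crate does not have when its build script can be overridden.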
In general I agree that it's nice to have an escape hatch, but the goal here is to add as few attributes as possible while solving as many use cases as we can find. Your use case with uuid.lib will be fixed once the compiler stops blanket-applying dllimport to all statics. The only reason the compiler should ever apply dllimport is:
#[linked_from] connecting it to a dylib.
@alexcrichton
Wouldn't that still end up having two copies of std? Any static fields in std are going to have two copies, one in the executable and one in the dynamically loaded copy.
I'll go and compile some DLLs with and without dllimport using MSVC C, MinGW C and GCC C on Linux and compare it to MSVC, MinGW Rust and Rust on Linux, and see what the overhead really is (I guess it's just one extra jump?)
It would be great if we could have a solution that handled it all properly, but if the solution we pick penalises dynamic linking only for the case where we have some crate as a dependency of both a statically-linked crate and dynamically-linked crate, then it shouldn't be too bad.
But honestly it would suck if we do that and it lives on forever. Whatever fix we apply now will affect people who use it in the future, and it would be really bad if this ends up being one of those quirks that people are confused about when they encounter it, then they go and ask on Stack Overflow, and the answer refers to this very issue on GitHub and tells the asker "don't do this".
That's why I think barring having to totally rewrite parts of Cargo/rustc, it would be worth it to fix it properly. Because if we use a hackfix now, it's going to stay.
Something interesting. Not sure if you all already knew this, but I'll just put it here anyway.
MSVC does not complain if I compile a C file that links to something from a DLL without dllimport. LINK only complains if I compile a C file that imports something from itself.
Compile with cl /Ox /c dll.c. Link with link /dll dll.obj.
__declspec(dllexport) char* externstring() {
    return "Hello!";
}
Compile with cl /Ox /c exe.c. Link with link dll.lib exe.obj.
#include <stdio.h>
/* __declspec(dllimport) */ char* externstring();
int main() {
    puts(externstring());
    return 0;
}
C:\Users\angelsl\Root\Development\rust\test>cl /Ox /c exe.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23026 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
exe.c
C:\Users\angelsl\Root\Development\rust\test>cl /Ox /c dll.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23026 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
dll.c
C:\Users\angelsl\Root\Development\rust\test>link /dll dll.obj
Microsoft (R) Incremental Linker Version 14.00.23026.0
Copyright (C) Microsoft Corporation. All rights reserved.
Creating library dll.lib and object dll.exp
dll.lib has both externstring and __imp_externstring.
C:\Users\angelsl\Root\Development\rust\test>dumpbin /all dll.lib
Microsoft (R) COFF/PE Dumper Version 14.00.23026.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file dll.lib
File Type: LIBRARY
Archive member name at 8: /
5600D352 time/date Tue Sep 22 12:04:34 2015
uid
gid
0 mode
7E size
correct header end
5 public symbols
186 __IMPORT_DESCRIPTOR_dll
3A0 __NULL_IMPORT_DESCRIPTOR
4D2 dll_NULL_THUNK_DATA
624 __imp_externstring
624 externstring
<snip>
Linking works fine even though dllimport was not specified.
C:\Users\angelsl\Root\Development\rust\test>link exe.obj dll.lib
Microsoft (R) Incremental Linker Version 14.00.23026.0
Copyright (C) Microsoft Corporation. All rights reserved.
C:\Users\angelsl\Root\Development\rust\test>exe
Hello!
exe.obj references only externstring.
C:\Users\angelsl\Root\Development\rust\test>dumpbin /all exe.obj
Microsoft (R) COFF/PE Dumper Version 14.00.23026.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file exe.obj
<snip>
RELOCATIONS #3
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000005 REL32 00000000 9 externstring
0000000D REL32 00000000 8 puts
<snip>
If I declare the import as dllimport, then exe.obj will reference __imp_externstring. (Uncomment the commented block in exe.c.)
C:\Users\angelsl\Root\Development\rust\test>cl /Ox /c exe.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23026 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
exe.c
C:\Users\angelsl\Root\Development\rust\test>dumpbin /all exe.obj
Microsoft (R) COFF/PE Dumper Version 14.00.23026.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file exe.obj
File Type: COFF OBJECT
<snip>
RELOCATIONS #3
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000006 REL32 00000000 9 __imp_externstring
0000000E REL32 00000000 8 puts
<snip>
Linking works the same as above (linking to dll.lib). No new output.
Now instead of linking to dll.lib for a dynamic link, I link to dll.obj, which would be a static link. We get the familiar warning.
C:\Users\angelsl\Root\Development\rust\test>link exe.obj dll.obj
Microsoft (R) Incremental Linker Version 14.00.23026.0
Copyright (C) Microsoft Corporation. All rights reserved.
Creating library exe.lib and object exe.exp
exe.obj : warning LNK4217: locally defined symbol externstring imported in function main
I'll do a disassembly of the executables produced in a bit, to see what kind of indirection is produced by not having dllimport and by dllimporting a local symbol.
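As a rough model of the two call shapes involved, here is a portable C sketch. It is not the actual linker output: `externstring_impl`, `imp_externstring` and `externstring_thunk` are illustrative names standing in for the DLL's function, the `__imp_externstring` slot, and the `externstring` thunk that the dumpbin listing shows in dll.lib.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for externstring() as it exists inside the DLL. */
static const char *externstring_impl(void) {
    return "Hello!";
}

/* Stand-in for the import slot (__imp_externstring). In a real image the
   loader fills this in at load time; here it is initialised statically. */
static const char *(*imp_externstring)(void) = externstring_impl;

/* Caller compiled WITH dllimport: one indirect call through the slot. */
const char *call_with_dllimport(void) {
    return imp_externstring();
}

/* Stand-in for the plain externstring symbol from the import library:
   a small thunk that forwards through the slot. */
static const char *externstring_thunk(void) {
    return imp_externstring();
}

/* Caller compiled WITHOUT dllimport: direct call to the thunk, which then
   jumps through the slot, so there is one extra hop. */
const char *call_without_dllimport(void) {
    return externstring_thunk();
}

/* Both paths must reach the same function. */
void check_same_result(void) {
    assert(strcmp(call_with_dllimport(), call_without_dllimport()) == 0);
}
```

Both callers return the same string; the only difference is whether the hop through the slot happens in the caller or in a separate thunk.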
On one side of things I agree with @briansmith and @angelsl that we should really be doing the "right thing" here rather than working around issues.
That's not really what I meant. What are the negative side effects of the proposed patch? I think it is good to make incremental progress. My code depends a lot on inter-crate references to statics, and I've not been affected by the issues motivating a change here. I doubt my usage is atypical. So, it seems unreasonable to pay a perf or size penalty for a workaround. But, if there's no perf/size penalty then I think a temporary workaround is OK.
@alexcrichton: I am not sure what is the right thing to measure here. But I did a bit of micro-benchmarking, and found that in a tight loop:
(If address load is hoisted out of the loop, the difference between direct and indirect loads/calls vanishes.) Mind you, these are comparisons of just those specific instruction sequences with loop overhead subtracted.
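For anyone wanting to try a measurement of this kind, a tight-loop comparison could be sketched in C roughly as below. This is not the benchmark used above and says nothing about its numbers; `target`, `import_slot` and the function names are made up, and the `volatile` qualifiers are only there to keep the compiler from hoisting the loads out of the loop.

```c
#include <assert.h>
#include <time.h>

static int target = 1;

/* Hypothetical stand-in for an import slot. Declared volatile so the
   pointer is reloaded on every iteration instead of being hoisted. */
static int *volatile import_slot = &target;

/* Direct load of the variable on every iteration. */
long sum_direct(long n) {
    long sum = 0;
    for (long i = 0; i < n; i++) {
        sum += *(volatile int *)&target;
    }
    return sum;
}

/* Load the slot, then load through it, on every iteration. */
long sum_indirect(long n) {
    long sum = 0;
    for (long i = 0; i < n; i++) {
        int *p = import_slot;
        sum += *(volatile int *)p;
    }
    return sum;
}

/* Runs one loop and returns the CPU time it took, in seconds. */
double time_loop(long (*loop)(long), long n, long *result) {
    clock_t start = clock();
    *result = loop(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Sanity check: both loops must compute the same sum. */
void check_loops_agree(long n) {
    assert(sum_direct(n) == sum_indirect(n));
}
```

Calling `time_loop(sum_direct, n, &result)` and `time_loop(sum_indirect, n, &result)` with a large `n` gives two timings to compare; loop overhead is included in both, so only the difference between them is meaningful.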
I did some experiments and disassemblies. Here.
TL;DR:
[...] dllimport wrong, whether it is extraneous or missing, we get one extra layer of indirection.
[...] .obj files.
[...] .lib files unless the .lib file has been referenced due to something else, in which case .obj files from it will be loaded and the imports fixed up.
[...] dllimport. But MinGW-W64 does not fix up anything if we get it wrong, i.e. we try to import __imp_X when only X exists, whereas MSVC does so.
(There was some stuff here, but I moved it to a gist instead. Sorry for spamming your emails.)
Currently the compiler makes basically no attempt to correctly use dllimport. As a bit of a refresher, the Windows linker requires that if you're importing symbols from a DLL, they're tagged with dllimport. This helps wire things up correctly at runtime and link-time. To help us out, though, the linker will patch up a few cases where dllimport is missing where it would otherwise be required. If a function in another DLL is linked to without dllimport then the linker will inject a local shim which adds a bit of indirection and runtime overhead but allows the crate to link correctly. For importing constants from other DLLs, however, the MSVC linker requires that dllimport is annotated correctly. MinGW linkers can sometimes work around it (see this commit description).
If we're targeting Windows, then the compiler currently puts dllimport on all imported constants from external crates, regardless of whether it's actually being imported from another crate. We rely on the linker fixing up all imports of functions. This ends up meaning that some crates don't link correctly, however (see this comment: https://github.com/rust-lang/rust/issues/26591#issuecomment-123513631).
We should fix the compiler's handling of dllimport in a few ways:
[...] dllimport where appropriate
[...] dllimport where appropriate
[...] dllimport if they're not actually being imported from a DLL.
I currently have a few thoughts running around in my head for fixing this, but nothing seems plausible enough to push on.
EDIT: Updated as @mati865 requested here.