opensvc / multipath-tools

Other
59 stars 47 forks source link

libmpathutil: explicitly annotate {get,put}_multipath_config as weak #86

Closed 0n-s closed 2 weeks ago

0n-s commented 3 months ago

When linking libmpathutil with -Bsymbolic-non-weak-functions, multipathd segfaults because the weak nature of these symbols is not recognized.

Add the necessary attributes to make that clear to the linker.

Fixes: https://github.com/opensvc/multipath-tools/issues/26 # sort of Tested-by: ns 0n-s@users.noreply.github.com

mwilck commented 3 months ago

Why would you want to use -Bsymbolic-non-weak? We have chosen our linker flags with care.

0n-s commented 3 months ago

All functions being assumed to be weak by default comes at great cost in terms of performance, yet in 99% of instances it has no value. -Bsymbolic-non-weak-functions is a safe flag that fixes that, but isn't an indiscriminate big hammer that disallows weak functions from working properly like -Bsymbolic-functions.

More details by someone who knows much more than I do: https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic

mwilck commented 3 months ago

All functions being assumed to be weak by default

How do you infer that we're doing that? All we do is rely on the standard ELF symbol resolution mechanism, which does not assume that all symbols are weak.

comes at great cost in terms of performance

Please provide numbers showing how much speed-up you observe on a given plaform. To my understanding, PLT symbol lookups are mainly expensive when the function is first used (because this requires a call to the dynamic linker). Subsequent calls only add the overhead of one indirect jump (How to Write Shared Libraries, §1.5.5).

For multipathd, the startup-time overhead should be negligible, as it's a long-running process. Interactive calls to multipath or multipathd -k start up fast, they need no optimization IMO. multipath is called frequently from udev rules (as multipath -u $DEVICE or multipath -U $MAPNAME). These are the only invocations for which I think startup-time optimization might be beneficial. But again, I need numbers to get an idea about the possible performance gain.

By using symbol versioning, we already restrict the number of non-local symbols in libmultipath and libmpathutil to ~190 and ~80, respectively, which is not overwhelmingly much for modern CPUs.

I have read the blog article that you linked, but I'd be lying if I said I understand everything. There's a large zoo of compiler and linker options that interact with each other in hard-to-understand ways. The author of that article seems to be specialized in proposing even more such options, which make it even more difficult to understand what's going on.

Unless I see proof that this has measurable performance benefits in real-world scenarions, I'd rather stick with the easily understandable default symbol lookup, which does not require the weak attribute for these functions.

I also read Linus' post, which is much easier to understand, and I agree that he's got a point. There has been a paradigm shift during the last ~15 years with "disk" space getting larger, faster, and cheaper all the time. The arguments pro DSO are getting weaker, while those against DSO are getting stronger. Maybe some day we should reconsider just linking multipathd and multipath statically. The only library that's actually used by external code is libmpathpersist, and I suppose we could fix that somehow, too.

I'd prefer that to us supporting fancy hard-to-grok compiler and linker options.

0n-s commented 3 months ago

How do you infer that we're doing that? All we do is rely on the standard ELF symbol resolution mechanism, which does not assume that all symbols are weak.

From the blogpost: For a shared object, a default visibility STB_GLOBAL or STB_WEAK symbol can be preempted (interposed) because a preceding component may define a symbol of the same name.

Compiling with just -O2 here, no linker settings:

$ readelf -W --dyn-syms libmpathutil.so|grep -E 'put_multipath_config|get_multipath_config'
   103: 0000000000009e80     8 FUNC    GLOBAL DEFAULT   13 get_multipath_config@@LIBMPATHCOMMON_1.0.0
   104: 0000000000009ea0     6 FUNC    GLOBAL DEFAULT   13 put_multipath_config@@LIBMPATHCOMMON_1.0.0

As you can see, the symbol is GLOBAL, so those symbols are weak. This makes sense, because if it was any different, these functions would be broken without this patch, with the linker option or not.

For multipathd, the startup-time overhead should be negligible, as it's a long-running process. Interactive calls to multipath or multipathd -k start up fast, they need no optimization IMO. multipath is called frequently from udev rules (as multipath -u $DEVICE or multipath -U $MAPNAME). These are the only invocations for which I think startup-time optimization might be beneficial. But again, I need numbers to get an idea about the possible performance gain.

I'm not arguing for this option for multipath-tools specifically, I completely agree with your assessments here. I'm just trying to make a small contribution towards making entire systems buildable with -Bsymbolic-non-weak-functions in default LDFLAGS.

I have read the blog article that you linked, but I'd be lying if I said I understand everything. There's a large zoo of compiler and linker options that interact with each other in hard-to-understand ways. The author of that article seems to be specialized in proposing even more such options, which make it even more difficult to understand what's going on.

The -Bsymbolic & -Bsymbolic-functions sections can be ignored because they are not standards-compliant. The visibility sections can be ignored because they are also not what I'm proposing here. TL;DR:

I hope that makes the parts relevant to this patch clear.

Unless I see proof that this has measurable performance benefits in real-world scenarions, I'd rather stick with the easily understandable default symbol lookup, which does not require the weak attribute for these functions.

I'd argue this is easy enough to understand too: functions are by default direct bound. Functions marked as weak are treated as weak. In the default you describe, every function is weak unless it has non-default visibility (not really an option for DSOs, CMIIW). But you're the boss either way.

I'd prefer that to us supporting fancy hard-to-grok compiler and linker options.

Sure, if there is no DSO there is no problem. I don't exactly know a lot about the current issues of your project, so I wanted to submit a patch that doesn't really question the status quo all that much. Completely fine with dropping this if that's the road you'd like to go down.

hard-to-grok

I agree the blogpost is daunting, but the manpage does a decent job: https://man.archlinux.org/man/extra/lld/ld.lld.1.en#Bsymbolic-non-weak-functions

mwilck commented 3 months ago

From the blogpost: For a shared object, a default visibility STB_GLOBAL or STB_WEAK symbol can be preempted (interposed) because a preceding component may define a symbol of the same name.

It sure can, this is intended, as you certainly understood.

As you can see, the symbol is GLOBAL, so those symbols are weak. This makes sense

This makes no sense to me. STB_GLOBAL != STB_WEAK.

I'm not arguing for this option for multipath-tools specifically, I completely agree with your assessments here. I'm just trying to make a small contribution towards making entire systems buildable with -Bsymbolic-non-weak-functions in default LDFLAGS.

What systems are you talking about? Are there any distributions that have this plan? Which?

Also, the blog article states:

We need a linker option to cancel default -Bsymbolic-non-weak-functions. I have added -Bno-symbolic to GNU ld and gold.

So distributions can actually use -Bsymbolic-non-weak-functions by default, and just use -Bno-symbolic for multipath-tools. Not saying that this would be ideal, but multipath-tools not supporting -Bsymbolic-non-weak-functions won't hinder any distro from making it the default.

The -Bsymbolic & -Bsymbolic-functions sections can be ignored because they are not standards-compliant.

Which "standard" are you talking about? The only standard that I'm aware of is the ELF standard, which our code complies to.

  • ELF default allows all symbols to be preemptible (i.e. weak, if my understanding is correct)

preemptible != weak. The difference is explained in the ELF standard, §III.1, section "Symbol Table". For dynamic linking, it only affects the linker's behavior for undefined symbols. For static linking, it has additional effects.

Btw, have you verified whether with your patch, but without -Bsymbolic-non-weak-functions, our code still works correctly?

  • IOW, every exported function can be replaced (e.g. by way of LD_PRELOAD)

  • This has a pretty great cost, especially for small programs that are launched all the time or programs linking to a lot of DSOs like compilers

Fair enough, but "pretty great cost" is vague. I described previously which would be the only cases worth optimizing in this manner. I'd still like to see numbers for our code. Speed improvement for kernel compilation doesn't matter here.

  • The article proposes building with -Bsymbolic-non-weak-functions as a distribution default,

Again, which distributions are planning to do this?

which would mean functions not explicitly marked weak will not be replaceable

AFAICS, this is not standard compliant, either, at least not if "standard" refers to the ELF standard. -Bsymbolic-non-weak-functions changes the semantics of STB_WEAK from what the standard describes.

Unless I see proof that this has measurable performance benefits in real-world scenarions, I'd rather stick with the easily understandable default symbol lookup, which does not require the weak attribute for these functions.

I'd argue this is easy enough to understand too: functions are by default direct bound. [...]

I do understand what the option does, and why it provides a (theoretical, as far we're concerned) performance benefit. But there are tons of other options with vaguely similar effects. I'm pretty sure that multipath-tools can't support arbitrary combinations of these.

We currently have a similar problem: our code doesn't work with LTO (#18). Unlike -Bsymbolic-non-weak-functions, several distributions default to using LTO by default today, and we need to disable the related options for multipath, which I'd like to fix some time.

I agree the blogpost is daunting, but the manpage does a decent job: https://man.archlinux.org/man/extra/lld/ld.lld.1.en#Bsymbolic-non-weak-functions

Only supported by clang? Is there any plan to enable this in the GNU toolchain as well?

All this said, I will consult with some tool chain experts to find out what they think about re-introducing the "weak" attribute for these symbols; in particular, whether that could have undesirable consequences with other compiler or linker options. I'm also open to opinions of @cvaroqui, @bmarzins, and other regular contributors to this project.

If the campaign of the blog author is successful, and major distributions start using this option by default, we will certainly need to adapt to it in some way.

0n-s commented 3 months ago

This makes no sense to me. STB_GLOBAL != STB_WEAK.

In practice, yes. man 5 elf pretty clearly describes that STB_GLOBAL is equivalent to STB_WEAK, except that it is higher-priority. Maybe you know something that I'm missing here...

What systems are you talking about? Are there any distributions that have this plan? Which?

None currently (at least any that I know of), admittedly. This patch came out of an experiment to see how much of the proposed plan in the blogpost is possible with current code, with Buildroot (you can add additional linker flags there). If anyone wants to follow up on this, I propose they not do that since testsuites should absolutely be run for making sure everything works, & Buildroot just does not support that all that well (unsurprisingly).

Which "standard" are you talking about? The only standard that I'm aware of is the ELF standard, which our code complies to.

I did say you can ignore those sections (in the blogpost) because they're not relevant to what I'm proposing here, right in the section you quoted. Note how I mentioned -Bsymbolic & -Bsymbolic-functions there (which are not general purpose flags), & not -Bsymbolic-non-weak-functions, which is what I'm trying to do here. So responding to that bit is fairly irrelevant.

Fair enough, but "pretty great cost" is vague. I described previously which would be the only cases worth optimizing in this manner. I'd still like to see numbers for our code. Speed improvement for kernel compilation doesn't matter here.

Again, I'll agree on that point, it doesn't really matter all that much for multipath-tools. I only was trying out using -Bsymbolic-non-weak-functions as a system-wide default. If you think this patch is problematic in any way, -Bno-symbolic is probably the better way to go about it, as you mentioned.

AFAICS, this is not standard compliant, either, at least not if "standard" refers to the ELF standard.

This is true, yes (technically no, but yes in practice).

However, the argument for -Bsymbolic-non-weak-functions is basically that the standard imposes a tremendous amount of inefficiency to all DSO exported functions by default, & so it is justified to violate the defaults of the standard, making preemptibility opt-in rather than opt-out. Breaking standards should always be viewed with caution, but if it's a prominent toolchain maintainer saying it'd be a great idea to use these by default in a distro, I think that carries some weight.

Again, this is meant as a general statement for all code within a distro, not just multipath-tools. You are free to disagree with that statement; maybe you think all symbols being preemptible (what I would personally consider ludicrous) is a good default, or that it matters to strictly conform to the standard despite its inefficiencies.

-Bsymbolic-non-weak-functions changes the semantics of STB_WEAK from what the standard describes.

This is not true. It changes the semantics of STB_GLOBAL. The very name indicates that STB_WEAK is unaffected.

I do understand what the option does, and why it provides a (theoretical, as far we're concerned) performance benefit. But there are tons of other options with vaguely similar effects. I'm pretty sure that multipath-tools can't support arbitrary combinations of these.

I agree, however I do believe -Bsymbolic-non-weak-functions is a sane system-wide default (not saying it's necessarily good for multipath-tools, although I'll admit I don't see the issue with this patch). The vast majority of code does not care about being preemptible (if they do, they only care about very specific functions), & having that behavior be the default for everything, often unknowingly to all (besides those who know ELF's peculiar dynamic linking behavior & set it themselves), just creates a lot of inefficiency for no good reason. This is not some optimization that can only really ever be opt-in by programs themselves like -ffast-math. Again, maybe you disagree here, & that's OK. Like with e.g. -fstack-protector defaults, you can disable them if you genuinely believe that this isn't the right behavior for your program. But it is general-purpose. Again, I will fully admit I cannot make a case for multipath-tools specifically, since I probably wouldn't be able to tell if it was even compiled with -O0.

Only supported by clang? Is there any plan to enable this in the GNU toolchain as well?

lld, not Clang. You can use ld.lld with GCC, it's a fairly GNU-compatible linker. But more realistically, it is supported by mold, which is fairly popular among Gentoo users. You might consider the binutils patches a "plan" of sorts, however the ML postings of these were basically ignored. So no obvious objections, just a patch that nobody noticed.

If the campaign of the blog author is successful, and major distributions start using this option by default, we will certainly need to adapt to it in some way.

I hope so! :) Not happening without the binutils patches being looked at, but it'd be great. I've been compiling an entire distro with it enabled & it really does make a big difference in the startup time of programs small & large, from shells to KDE apps. It makes it quite obvious that the defaults for ELF dynamic linking are pretty horrendous for most programs, & how little effort you would need to solve that properly. Arguably this comment is irrelevant, but I can't say I'm not interested in the benefits this brings.

0n-s commented 3 months ago

Btw, have you verified whether with your patch, but without -Bsymbolic-non-weak-functions, our code still works correctly?

Forgot to respond to this earlier, but yes, it works. I just ran the testsuite & it's all green.

bmarzins commented 3 months ago

I'll be the first to admit that I'm not a linker wizard, but IIRC adding "attribute((weak))" won't have any effect on how the multipath tools are currently linked. I did a little bit of testing with it (using our standard build flags), and everything appears to work as expected. If it really is harmless to add, then I have no objections to adding it. But for this to work correctly with -Bsymbolic-non-weak-functions, wouldn't the definitions in libmultipath also need to be weak. Otherwise libmultipath will use its default definition, which wouldn't match with what multipathd uses. Does multipathd really work with -Bsymbolic-non-weak-functions set? Am I misunderstanding something?

The arguments pro DSO are getting weaker, while those against DSO are getting stronger. Maybe some day we should reconsider just linking multipathd and multipath statically. The only library that's actually used by external code is libmpathpersist, and I suppose we could fix that somehow, too.

libmpathvalid is also designed to be used by external code and links to libmulitpath. Actually to me, these libraries seem to be where statically linking libmultipath would be the most useful. I've never been a fan of how we expose so much of libmultipath to anyone who wants to use one of our libraries build on top of it. Even with libmultipath.version, we still need to expose a lot of poorly named functions that could easily get interposed. For instance, if some program that links to libmpathvalid or libmpathpersist has its own function named "init_config", things will not work correctly since the libmultipath function won't get used (I suppose this is another place where -Bsymbolic-non-weak-functions would fix things)

0n-s commented 3 months ago

wouldn't the definitions in libmultipath also need to be weak.

Thank you for the catch, fixed.

mwilck commented 3 months ago

I've played around with this, too, and it seems to behave as advertized (last version, with @bmarzins fix included).

IOW, re-introducing the "weak" attribute probably won't hurt us. We should wait some more though, at least until #85 is merged.

I'm still kind of sceptical about using -Wl,-Bsymbolic-non-weak-functions distribution-wise, though. multipath-tools may not be the only package that relies on the documented semantics of ELF symbol lookup.

mwilck commented 2 months ago

Actually to me, these libraries seem to be where statically linking libmultipath would be the most useful.

We can't easily do this though, as we'd again be in license compatibility troubles, unless we completely avoid linking with GPL 3.0 code, or other code that isn't fully compatible to GPL-v2.0-only, for that matter.

mwilck commented 4 weeks ago

I'll merge this patch in the forthcoming release. I shall add that this does not imply that we (the multipath-tools upstream developers) make any promises wrt support of -Bsymbolic-non-weak-functions. I still don't think that setting this flag generally in a distribution is a good idea. My distribution's toolchain expert thinks the same.

But both @bmarzins and myself have experimented with it and didn't find any regressions, so I'm applying it for now.