Centril opened this issue 5 years ago.
Yeah... just make sure that anything that uses destructure_const needs a feature gate; I'm not too sure it behaves soundly for const generics in all cases.
@eddyb, I'm not sure we are talking about the same thing here. I'm proposing to make rustc emit <type> <const-data> (with <const-data> being a hash of the constant's value) just as a temporary solution until we have a proper grammar for ADT constants. Are you proposing to get rid of the <const> = <type> <const-data> production entirely?
That production is a lot more general than what is implemented (which is only integer types, bool and char for the <type> part).
Right now demanglers treat any other type in that position as an error, so using it for that purpose requires changing demanglers, just like adding a special form for opaque hash-only leaves would, except that it leaves less encoding space usable by any future ADT mangling.
@eddyb: OK, I'm all for solving this properly now -- iff we can get it done in a reasonable amount of time 😃
It sounds like you and @oli-obk have come up with an exhaustive list of things the grammar needs to support, right? But it also sounds like the implementation on the compiler side is somewhat complicated by the fact that destructure_const isn't quite reliable yet?
It should work just fine for all normal aggregates like arrays, tuples, enums and structs, but there are certainly types it cannot handle or will handle weirdly. I'm fairly certain that all currently legal const generic types will work just fine, so that should be ok. It's just that extending the list of legal types is not trivially ok.
Initial mangling implementation (and grammar) up at #87194 - I spent more time convincing myself that I couldn't cut certain corners in the grammar, writing the comments for it, and refactoring the current handling of placeholders (which is in a separate commit), than adding the new support.
That's mostly because deref_const and destructure_const already exist, and have been around for a few months if not longer, so we could've had this done before #85530 was opened. A lot of my PR is just copy-paste from ty::print::pretty, with the output adjusted to be the mangling we want (instead of user-facing).
Hopefully the extended constant mangling grammar doesn't end up being a bikeshed of its own.
Would it be reasonable, before changing the default, to stabilize the option to change the symbol mangling format? That would allow people to opt into the v0 format, and in particular would unblock the usage of it in tools such as instrumentation.
I'd be happy to submit a patch stabilizing the option.
That would result in us having to support the old mangling indefinitely; I think that differs from what was previously discussed.
We could start by making the default depend on nightly vs stable, so that it's only v0 on nightly, where you can opt out of it with the unstable flag.
Unless, hmm, maybe you meant stabilizing the CLI flag but not (all) its values, i.e. -C symbol-mangling-version=legacy would require -Z unstable-options, but -C symbol-mangling-version=v0 wouldn't.
I think I could get behind that, guaranteeing only the versions that have gone through RFCs.
@eddyb Right, I was proposing stabilizing the option but not guaranteeing that it supports any particular value. We could choose to stabilize the v0 value now, and then consider stabilizing the legacy value when we change the default (since there's no reason to pass it at all until the default changes).
@eddyb I submitted https://github.com/rust-lang/rust/pull/90128 to stabilize -C symbol-mangling-version=v0.
An issue came up in https://github.com/rust-lang/rust/pull/89917#issuecomment-963755731 that is worth mentioning here: the compiler currently generates some v0 symbols that have a .llvm.<numbers> suffix, which violate the v0 spec (it doesn't allow '.' chars), and some v0 demangler implementations (libiberty and Valgrind) fail to demangle these symbols.
Either the compiler should be fixed to not append these suffixes (which may be hard, because it's LLVM that's adding them) or the v0 spec should be modified to permit these suffixes, and the libiberty/Valgrind implementations should be updated accordingly.
I don't think either of them is correct - or at least rustc-demangle doesn't do either, and does handle those pesky suffixes.
Does C++ Itanium mangling allow for the .llvm. suffixes? AFAIK no, but you should get them if you use Clang with LTO.
What rustc-demangle does, and what these tools should probably also do, is limit the symbol to just before the suffix, before attempting to demangle at all with any mangling scheme.
IOW, I believe this current behavior is a bug:
$ c++filt _ZN3foo3barE
foo::bar
$ c++filt _ZN3foo3barE.llvm.123
_ZN3foo3barE.llvm.123
(at least in a world where LLVM's LTO exists - one could argue that they screwed up by doing this)
EDIT: some precedent for similar issues with compiler passes suffixing symbols (tho the fix seems to be way more high-level than I would want): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40831
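A minimal sketch of that "cut before the suffix" step, restricted to v0 symbols. It relies on v0 names using only [A-Za-z0-9_], so the first '.' or '$' must begin a vendor suffix. The helper name split_v0_suffix and the plain _R prefix check are assumptions for illustration, not rustc-demangle's actual API. Legacy symbols are deliberately not handled, because they legitimately contain '.' (e.g. core..fmt..Debug).

```rust
/// Hypothetical helper: split a v0 symbol into (mangled name, vendor suffix).
/// Returns None for anything that does not start with the `_R` prefix
/// (other prefix spellings some platforms use are ignored in this sketch).
fn split_v0_suffix(sym: &str) -> Option<(&str, &str)> {
    if !sym.starts_with("_R") {
        return None;
    }
    // v0 names only contain [A-Za-z0-9_], so the first '.' or '$'
    // can only be the start of a vendor-specific suffix.
    match sym.find(|c: char| c == '.' || c == '$') {
        Some(i) => Some((&sym[..i], &sym[i..])),
        None => Some((sym, "")),
    }
}
```

A demangler would then demangle only the first half. Depending on the use case, it could drop the suffix or print it after the demangled name.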
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-general
Mangled names containing $ or . are reserved for private implementation use. Names produced using such extensions are inherently non-portable and should be given internal linkage where possible.
It does kind of allow them.
Yes.
<mangled-name> ::= _Z <encoding>
               ::= _Z <encoding> . <vendor-specific suffix>
<encoding> ::= <function name> <bare-function-type>
           ::= <data name>
           ::= <special-name>
A <mangled-name> containing a period represents a vendor-specific version or portion of the entity named by the <encoding> prior to the first period. There is no restriction on the characters that may be used in the suffix following the period.
https://bugs.kde.org/show_bug.cgi?id=445916 has been filed for possibly updating Valgrind's v0 demangler to handle these suffixes, though it's still a bit unclear to me if that's the right thing to do.
Well, if it helps:
With LLVM, names like foo.llvm.3285396211802591752 and names like foo.5 both arise when a symbol that originally had linkage local to a single object file, and thus only needed a locally unique name, goes through LTO and suddenly needs a globally unique name.
Names like foo.llvm.3285396211802591752 come from ThinLTO when a local symbol ends up being referenced from a different object file, because some reference to it that was originally from the first object file was inlined into the second one. Since ThinLTO compiles each bitcode object file into its own native object file, such a symbol has to be changed to global linkage so that the native linker can resolve the reference; the suffix is preemptively attached to avoid clashes with any other symbols also named foo (which could be either a local symbol from another object file that was transformed in the same way, or a symbol that was already global).
Names like foo.5 come from full LTO. Full LTO combines everything into a single native object file, so symbols that were local in the input bitcode object files can stay local in the output native object file, but the space of "local" names now includes all symbols from all of the input files. Since full LTO is not incremental, the suffix is not attached preemptively, but only when there actually are two symbols with the same name from two different input files.
In both cases, the suffixed foo has the same semantics as the original foo, and the suffix is just to ensure a unique name. So it should be fine to ignore the suffix.
But there are also names like foo.cold.1, which represent a partial chunk of foo that was split off into its own function. In this case, foo.cold.1 does not have the same semantics as foo. Ignoring the suffix is still fine if you are just trying to symbolicate a backtrace. But for more obscure use cases it may not be fine. Suppose you want to log whenever a function is called; treating each call to foo.cold.1 as a call to foo would be misleading.
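A rough sketch of the distinction above. It assumes only the three suffix shapes mentioned (.llvm.<digits> from ThinLTO, .<digits> from full LTO, and .cold.<...> from hot-cold splitting). The enum and function names are hypothetical. Real symbols can carry other or stacked suffixes, and this sketch conservatively reports those as Unknown.

```rust
/// Hypothetical classification of a symbol suffix (the part from the first '.').
#[derive(Debug, PartialEq)]
enum SuffixKind {
    /// Only makes the name unique; same semantics as the unsuffixed symbol.
    Uniqueness,
    /// A chunk split off from the original function (e.g. `.cold.1`).
    SplitPart,
    /// Anything this sketch does not recognize.
    Unknown,
}

fn is_decimal(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

fn classify_suffix(suffix: &str) -> SuffixKind {
    // ThinLTO promotion, e.g. `.llvm.3285396211802591752`.
    if let Some(rest) = suffix.strip_prefix(".llvm.") {
        if is_decimal(rest) {
            return SuffixKind::Uniqueness;
        }
    }
    // Hot-cold splitting, e.g. `.cold.1`.
    if suffix.starts_with(".cold.") {
        return SuffixKind::SplitPart;
    }
    // Full LTO de-duplication, e.g. `.5`.
    match suffix.strip_prefix('.') {
        Some(rest) if is_decimal(rest) => SuffixKind::Uniqueness,
        _ => SuffixKind::Unknown,
    }
}
```

A backtrace symbolizer could drop Uniqueness suffixes entirely. A call-logging tool would instead want to keep SplitPart visible rather than folding it into foo.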
Also, regarding the idea of just getting rid of the suffixes:
For Rust code, it might theoretically be possible to get rid of suffixes that exist to ensure global uniqueness, under the assumption that Rust compilers will never actually produce two unrelated symbols with the same mangled name. But C and C++ compilers can and do produce such symbols (those languages expose local linkage directly in the form of static, so you just need static functions/variables with the same name in two different source files). So Valgrind's C++ demangler at least ought to be dealing with these suffixes, and in theory the same code could be reused for Rust.
On top of that, even in Rust code, suffixes like .cold.1 can't be avoided unless you want to get rid of the hot-cold splitting optimization altogether. When LLVM splits out a chunk of a function into its own function, it has to give the new function some name. It can't give it the same name as the original function since the original still exists. In lieu of that, naming it after the original function plus a suffix is clearly more useful than, say, giving it a random name.
I'm writing a Ghidra script for demangling Rust symbols, and I have a question about the <const> grammar:
<const> = <type> <const-data>
| "p" // placeholder, shown as _
| <backref>
The "p" and <backref> cases are duplicated, because <type> is defined as:
<type> = <basic-type>
| (... elided ...)
| <backref>
<basic-type> = "a" // i8
| (... elided ...)
| "p" // placeholder (e.g. for generic params), shown as _
So it seems to me that <const> can be parsed as either "p" or "p" <const-data>, and similarly either <backref> or <backref> <const-data>. Is this intentional (i.e. is <const-data> partially optional)? It seems like I need to do the following to process <const>:
1. Parse input as <type>.
2. If <type> succeeds, parse rest with <const-data>.
   - If <const-data> succeeds, return <type> <const-data>.
   - If <const-data> fails, backtrack and inspect <type>.
     - If <type> in ["p", <backref>], return <type>.
     - Otherwise, <const> fails.
3. If <type> fails, then we know that input was not either "p" or <backref> (otherwise it would have succeeded), so <const> fails.
But I'm also not sure how to interpret a <const> with an optional <const-data>.
As an aside, this feels ambiguous, because unlike other optional parts of the grammar, <const-data> has no distinguishing prefix:
<const-data> = ["n"] {<hex-digit>} "_"
I think it's actually unambiguous, but only indirectly, due to how sentinel characters of other parts of the grammar were selected. Say we have [[T; M]; N]:
"A" "A" <type> <const> <const>
Given the current grammar, that could potentially be either:
"A" "A" <type> "p" {<hex-digit>} "_" <const>
or:
"A" "A" <type> "p" <basic-type> <const-data>
and <hex-digit> can collide with <basic-type>.
However, there's nowhere else in the grammar that allows <basic-type> to be directly followed by "_", so if the data contained "p" <const-data> but the parser tried to interpret <const> as "p" first, the parser would eventually figure out the problem and could backtrack.
I believe (but have not analyzed) that a collision in the other direction should also be detected indirectly. This partial optionality seems somewhat brittle though, especially as the RFC leaves extending <const-data> as a future task.
I had a look at how rustc implements const mangling:
https://github.com/rust-lang/rust/blob/028c6f1454787c068ff5117e9000a1de4fd98374/compiler/rustc_symbol_mangling/src/v0.rs#L578-L733
I think this corresponds to the following grammar:
<const> = "p"
| <backref>
| <subset-of-type> <const-data>
This also seems to be the grammar that rustc-demangle uses.
So I now believe this is a bug in the RFC.
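With "p" and <backref> checked before any type letter, the grammar that rustc implements can be parsed with one character of lookahead and no backtracking. A minimal sketch follows. It assumes the RFC's <basic-type> letters for the integer types, bool and char (a h s t l m x y n o i j b c), and treats <backref> as "B" followed by a <base-62-number>. It keeps the <const-data> digits as an unparsed hex string. Const, parse_const and parse_base62 are hypothetical names, not rustc-demangle's API.

```rust
/// Hypothetical result of parsing one <const>.
#[derive(Debug, PartialEq)]
enum Const {
    Placeholder,
    Backref(u64),
    Value { ty: char, negative: bool, hex: String },
}

/// Assumed <basic-type> letters usable for consts (integers, bool, char).
const CONST_TYPES: &str = "ahstlmxynoijbc";

/// <base-62-number> = {<0-9a-zA-Z>} "_"; "_" alone is 0, otherwise value + 1.
fn parse_base62(input: &str) -> Option<(u64, &str)> {
    let end = input.find('_')?;
    let digits = &input[..end];
    let mut value: u64 = 0;
    for c in digits.chars() {
        let d = match c {
            '0'..='9' => c as u64 - '0' as u64,
            'a'..='z' => c as u64 - 'a' as u64 + 10,
            'A'..='Z' => c as u64 - 'A' as u64 + 36,
            _ => return None,
        };
        value = value.checked_mul(62)?.checked_add(d)?;
    }
    let value = if digits.is_empty() { 0 } else { value.checked_add(1)? };
    Some((value, &input[end + 1..]))
}

/// <const> = "p" | <backref> | <subset-of-type> <const-data>
fn parse_const(input: &str) -> Option<(Const, &str)> {
    let mut chars = input.chars();
    let first = chars.next()?;
    let rest = chars.as_str();
    match first {
        'p' => Some((Const::Placeholder, rest)),
        'B' => parse_base62(rest).map(|(n, r)| (Const::Backref(n), r)),
        t if CONST_TYPES.contains(t) => {
            // <const-data> = ["n"] {<hex-digit>} "_"
            let (negative, rest) = match rest.strip_prefix('n') {
                Some(r) => (true, r),
                None => (false, rest),
            };
            let end = rest.find('_')?;
            let hex = &rest[..end];
            if !hex.chars().all(|c| c.is_ascii_digit() || ('a'..='f').contains(&c)) {
                return None;
            }
            let value = Const::Value { ty: t, negative, hex: hex.to_string() };
            Some((value, &rest[end + 1..]))
        }
        _ => None,
    }
}
```

Because the first character alone decides the branch, a "p" or "B" is never followed by <const-data>. This is the property that the pending RFC update makes explicit.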
@str4d in production <const> = <type> <const-data>, <type> is never p. There are pending updates in https://github.com/rust-lang/rfcs/pull/3161 which make this explicit in the grammar.
Aha, yes that does indeed address my concern: <const> now never reaches <type> other than via "V" <path> <const-fields>, which has a separating prefix. Thanks!
https://bugs.kde.org/show_bug.cgi?id=445916 has details of some progress on the gcc/libiberty/valgrind side about handling the suffixes added by LLVM.
@Amanieu According to the edit logs, almost exactly a year ago you checked the box for "Linux perf", but nobody has changed this file at all for v0: torvalds/linux@fb71c86 / tools/perf/util/demangle-rust.c
I ran into lack of support while trying to get good symbol names with perf record -g.
However, looking closer at how it's driven, Rust v0 support seems to "just" require libbfd from binutils 2.36 (or later), so I'll add that to the checkbox, in case anyone else looks at it again (I happen to have binutils 2.35.2 instead).
(EDIT: some distros seem to link binutils against libiberty from GCC sources, apparently ignoring binutils' vendored copy, so in that case libiberty 11.0 is the minimum required)
If anyone is familiar with Linux kernel patches, these can be removed nowadays (since Rust legacy demangling has been working through libbfd for many years now AFAICT):
I checked the box for Linux perf because no change to the kernel was required: the new demangler will automatically get picked up from the updated libiberty.
FYI, @lqd opened a PR that extends the RFC with "vendor-specific suffixes" like .llvm.123: https://github.com/rust-lang/rfcs/pull/3224
Please provide your feedback if you have any.
There was https://lore.kernel.org/lkml/20220201185054.1041917-1-german.gomez@arm.com/, but German notes:
I have decided to drop this patch. It turns out that even shipped versions of libbfd and libiberty don't demangle some of the symbols completely. For example:
(doesn't strip away the hash at the end) _ZN10rs_tracing8internal11TRACE_STATE17h41dcd282cd61069dE.0 ==> rs_tracing::internal::TRACE_STATE::h41dcd282cd61069d
(doesn't demangle full symbol) _ZN41_$LT$bool$u20$as$u20$core..fmt..Debug$GT$3fmt17h10f4b7b0094c3a75E.2262 ==> _$LT$bool$u20$as$u20$core..fmt..Debug$GT$::fmt::h10f4b7b0094c3a75
These are cleaned up afterwards by perf's demangler.
How is that possible? The code in demangle-rust.c is copy-pasted from what libiberty used to have for years (until I changed it when unifying it with the v0 demangler).
Also, I'm guessing the .0 and .2262 in those examples are stripped by perf before passing them off to libiberty? (since libiberty doesn't handle those suffixes correctly and just refuses to demangle entirely AFAIK - this is only now getting fixed)
Regarding the hash at the end, I think that's controlled by demangler flags (-i is --no-verbose):
$ c++filt --version
GNU c++filt (GNU Binutils) 2.35.2
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
$ c++filt '_ZN10rs_tracing8internal11TRACE_STATE17h41dcd282cd61069dE'
rs_tracing::internal::TRACE_STATE::h41dcd282cd61069d
$ c++filt '_ZN41_$LT$bool$u20$as$u20$core..fmt..Debug$GT$3fmt17h10f4b7b0094c3a75E'
<bool as core::fmt::Debug>::fmt::h10f4b7b0094c3a75
$ c++filt -i '_ZN10rs_tracing8internal11TRACE_STATE17h41dcd282cd61069dE'
rs_tracing::internal::TRACE_STATE
$ c++filt -i '_ZN41_$LT$bool$u20$as$u20$core..fmt..Debug$GT$3fmt17h10f4b7b0094c3a75E'
<bool as core::fmt::Debug>::fmt
The lack of $ unescaping is really worrying, however. I can only reproduce it if I force C++ demangling:
$ c++filt --format gnu-v3 '_ZN41_$LT$bool$u20$as$u20$core..fmt..Debug$GT$3fmt17h10f4b7b0094c3a75E'
_$LT$bool$u20$as$u20$core..fmt..Debug$GT$::fmt::h10f4b7b0094c3a75
So maybe it's not the verbosity level, but "just" perf somehow forcing C++-only mode, instead of the automatic default?
Either that or really old libiberty versions, but the same code was added to both libiberty and perf around the same time.
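The two behaviours compared above (hash stripping with -i, and $...$/.. unescaping) are small enough to show in a minimal legacy-demangling sketch. It assumes the _ZN <length><segment>... E layout and the trailing h<16 hex digits> hash segment. It handles only $uXX$ plus escapes assumed to match the common legacy set ($SP$, $BP$, $RF$, $LT$, $GT$, $LP$, $RP$, $C$). demangle_legacy and its strip_hash flag are hypothetical names, not libiberty's API; strip_hash = true mirrors what c++filt -i printed.

```rust
/// Decode the body of one `$...$` escape (assumed legacy escape set).
fn decode_escape(code: &str) -> Option<char> {
    match code {
        "SP" => Some('@'),
        "BP" => Some('*'),
        "RF" => Some('&'),
        "LT" => Some('<'),
        "GT" => Some('>'),
        "LP" => Some('('),
        "RP" => Some(')'),
        "C" => Some(','),
        // `$uXX$` carries a hex-encoded Unicode scalar value, e.g. `$u20$` = ' '.
        _ => {
            let hex = code.strip_prefix('u')?;
            char::from_u32(u32::from_str_radix(hex, 16).ok()?)
        }
    }
}

/// Unescape one path segment: `$..$` escapes and `..` -> `::`.
fn unescape_segment(seg: &str) -> Option<String> {
    // A leading `_$` means the segment really starts with `$`.
    let mut rest = if seg.starts_with("_$") { &seg[1..] } else { seg };
    let mut out = String::new();
    while !rest.is_empty() {
        if let Some(after) = rest.strip_prefix('$') {
            let end = after.find('$')?;
            out.push(decode_escape(&after[..end])?);
            rest = &after[end + 1..];
        } else if let Some(after) = rest.strip_prefix("..") {
            out.push_str("::");
            rest = after;
        } else {
            let ch = rest.chars().next()?;
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    Some(out)
}

/// The legacy hash segment: `h` followed by exactly 16 hex digits.
fn is_legacy_hash(seg: &str) -> bool {
    seg.len() == 17 && seg.starts_with('h') && seg[1..].chars().all(|c| c.is_ascii_hexdigit())
}

/// Hypothetical legacy demangler for suffix-free `_ZN...E` symbols.
fn demangle_legacy(sym: &str, strip_hash: bool) -> Option<String> {
    let mut rest = sym.strip_prefix("_ZN")?;
    let mut segments = Vec::new();
    loop {
        if let Some(tail) = rest.strip_prefix('E') {
            if !tail.is_empty() {
                return None; // trailing data, e.g. an unstripped `.llvm.` suffix
            }
            break;
        }
        // Each segment is `<decimal length><bytes>`.
        let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        segments.push(rest.get(digits..end)?);
        rest = &rest[end..];
    }
    if strip_hash && segments.len() > 1 && is_legacy_hash(segments[segments.len() - 1]) {
        segments.pop();
    }
    let decoded: Option<Vec<String>> = segments.iter().map(|s| unescape_segment(s)).collect();
    Some(decoded?.join("::"))
}
```

Skipping unescape_segment reproduces the gnu-v3 style output, in which segments are joined but $LT$ and friends are left alone.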
The suffixes are now handled by gcc/libiberty, those changes have been imported into Valgrind, and the Valgrind bug (https://bugs.kde.org/show_bug.cgi?id=445916) has been closed.
Now that RFC https://github.com/rust-lang/rfcs/pull/3224 has been merged to resolve the suffix question, are there any implementations that still need updating in order to handle '.' and '$' suffixes? If not, shall we revive https://github.com/rust-lang/rust/pull/89917 for making v0 the default?
Note that recent versions of rustc use -C symbol-mangling-version=v0 rather than the -Z flag. The top comment led me astray.
@programmerjake Thanks for bringing it up (it got forgotten) - I just tried updating it, is the new version better?
yup, thx!
So @Gankra was showing me some non-trivial v0 symbols and that got me pondering about richer presentation than just text. Even for plain text, I was able to prototype some stuff w/ jq, but that's a pile of hacks approximating "balanced <>/()" parsing.
Ideally we wouldn't be going the long way around through the "parse v0 mangling -> emit quasi-Rust (type) syntax -> parse quasi-Rust syntax -> pretty-print AST" pipeline.
I'm not sure what the status is on @michaelwoerister's v0->AST demangler (which predates my "direct"/"allocation-less" v0 demangler in rustc-demangle) but from a quick glance it should be mostly compatible already? Additional testing and/or consolidation with the rustc-demangle repo would not be hard, if anyone is interested in picking it back up.
(Also, would it make sense to share code between them? Not trying to start a dozen bikesheds though, there should probably be dedicated issues for tracking anything that specific)
The trickier parts would be adding support for newer additions, e.g.:
String::from_utf8), instead of having to do everything on the fly like rustc-demangle does.
(EDIT: @EFanZh pointed out that they also have a demangler to an AST, which I likely have seen before, so I'm really sorry I lost track of it. Also, not only does it appear to have all the new consts, it also handles str constants the easy way like I was describing above, heh)
How would tools even use a demangled AST to provide a better experience? No one idea seems particularly strong on its own, but here's a few:
…::Foo<…> - literally using ellipses - instead of foo::Foo<Bar, Baz>)…
)<>/()/[]/{}) or "semantic highlighting" (typically coloring identifiers based on their name resolution results, AFAICT), are probably more relevant.
Anyway, all that said, I'm going to duplicate some screenshots here as well:
[Screenshot (with - standing in for <small> and + for </small>)]
[Screenshot: the jq contraption being applied to a longer Rust symbol (this one's automated; also works for C++)]
@eddyb I have written a demangler that can demangle the latest v0 syntax symbol into a structured AST: https://github.com/EFanZh/ast-demangle/.
Oh, my bad @EFanZh, now that I'm looking at it, I'm pretty sure I've seen it before and just forgot :(
Discussed in T-compiler backlog bonanza
The v0 symbol mangling has been implemented. From https://github.com/rust-lang/rust/pull/89917 we have considered making v0 the default, but we have held off on doing so in order to give external tools time to add support. In PR #90054 we did make v0 the default for builds of rustc itself (but not for object code that rustc generates for other programs).
We need to figure out what criteria we will use in this and other cases to decide that "it is time" to switch the defaults.
(We also considered opening a separate tracking issue for the question of "when to switch the default", but at this point I think we would only open such a tracking issue if we were ready to close this one, #60705, itself.)
@rustbot label: S-tracking-needs-to-bake
The first thing to do would be to produce a list of tools that people want to support. For each tool, we should determine whether it supports v0, and, if so, the date of the first public release that features v0 support. Once each tool supports v0, and once each has supported v0 for long enough (precise criteria TBD), then stabilization should be unblocked.
Obviously this list cannot guarantee that it will exhaustively mention every tool ever made, but the only alternative would be to never stabilize v0 for fear of overlooking some tool. In the meantime, we can use a blog post to put out a general call to tool developers to ask them to ensure that v0 works with their tools.
Nominating to hopefully act as a forcing function to create this list.
One problem with v0 mangling that hasn't been identified yet: it completely breaks the cargo llvm-lines tool. Here is example output with legacy mangling:
Lines Copies Function name
----- ------ -------------
134295 3225 (TOTAL)
6102 (4.5%, 4.5%) 18 (0.6%, 0.6%) alloc::raw_vec::RawVec<T,A>::grow_amortized
2641 (2.0%, 6.5%) 64 (2.0%, 2.5%) core::option::Option<T>::map
2329 (1.7%, 8.2%) 17 (0.5%, 3.1%) <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::next
1716 (1.3%, 9.5%) 11 (0.3%, 3.4%) alloc::raw_vec::RawVec<T,A>::allocate_in
1694 (1.3%, 10.8%) 15 (0.5%, 3.9%) alloc::alloc::box_free
1476 (1.1%, 11.9%) 18 (0.6%, 4.4%) alloc::raw_vec::RawVec<T,A>::current_memory
1461 (1.1%, 13.0%) 3 (0.1%, 4.5%) hashbrown::raw::RawTable<T,A>::reserve_rehash
1456 (1.1%, 14.1%) 16 (0.5%, 5.0%) core::slice::iter::Iter<T>::new
1249 (0.9%, 15.0%) 8 (0.2%, 5.3%) <T as alloc::slice::hack::ConvertVec>::to_vec
1065 (0.8%, 15.8%) 5 (0.2%, 5.4%) aho_corasick::automaton::Automaton::leftmost_find_at_no_state_imp
And with v0 mangling:
Lines Copies Function name
----- ------ -------------
134295 3225 (TOTAL)
960 (0.7%, 0.7%) 1 (0.0%, 0.0%) <regex[455e3194582446bb]::prog::Program as core[d1a89b04220dd38d]::fmt::Debug>::fmt
722 (0.5%, 1.3%) 1 (0.0%, 0.1%) <regex[455e3194582446bb]::exec::ExecBuilder>::build
544 (0.4%, 1.7%) 1 (0.0%, 0.1%) <regex[455e3194582446bb]::dfa::Fsm>::exec_at
497 (0.4%, 2.0%) 1 (0.0%, 0.1%) <regex[455e3194582446bb]::compile::Compiler>::compile_many
494 (0.4%, 2.4%) 1 (0.0%, 0.2%) <aho_corasick[afd2d59d996825a5]::nfa::NFA<u32> as core[d1a89b04220dd38d]::fmt::Debug>::fmt
487 (0.4%, 2.8%) 1 (0.0%, 0.2%) <hashbrown[18cdbe82094945b3]::raw::RawTable<(&usize, &alloc[c687d6376d1d0c58]::string::String)>>::reserve_rehash::<hashbrown[18cdbe82094945b3]::map::make_hasher<&usize, &usize, &alloc[c687d6376d1d0c58]::string::String, std[e45faeee946555a1]::collections::hash::map::RandomState>::{closure#0}>
487 (0.4%, 3.1%) 1 (0.0%, 0.2%) <hashbrown[18cdbe82094945b3]::raw::RawTable<(alloc[c687d6376d1d0c58]::string::String, usize)>>::reserve_rehash::<hashbrown[18cdbe82094945b3]::map::make_hasher<alloc[c687d6376d1d0c58]::string::String, alloc[c687d6376d1d0c58]::string::String, usize, std[e45faeee946555a1]::collections::hash::map::RandomState>::{closure#0}>
487 (0.4%, 3.5%) 1 (0.0%, 0.2%) <hashbrown[18cdbe82094945b3]::raw::RawTable<(regex[455e3194582446bb]::dfa::State, u32)>>::reserve_rehash::<hashbrown[18cdbe82094945b3]::map::make_hasher<regex[455e3194582446bb]::dfa::State, regex[455e3194582446bb]::dfa::State, u32, std[e45faeee946555a1]::collections::hash::map::RandomState>::{closure#0}>
456 (0.3%, 3.8%) 1 (0.0%, 0.3%) <regex[455e3194582446bb]::compile::Compiler>::c_alternate
433 (0.3%, 4.1%) 1 (0.0%, 0.3%) <alloc[c687d6376d1d0c58]::alloc::Global as core[d1a89b04220dd38d]::alloc::Allocator>::shrink
Note the difference in the Copies column. cargo llvm-lines depends entirely on the type-imprecision of legacy mangling. We go from having N different functions with the same name being combined, to every function being separate. E.g. with legacy mangling all the grow_amortized instances end up in the same bucket, while with v0 mangling they look like this:
339 (0.3%, 6.8%) 1 (0.0%, 0.6%) <alloc[c687d6376d1d0c58]::raw_vec::RawVec<(char, char)>>::grow_amortized
339 (0.3%, 7.0%) 1 (0.0%, 0.6%) <alloc[c687d6376d1d0c58]::raw_vec::RawVec<(u8, u32)>>::grow_amortized
339 (0.3%, 7.3%) 1 (0.0%, 0.7%) <alloc[c687d6376d1d0c58]::raw_vec::RawVec<(usize, usize)>>::grow_amortized
This is probably a case where cargo llvm-lines needs to change, rather than v0 mangling, but I thought it worth mentioning.
cc @dtolnay
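One way a tool like cargo llvm-lines might restore bucketing is to normalize v0-demangled names before grouping them. The following is a minimal sketch under stated assumptions, not what cargo llvm-lines actually does. It drops [hex] crate disambiguators that directly follow an identifier, and it collapses any <...> that follows an identifier or "::" (generic arguments) to <_>. A leading qualified-path "<" is kept. The function name bucket_key is hypothetical. Its "->" handling is only a rough guard for fn-pointer types.

```rust
/// Hypothetical normalizer: collapse a v0-demangled name into a coarse bucket key.
fn bucket_key(name: &str) -> String {
    let chars: Vec<char> = name.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let prev = out.chars().next_back();
        let after_ident = prev.map_or(false, |p| p.is_alphanumeric() || p == '_');
        // Crate disambiguator like `[c687d6376d1d0c58]`: drop it if all hex.
        if c == '[' && after_ident {
            if let Some(len) = chars[i + 1..].iter().position(|&d| d == ']') {
                if len > 0 && chars[i + 1..i + 1 + len].iter().all(|d| d.is_ascii_hexdigit()) {
                    i += len + 2;
                    continue;
                }
            }
        }
        // Generic arguments: skip to the matching `>` (ignoring `->`), emit `<_>`.
        if c == '<' && (after_ident || prev == Some(':')) {
            let mut depth = 0usize;
            let mut j = i;
            while j < chars.len() {
                match chars[j] {
                    '<' => depth += 1,
                    '>' if chars[j - 1] != '-' => {
                        depth -= 1;
                        if depth == 0 {
                            break;
                        }
                    }
                    _ => {}
                }
                j += 1;
            }
            out.push_str("<_>");
            i = j + 1;
            continue;
        }
        out.push(c);
        i += 1;
    }
    out
}
```

This key merges more than legacy mangling did, since it also hides which impl-level generic parameters existed. Whether that trade-off is acceptable would be up to the tool.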
Is there, or will there be, an official Rust name mangling library (functionality), rather than demangling? Sometimes one needs to mangle Rust item paths to look into binaries, e.g. like perf does. I hope there will be a reference implementation of the specification.
Why would perf need to mangle names? There is no way to exactly reproduce symbol names outside of rustc itself, given that they contain a crate disambiguator whose value depends on the -Cmetadata arguments passed when compiling the crate that defined the mentioned function/type (which for the standard library is unknown), as well as on the exact rustc version used. Even two consecutive nightly releases will produce different symbol names.
Please re-read my sentence @bjorn3. I'm not claiming perf mangles names.
@bjorn3 Thanks for your explanation. I hope you didn't assume every commenter should know these details. I think the question is legitimate. It was asked before in the context of GCC C++. The information required is available to me, but that's not important now. Even if just the algorithm were specified, as in the GCC case, perhaps enough of the translation from symbol to mangled symbol could be reconstructed to find the specific symbol in a binary for a given item path. That's my use case, but I don't assume this would be the only solution or the only use case for a mangling spec or reference implementation.
It should be possible to have something like an api where you specify in the input a wildcard for the crate disambiguator and then the name mangling library would output a wildcard where it would otherwise print the crate disambiguator. Would this work for your use case?
A reference implementation does already exist, essentially, as part of rustc. That it's not a reusable library just reflects that the goal of a known mangling scheme is the ability for 3rd party non-rustc tooling to be able to turn mangled symbols back into the demangled human-meaningful form. Being able to mangle symbols is explicitly a non-goal.
For binary introspection, demangling is sufficient. Given an unmangled name, to find the corresponding mangled names[^s], you don't mangle the unmangled name to compare to the mangled symbols; instead, you demangle the symbols from the binary to compare to the unmangled symbol. Most of the time you'll want the full list of demangled symbols anyway, e.g. for display or otherwise.
[^s]: Names, plural; multiple crates with the same name will have symbols which collide when unmangled and are disambiguated with the crate disambiguator.
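The demangle-then-compare approach can be sketched as follows. The sketch assumes you already have the demangled v0 names (e.g. from rustc-demangle, not shown). It also assumes that crate disambiguators appear as [hex] right after the crate name, as in the cargo llvm-lines output earlier in this thread. strip_disambiguators and find_matches are hypothetical helpers. Returning a Vec reflects the footnote's point that several crates can share a name.

```rust
/// Remove `[hex]` crate disambiguators that directly follow an identifier,
/// e.g. `regex[455e3194582446bb]::dfa::Fsm` -> `regex::dfa::Fsm`.
/// Other brackets, such as the slice type in `&[u8]`, are left alone.
fn strip_disambiguators(name: &str) -> String {
    let mut out = String::new();
    let mut rest = name;
    while let Some(open) = rest.find('[') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        let follows_ident = out.chars().next_back().map_or(false, |c| c.is_alphanumeric() || c == '_');
        match after.find(']') {
            Some(close) if follows_ident && close > 0 && after[..close].chars().all(|c| c.is_ascii_hexdigit()) => {
                rest = &after[close + 1..]; // skip `[hex]`
            }
            _ => {
                out.push('['); // not a disambiguator; keep the bracket
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}

/// Return every demangled name that equals `query` once disambiguators are ignored.
fn find_matches<'a>(query: &str, demangled: &[&'a str]) -> Vec<&'a str> {
    demangled.iter().copied().filter(|d| strip_disambiguators(d) == query).collect()
}
```

A tool would keep the mangled symbol next to each demangled name, so every match can be mapped back to the exact symbol in the binary.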
If you want fully predictable names (e.g. for linking manually ABI-stable interfaces), you should be specifying them explicitly. It would be interesting to be able to request v0 mangling (without the use of disambiguators) rather than having to manually apply a mangling scheme, but that's a completely separate feature request from the use for Rust-only names tracked here.
@CAD97 When you need to go back to the same binary whose symbols you demangled, and determine which mangled symbol a demangled name refers to, you may want this functionality. Please also consider that having the binaries, and even a build pipeline including source code, does not mean you are free to modify the source code to achieve predictable symbol names or whatever.
By the way, it's a bit of a semantic discussion what demangling entails, at least with respect to my functional requirement. Third-party tooling like perf may only demangle in a strict sense, but could be considered to mangle a given name that exists as a mangled symbol in the binary:
perf \
probe \
--exec $(realpath "mycrate/target/debug/deps/binary-cfcd9bd03ac152c2") \
--add="uprobe123=mycrate\:\:tests\:\:test_1"
perf demangles the symbols and then matches them with the unmangled name specified as the --add argument. So if perf or such were to keep a mapping between the two and report that back, that would work for my particular use case as well. This procedure may not amount to demangling in general, but it would cover some use cases without Rust people having to work on it.
Yes, sure. And perhaps there are other, forensic cases and such. Please note, I'm not an expert or involved in this mangling work here or elsewhere, just chiming in as a user with a practical use case that I think will be relevant to a subgroup of real-world developers (not detailing it since it's part of a paper to be published).
As far as I can tell gdb does not support suffixes using '$' rather than '.'. It's also somewhat unfortunate that GDB strips the suffix, rather than including it in the demangled string - but that's not unbearable.
@benpye Given that the only documented use of $ suffixes in the wild is for thread-local data on Mach-O, I don't think it's a showstopper for shipping this as the default.
I wonder if the compiler team would like to use the upcoming 2024 edition as an excuse to finally ship v0 mangling? This would let us roll it out gradually and in a way that can be easily rolled back by users, and since it's an implementation detail we could make it the default for all editions someday in the future if we really wanted to.
This is a tracking issue for the RFC "Rust Symbol Mangling (v0)" (rust-lang/rfcs#2603).
Current status:
Since #90128, you can control the mangling scheme with -C symbol-mangling-version, which can be:
- legacy: the older mangling version, still the default currently
  - explicitly selecting it requires -Z unstable-options (to allow for eventual removal after v0 becomes the default)
- v0: the new RFC mangling version, as implemented by #57967
(Before #90128, this flag was the nightly-only -Z symbol-mangling-version)
To test the new mangling, set RUSTFLAGS=-Csymbol-mangling-version=v0 (or change rustflags in .cargo/config.toml). Please note that only symbols from crates built with that flag will use the new mangling, and that tool support (e.g. debuggers) will be limited initially, until everything is upstreamed. However, RUST_BACKTRACE and rustfilt should work out of the box with either mangling version.
Steps:
- binutils/gdb (GNU libiberty)
- perf (through binutils 2.36 and/or libiberty 11.0, or later versions - may vary between distros)
- valgrind
Unresolved questions:
Desired availability of tooling:
Linux:
Windows:
Windows does not have support for demangling either legacy or v0 Rust symbols and requires debuginfo to load the appropriate function name. As such, no special support is required.
macOS:
More investigation is needed to determine to what extent macOS system tools already support Rust v0 mangling.