Closed h1467792822 closed 9 months ago
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.
Concerns or objections to the proposal should be discussed on Zulip and formally registered here by adding a comment with the following syntax:
@rustbot concern reason-for-concern
<description of the concern>
Concerns can be lifted with:
@rustbot resolve reason-for-concern
See documentation at https://forge.rust-lang.org
cc @rust-lang/compiler @rust-lang/compiler-contributors
m-ou-se is willing to do the review work
I'm happy to give advice where I can, but I'm not part of the compiler team.
@rustbot second
Let's definitely implement this as an unstable option so we can gather experience about performance and how to expose it to users.
I'd suggest renaming the MCP to something like "Provide option to shorten symbol names by replacing them with a digest"
@rustbot label -final-comment-period +major-change-accepted
Proposal
Add an optimization option that lets users replace fully mangled symbol names with names based on hash digests, greatly reducing the length of symbol names in dylibs. At the expense of debugging capabilities such as the readability of symbol names, this option eliminates the space bottleneck encountered when using Rust to replace existing C/C++ functional modules in resource-constrained scenarios.
Motivation
The average length of symbol names in the Rust standard library is about 100 bytes, while the average length of symbol names in the C++ standard library is about 50 bytes. In some embedded environments where dynamic libraries are widely used, the space taken by symbol names in Rust dynamic libraries has become one of the key bottlenecks for applications, especially when an existing C/C++ module is rewritten as a Rust module.
The standard library is a typical example. We compared the proportion of the .dynstr section in the entire ELF file for the Rust standard library and for the C++ standard library, and then compared the data for specific symbols in .dynstr. The comparison data is as follows:
The proportion of .dynstr in the Rust standard library is about twice that in C++.
[Table not recoverable: share of .dynstr in the ELF file, Rust standard library (including a build with symbol_mangling_version=v0) versus C++ standard library.]
Remarks: the Rust standard library was built with panic="abort", opt-level="z", codegen-units=1, strip=true, debug=true, and the .rustc section is removed.
In C++, the average length of symbol names after mangling is about 50, while in Rust the average length of symbol names after mangling is about 100.
[Table not recoverable: sample mangled symbol names, with rows using the _ZN prefix and rows using the _R prefix (symbol_mangling_version=v0).]
Finding a way to shorten the symbol names of Rust dynamic libraries is therefore of great value.
Design
Shorter symbolic names based on digests
The solution is to replace each full mangled name with a digest: a specific hash algorithm generates the digest from the full mangled symbol name. This greatly reduces the space taken by the .dynstr section, making it even smaller than that of C++.
Could post-processing tools such as objcopy do this instead? Unfortunately, objcopy --redefine-syms cannot modify or shorten the symbol names in .dynsym. Using post-processing tools to reduce the space of the .dynstr section is much more difficult than expected. If rustc itself solves this problem for these specific scenarios, it will be the simplest and most convenient solution for users and will greatly promote the use of Rust in a wider range of scenarios.
Usage Constraints
For debugging, if the full symbol name is replaced with a digest, it is difficult to find the corresponding code from the symbol name in the dynamic library. Users therefore need to keep a backup of the debugging information and the full source code. Since crates are widely used in Rust, a final symbol name that consists of the crate name and a digest is a reasonable scheme.
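One way to picture the "backup" mentioned above is a side table that maps each shortened name back to its full mangled name. The Rust sketch below is only an illustration under stated assumptions: SymbolTable, record and lookup are hypothetical names, FNV-1a is a stand-in for whatever 64-bit hash rustc uses, and the crate-plus-hex name layout is invented for this example. rustc is not claimed to emit such a table.

```rust
use std::collections::HashMap;

/// Stand-in 64-bit hash (FNV-1a). An assumption for illustration only,
/// not the hasher rustc uses.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical side table: shortened symbol name -> full mangled name.
#[derive(Default)]
struct SymbolTable {
    full_names: HashMap<String, String>,
}

impl SymbolTable {
    /// Records a symbol and returns its shortened name (crate name plus a
    /// 16-digit hex digest; this layout is invented for the example).
    /// Returns Err with the previously recorded full name if a different
    /// full name already maps to the same shortened name.
    fn record(&mut self, crate_name: &str, full_name: &str) -> Result<String, String> {
        let short = format!("{}_{:016x}", crate_name, fnv1a_64(full_name.as_bytes()));
        if let Some(existing) = self.full_names.get(&short) {
            if existing != full_name {
                return Err(existing.clone());
            }
        }
        self.full_names.insert(short.clone(), full_name.to_string());
        Ok(short)
    }

    /// Looks up the full mangled name for a shortened name, if recorded.
    fn lookup(&self, short: &str) -> Option<&str> {
        self.full_names.get(short).map(String::as_str)
    }
}
```

A table like this would have to be stored alongside the backed-up debugging information, since the shortened name alone cannot be reversed.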
What happens if a hash collision causes a symbol name conflict? After all, hash collisions are theoretically unavoidable. There are two scenarios for this conflict: one inside a single dylib, and the other between multiple dylibs.
Similar to the -Wl,--exclude-libs option of gcc, in Rust -C link-arg=-Wl,--exclude-libs=libfoo.rlib can be used to avoid exporting symbols from the upstream rlib.
In addition, this option is not compatible with the existing option -C instrument-coverage.
Final Design Scheme
The value hashed of symbol-mangling-version is added to support shortening symbol names. It can be enabled with -C symbol-mangling-version=hashed -Z unstable-options.
The format of the final symbol name is _RNxC{length}{crate name}{length}H{64-bits hash}. For generic functions, the format of the final symbol name is _RNxC{length}{instantiating-crate name}{length}H{64-bits hash}. This complies with the existing specification (https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html#syntax-of-mangled-names).
The hash is encoded in base-62, and the final terminator _ is removed because it does not help prevent hash collisions.
-C metadata=<salt> can be used to eliminate rare hash conflicts.
Test Data
According to the test data, the total size of the dylib is reduced by about 20% when this option is used. For details, see the PR: https://github.com/rust-lang/rust/pull/118636
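As a rough illustration of the symbol format described under Final Design Scheme, the following Rust sketch assembles a name of the form _RNxC{length}{crate name}{length}H{hash}. It is not rustc's implementation. The helpers fnv1a_64, to_base62 and hashed_symbol are hypothetical, FNV-1a stands in for whatever 64-bit hash rustc uses, the base-62 step is a plain positional encoding that may differ in detail from the v0 rules, and the crate name is assumed to be ASCII and not to start with a digit or underscore. The second length is taken to cover the H marker plus the digest, which is one reading of the v0 identifier rule.

```rust
/// Stand-in 64-bit hash (FNV-1a). An assumption for illustration only,
/// not the hasher rustc uses.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Plain base-62 encoding with the digit order 0-9, a-z, A-Z and no
/// trailing `_` terminator. Simplified relative to the v0 specification.
fn to_base62(mut value: u64) -> String {
    const DIGITS: &[u8; 62] =
        b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if value == 0 {
        return "0".to_string();
    }
    let mut out = Vec::new();
    while value > 0 {
        out.push(DIGITS[(value % 62) as usize]);
        value /= 62;
    }
    out.reverse();
    String::from_utf8(out).expect("base-62 digits are ASCII")
}

/// Builds a shortened symbol name from a crate name and the full mangled
/// name. Assumes an ASCII crate name that does not start with a digit or `_`.
fn hashed_symbol(crate_name: &str, full_mangled_name: &str) -> String {
    let digest = to_base62(fnv1a_64(full_mangled_name.as_bytes()));
    // The identifier is "H" followed by the digest; its length prefix
    // counts the "H" as well.
    let ident = format!("H{}", digest);
    format!("_RNxC{}{}{}{}", crate_name.len(), crate_name, ident.len(), ident)
}
```

For a crate named foo, hashed_symbol returns a name that starts with _RNxC3foo and is at most 23 bytes long no matter how long the full mangled name is, because a 64-bit value needs at most 11 base-62 digits.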
Mentors or Reviewers
@m-ou-se is willing to do the review work. Very grateful for their help!
Process
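The main points of the Major Change Process are as follows:
A compiler team member or contributor who is knowledgeable in the area can second the proposal by writing @rustbot second.
If you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
You can read more about Major Change Proposals on forge.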