Open ehuss opened 1 year ago
I was trying to remove `OSO` symbols on macOS. I did some research and summarized it in this issue: https://github.com/rust-lang/rust/issues/116948#issuecomment-1793617018. Now I can see that there is no way for `trim-paths` to really trim all paths unless rustc does some dirty trick for older macOS platforms. Would it be a blocker if we want to stabilize this feature, say, in the 2024 edition?
cc @Urgau you might be interested.
What concerns are there regarding the use of relative paths causing issues with backtraces and debuggers?
I talked about this with @danielframpton the other day and he noticed that the current `[package name]-[version]` rule for remapping might make things unnecessarily complicated for debuggers. To make the debugging experience easy to set up, it would be beneficial to require as little configuration as possible. The RFC does not explicitly make file system paths (as used by Cargo) match how things are renamed, so in the general case the debugger needs a separate source file search path for each crate involved.
However, if we prescribed that each crate was remapped to the relative path Cargo uses inside its `$CARGO_HOME/registry/src` and `$CARGO_HOME/git/checkouts` directories, then we would have a fixed number of source directories a debugger needs to know about. The renaming would then be `[registry]/[package name]-[version]` for registry crates and `[package name]-[sha]` for git dependencies.
The current `[package name]-[version]` rule is already pretty close, but that seems to be by accident. It would be good to make this something that can be relied on.
This might also help with backtraces?
I don't know if there is something similarly useful we can do for path dependencies.
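To make the proposed renaming concrete, here is a minimal sketch of how the remapped prefix could be derived per dependency kind. The `Source` enum, the `remapped_prefix` function, and the registry, crate, and sha values are all hypothetical and only illustrate the rule described above; this is not how Cargo models its sources.

```rust
/// Where a dependency's sources come from (hypothetical model, not Cargo's types).
enum Source<'a> {
    /// Registry crate, unpacked under `$CARGO_HOME/registry/src/<registry>/`.
    Registry { registry: &'a str, name: &'a str, version: &'a str },
    /// Git dependency, checked out under `$CARGO_HOME/git/checkouts/`.
    Git { name: &'a str, sha: &'a str },
}

/// Relative path that debuginfo would record for a crate under the proposed rule.
fn remapped_prefix(source: &Source) -> String {
    match source {
        Source::Registry { registry, name, version } => {
            format!("{registry}/{name}-{version}")
        }
        Source::Git { name, sha } => format!("{name}-{sha}"),
    }
}

fn main() {
    // Illustrative values only.
    let from_registry = Source::Registry {
        registry: "example-registry",
        name: "some_crate",
        version: "1.0.1",
    };
    let from_git = Source::Git { name: "my-dep", sha: "abc1234" };
    println!("{}", remapped_prefix(&from_registry));
    println!("{}", remapped_prefix(&from_git));
}
```

With such a rule, a debugger would only need two search roots (the registry source directory and the git checkout directory) instead of one per crate.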
Should we treat the current package the same as other packages?
This also came up. Unconditionally stripping the path to the current package might be too inflexible in some cases. E.g. when building a staticlib, one might want to make the paths coming from the current package identifiable in some way. If they all start with `src/..` then they might clash with the paths of the codebase that uses the staticlib as a dependency.
> However, if we prescribed that each crate was remapped to the relative path Cargo uses inside its `$CARGO_HOME/registry/src` and `$CARGO_HOME/git/checkouts` directories, then we would have a fixed number of source directories a debugger needs to know about. The renaming would then be `[registry]/[package name]-[version]` for registry crates and `[package name]-[sha]` for git dependencies.

This plan sounds really tenable to me! Will do a change on cargo side for this, and see how people think about it.
For path dependencies, maybe we could have a common prefix for them, like `cargo_deps@local/<pkg>-<version>`. It might not be as useful as for other dependency kinds, but at least people could have a single location for path dependencies?
> then they might clash with the paths of the codebase that uses the staticlib as a dependency.

Sorry I don't fully understand. Could you give a more concrete example of this?
> For path dependencies, maybe we could have a common prefix for them, like `cargo_deps@local/<pkg>-<version>`. It might not be as useful as for other dependency kinds, but at least people could have a single location for path dependencies?

Maybe, yes. It's definitely worth thinking about this some more (and collecting ideas from other people).
> Sorry I don't fully understand. Could you give a more concrete example of this?

As I understand it, the RFC states that paths from the workspace being compiled are trimmed by simply making them relative to the workspace root, that is, all file paths from workspace crates would look like `src/foo/bar.rs` or `some_crate_in_the_workspace/src/foo/bar.rs`. Those paths don't contain anything that would allow disambiguating them from other paths that are purely relative. One example where this is sub-optimal is when one has multiple Rust projects, each of which gets compiled into a staticlib that is then consumed within another, non-Rust project. Each of these Rust projects would have debuginfo file paths that just start with `src/` and the paths for some of the files might clash, e.g. `src/lib.rs`. The debugger would have no way to know which `src/lib.rs` file to pick, even if both are in its source search path.
Therefore it would be good if there were some way to avoid this problem. Allowing a prefix to be prepended to the relative paths might be an option.
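The clash can be illustrated with a small sketch. The project names, file lists, and both helper functions below are made up for illustration; the per-project prefix stands in for the hypothetical option suggested above and is not an existing rustc or Cargo feature.

```rust
use std::collections::BTreeMap;

/// Return every debuginfo path that is recorded by more than one project,
/// together with the projects that record it.
fn clashing_paths<'a>(projects: &[(&'a str, Vec<&'a str>)]) -> Vec<(&'a str, Vec<&'a str>)> {
    let mut owners: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (project, paths) in projects {
        for path in paths {
            owners.entry(*path).or_default().push(*project);
        }
    }
    owners.into_iter().filter(|(_, who)| who.len() > 1).collect()
}

/// Prepend a per-project prefix (a hypothetical option) to every path.
fn with_prefix(project: &str, paths: &[&str]) -> Vec<String> {
    paths.iter().map(|p| format!("{project}/{p}")).collect()
}

fn main() {
    // Two Rust staticlibs consumed by the same non-Rust project.
    let projects = vec![
        ("proj_a", vec!["src/lib.rs", "src/a.rs"]),
        ("proj_b", vec!["src/lib.rs", "src/b.rs"]),
    ];
    // Without a prefix, `src/lib.rs` is ambiguous.
    println!("{:?}", clashing_paths(&projects));
    // With a prefix, the same file becomes identifiable.
    println!("{:?}", with_prefix("proj_a", &["src/lib.rs"]));
}
```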
@michaelwoerister. Thank you for your feedback! I just opened a new issue https://github.com/rust-lang/cargo/issues/13171 for further discussion about remap rules in Cargo.
Hi everyone, while discussing @weihanglo's PR that tries to fix some debuginfo-related issues, it started to look like there might be some limits to how accurately the debuginfo-related scopes can be implemented. It seems like we'll have a hard time controlling what ends up in split- vs in unsplit-debuginfo.
I'm wondering: do we really need three debuginfo-related scopes? Or can we just have a single debuginfo scope? Do we have a concrete use case that requires the fine granularity of three scopes?
(cc @cbeuw)
There were requests from people who want to debug crash dumps in shipped binaries or profile them: https://github.com/rust-lang/rfcs/pull/3127#issuecomment-850387079 and https://github.com/rust-lang/rfcs/pull/3127#issuecomment-850491441.
Before settling on two debuginfo scopes, the original design to address this had a single debuginfo scope in rustc and made it Cargo's job to emit it (under the release profile) when debuginfo splitting is off, and to omit it when debuginfo splitting is on. The rationale behind having separate scopes (https://github.com/rust-lang/rfcs/pull/3127#discussion_r857881753) I believe was to reduce the need for special casing on Cargo's side.
Seeing that the complexity we gain in rustc is overwhelming the complexity we would've saved in Cargo, I think it's fine to go back to the previous approach which is making debuginfo remapping all-or-nothing.
However, under this design, rustc may be invoked with debuginfo splitting and no debuginfo path remapping. Therefore LLVM must not emit debuginfo paths anywhere in distributable files, otherwise we'd fail privacy and reproducibility. Is this possible? With separate scopes, at least rustc can know that unsplit debuginfo needs to be remapped and can babysit LLVM somewhat. rustc won't know this with a single debuginfo scope.
Thanks for the pointers, @cbeuw! I can see the rationale behind all of these concerns. It might be a tricky constellation of tradeoffs.
Taking a step back, I'm not sure if the scopes as they are defined now are that useful in practice. Right now, they are defined in terms of the kind of data that belongs to them (e.g. diagnostics, debuginfo, file!() macro expansions). But in practice, I think, it would make sense to define them in terms of which output artifacts they refer to. In other words, there would be three main scopes:
That would make it much easier for a user to decide what they need.
I'm also not quite clear on whether it is possible to have a completely sanitized binary together with an unsanitized separate debuginfo file. That is, does that even solve the usability problem? Can tools find the debuginfo if they get a sanitized binary? I think that question needs to be answered per tool and per platform, because the answer could be different for each combination. It would be good to have concrete use cases that we can test our implementation against.
I did some reading and some testing and it looks to me like split-dwarf cannot be used to sanitize binaries while keeping an unsanitized, separate debuginfo file for a good out-of-the-box debugging experience. The binary still contains the `.debug_line` section which contains the paths to all the source files (see e.g. `7.3.2.1 First Partition (with Skeleton Unit)` in the DWARF 5 standard). So, if we want a sanitized binary, we have to sanitize these paths. But that breaks the out-of-the-box debugging experience. At that point the distinction between the split-debuginfo and unsplit-debuginfo scopes does not make much practical difference anymore, I think.
There is another way for separating out debuginfo via postprocessing:
objcopy --only-keep-debug my_binary my_binary.debug
strip -S my_binary -o my_binary.stripped
objcopy --add-gnu-debuglink=my_binary.debug my_binary.stripped
That approach really can sanitize the binary while keeping debuginfo intact. But we would have to make that the default for release builds in order to preserve the out-of-the-box debugging experience. I'm not sure how viable that is.
(NOTE: this is only relevant for Linux. I don't know the exact situation on Windows and macOS)
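As a sketch of how a build tool might drive this post-processing, the snippet below constructs the same three commands with `std::process::Command` without running them. The function name `split_debuginfo_commands` and the output file naming are assumptions made for this example; nothing like this is implemented in Cargo as far as this thread says.

```rust
use std::process::Command;

/// Build (but do not run) the three post-processing steps for `binary`.
fn split_debuginfo_commands(binary: &str) -> Vec<Command> {
    let debug = format!("{binary}.debug");
    let stripped = format!("{binary}.stripped");

    // 1. Copy only the debug sections into a separate file.
    let mut keep_debug = Command::new("objcopy");
    keep_debug.args(["--only-keep-debug", binary, debug.as_str()]);

    // 2. Write a copy of the binary with debug symbols stripped.
    let mut strip = Command::new("strip");
    strip.args(["-S", binary, "-o", stripped.as_str()]);

    // 3. Record a link from the stripped binary to the debug file.
    let mut link = Command::new("objcopy");
    link.arg(format!("--add-gnu-debuglink={debug}")).arg(&stripped);

    vec![keep_debug, strip, link]
}

fn main() {
    for cmd in split_debuginfo_commands("my_binary") {
        // `Debug` for `Command` prints the program and its arguments.
        println!("{cmd:?}");
    }
}
```

To actually execute the steps one would call `status()` on each command in order and stop at the first failure.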
I think it is a good idea to revisit the question of whether we really want the default for `--release` builds to be that the binary is sanitized but separate debuginfo files aren't. It seems to me that the whole point of sanitizing by default is to avoid unpleasant surprises when shipping `--release` artifacts. Yet, only sanitizing half of the artifacts seems to undermine that goal of not causing any surprises. My suggestion would be to either sanitize everything or nothing by default.
An additional reason not to go the mixed-sanitization route is that it does not actually solve the problem on Linux because with split-dwarf the relevant parts stay in the binary (at least as far as I can tell).
Regarding the debugging experience: There is an ongoing discussion on how to make that as seamless as possible even when paths are trimmed. We haven't landed on a perfect solution yet.
Possible solutions I see are:
- A `cargo debug` subcommand that configures the debugger's source paths.

cc'ing some folks who commented on not trimming separate debuginfo files in the RFC thread: @joshtriplett, @BurntSushi, @kornelski
For me this is very profile-dependent, so I'd be happy if Cargo had good profile-dependent defaults.
In the `dev` and `bench` profiles I need the debug info working reliably and being maximally compatible. I never ship these, so I'm fine with full paths, even if they include my home dir and `.cargo`'s guts. I'd prefer debug info to just work in debuggers and profilers without configuring any search directories or mappings.
In release profile, I don't want full paths. I prefer smaller binaries and more privacy/obscurity. In release binaries I only care about getting stack traces, and for that function name + line number is enough. I like having an option to split debuginfo out to external files, so that I can make it an optional install.
For `cargo install` I wouldn't want trim paths. And ditto for precompiled open-source binaries as created using e.g. cargo-dist and downloaded using cargo-binstall.
As a data point, I tend to `strip` my `cargo install`-ed binaries... and that's with my using a desktop machine where it was possible and cost-effective to upgrade my root/home drive from 500GB to 1TB to 2TB.
Heck, it might be time to reconsider UPXing them, now that I have a new CPU with plenty of cycles to burn on ensuring that perceived startup time isn't affected.
I'm not sure un-trimmed paths would be that beneficial to me in `cargo install`-ed binaries and they'd take up more space.
Every once in a while I have a binary that panics or hangs. I want to be able to find out why without having to recompile it from scratch.
Whereas, in my case, I'm operating on the same "Why pessimize the 99.9% to save time on the 0.1% situations?" logic as the suggestion to use `debug = 0` and `strip = "debuginfo"` in the dev/debug profile to speed up link times at https://davidlattimore.github.io/working-on-rust-iteration-time.html
As long as all it takes is adding a command-line argument or one line to a config file and then running a command in the terminal, I much prefer things to be small and fast by default and just let it recompile while I go to fix a snack in the cases where I actually need these sorts of things.
Hell, Flatpak makes it more complicated than Cargo does, because you have to manually install the `.Debug` package, then try to `flatpak run --devel`, and then manually install the SDK package it complains about not having.
...plus, as a developer, I'd prefer that users be required to re-`cargo install` to go into a debugging configuration so that, if I've made a new release since whatever they're running, they'll have an opportunity to encounter any potentially relevant bugfixes I've already done.
(As a user, I always give `cargo install` a chance to update whatever it is before reporting bugs out of a sense of "do unto others as you'd have done to you", so having debugging aids hanging around in all `cargo install` binaries all the time is also useless to me on those grounds.)
> For `cargo install` I wouldn't want trim paths. And ditto for precompiled open-source binaries as created using e.g. cargo-dist and downloaded using cargo-binstall.

To clarify: `trim-paths` will still give file paths of the form `some_crate-1.0.1/src/lib.rs`, so the information in a backtrace is still rather useful. If debuginfo is enabled (which currently is not the default for release builds), then the debugger should still be able to find the symbols; it just won't be able to find the source files for now.
EDIT: @bjorn3, would that be sufficient for your use case?
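For illustration, this is roughly what a debugger would have to do to get from a trimmed path back to a file on disk: join the recorded relative path onto each configured source search root. The function name and the example root are assumptions for this sketch and do not describe the behavior of any particular debugger.

```rust
use std::path::{Path, PathBuf};

/// Join a trimmed debuginfo path onto each configured search root.
/// The caller would then pick the first candidate that exists on disk.
fn candidate_locations(trimmed: &str, search_roots: &[&Path]) -> Vec<PathBuf> {
    search_roots.iter().map(|root| root.join(trimmed)).collect()
}

fn main() {
    // Hypothetical search root; a real one would depend on `$CARGO_HOME`.
    let roots = [Path::new("/home/user/.cargo/registry/src/example-registry")];
    for candidate in candidate_locations("some_crate-1.0.1/src/lib.rs", &roots) {
        println!("{}", candidate.display());
    }
}
```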
Here is a table that shows the various combinations of the `trim-paths` and `split-debuginfo` Cargo options and (to the right side of the `⇒`) what `rustc` options these map to and what the consequences of this are.
trim-paths | split-debuginfo | ⇒ | --remap-path-scope | Binary | Separate Debuginfo | Surprising? | Useful? | Remarks |
---|---|---|---|---|---|---|---|---|
none | off | ⇒ | | unsanitized | | | Yes | |
macro | off | ⇒ | macro | semi-sanitized | | | ? | |
object | off | ⇒ | macro + split-debuginfo + split-debuginfo-path + unsplit-debuginfo | sanitized | | Kind of | ? | Must sanitize debuginfo too, otherwise "object" won't be sanitized |
all | off | ⇒ | macro + split-debuginfo + split-debuginfo-path + unsplit-debuginfo | sanitized | | | Yes | |
none | packed | ⇒ | | unsanitized | unsanitized | | Yes | |
macro | packed | ⇒ | macro | semi-sanitized | unsanitized | | ? | |
object | packed | ⇒ | macro + unsplit-debuginfo + split-debuginfo-path | sanitized | unsanitized | Yes | ? | Debugger won't find source files |
all | packed | ⇒ | all | sanitized | sanitized | | Yes (?) | Debugger won't find source files |
none | unpacked | ⇒ | | unsanitized | unsanitized | | Yes | Fast builds with debuginfo |
macro | unpacked | ⇒ | macro | semi-sanitized | unsanitized | | ? | |
object | unpacked | ⇒ | macro + unsplit-debuginfo + split-debuginfo-path | sanitized | unsanitized | Yes | ? | Debugger won't find source files AND dwos |
all | unpacked | ⇒ | all | sanitized | sanitized | | ? | Debugger won't find source files AND dwos |
none | post-link | ⇒ | | unsanitized | unsanitized | | Kind of | Not much use in splitting |
macro | post-link | ⇒ | macro | sanitized | unsanitized | A little | Kind of | |
object | post-link | ⇒ | macro + unsplit-debuginfo + split-debuginfo-path + split-debuginfo | sanitized | unsanitized | | Kind of | Might be useful when only distributing binary? |
all | post-link | ⇒ | all | sanitized | sanitized | | Yes | |
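
The `--remap-path-scope` column of the table can also be read as a lookup function. The sketch below simply transcribes that column; the enum and function names are invented for this example, the order of scopes within a row is not meant to be significant, and this is not Cargo's actual implementation.

```rust
#[derive(Clone, Copy, Debug)]
enum TrimPaths { None, Macro, Object, All }

#[derive(Clone, Copy, Debug)]
enum SplitDebuginfo { Off, Packed, Unpacked, PostLink }

/// Scopes passed to `--remap-path-scope`, transcribed from the table.
fn remap_path_scopes(trim: TrimPaths, split: SplitDebuginfo) -> &'static [&'static str] {
    use SplitDebuginfo::*;
    match (trim, split) {
        // `none` never remaps anything; `macro` only remaps macro expansions.
        (TrimPaths::None, _) => &[],
        (TrimPaths::Macro, _) => &["macro"],
        // Rows that list the macro scope plus all three debuginfo scopes.
        (TrimPaths::Object, Off) | (TrimPaths::Object, PostLink) | (TrimPaths::All, Off) => {
            &["macro", "split-debuginfo", "split-debuginfo-path", "unsplit-debuginfo"]
        }
        // Rows that leave the separate debuginfo file unsanitized.
        (TrimPaths::Object, Packed) | (TrimPaths::Object, Unpacked) => {
            &["macro", "unsplit-debuginfo", "split-debuginfo-path"]
        }
        (TrimPaths::All, _) => &["all"],
    }
}

fn main() {
    let trims = [TrimPaths::None, TrimPaths::Macro, TrimPaths::Object, TrimPaths::All];
    let splits = [
        SplitDebuginfo::Off,
        SplitDebuginfo::Packed,
        SplitDebuginfo::Unpacked,
        SplitDebuginfo::PostLink,
    ];
    for split in splits {
        for trim in trims {
            println!("{trim:?} + {split:?} => {}", remap_path_scopes(trim, split).join(" + "));
        }
    }
}
```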
Observations:
- `trim-paths=object, split-debuginfo=packed` leads to semi-broken debuginfo because line tables, contained in the binary, have to be sanitized. But the combination sounds like "don't touch debuginfo".
- … `split-debuginfo` and `unsplit-debuginfo` remapping scopes -- and they fall into the surprising and useless category.
- … `unsplit-debuginfo` and `split-debuginfo-path` …

My conclusion:
- We should have a single `debuginfo` scope and not further subdivide that into `split-debuginfo`, `unsplit-debuginfo`, and `split-debuginfo-path`. The only cases this makes a difference for are `trim-paths=object, split-debuginfo=packed` and `trim-paths=object, split-debuginfo=unpacked`, but these cases have no real use case to begin with.

Via #122450, it’s come to my attention that this will affect coverage instrumentation, because the coverage mappings embedded in the binary usually contain at least one absolute path, and any adjustment to those paths is constrained by the capabilities of the `llvm-cov` tool.
For reference, Clang allows controlling path remapping for code coverage separately: https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fcoverage-prefix-map
FYI, @Urgau's PR https://github.com/rust-lang/rust/pull/122450 will merge all debuginfo scopes into a single one, since splitting them did not actually solve the problem it was intended to solve (see https://github.com/rust-lang/rust/issues/111540#issuecomment-1994010274).
I'll wait a few days before approving the PR, to give everyone here a chance to speak up in case they disagree with the change.
...updates?
Any movement here? Still waiting on this and hoping the scope didn't change from stripping all in release mode BY DEFAULT.
Not entirely sure on rustc side of this, but I feel like it is waiting for Cargo to report back the integration status. See also: https://github.com/rust-lang/cargo/issues/12137#issuecomment-2149647027
Random question: are these flags meant to affect the output of diagnostics as well? One could argue that they should.
Yes, that's the `diagnostics` scope, see the `--remap-path-scope` flag for more details.
status?
This is a tracking issue for RFC 3127 (rust-lang/rfcs#3127).
This enhancement adds the `--remap-path-scope` command-line flag to control the scoping of how paths get remapped in the resulting binary.

Issues: https://github.com/rust-lang/rust/labels/F-trim-paths
Documentation (rustc): https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/remap-path-scope.html
Documentation (cargo): https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#profile-trim-paths-option
Cargo tracking issue: https://github.com/rust-lang/cargo/issues/12137

About tracking issues
Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.
Steps
Unresolved Questions
Implementation history