rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.71k stars 12.5k forks source link

Tracking Issue for trim-paths RFC 3127 #111540

Open ehuss opened 1 year ago

ehuss commented 1 year ago

This is a tracking issue for the RFC 3127 (rust-lang/rfcs#3127).

This enhancement adds the --remap-path-scope command-line flag to control the scoping of how paths get remapped in the resulting binary.

Issues: https://github.com/rust-lang/rust/labels/F-trim-paths Documentation (rustc): https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/remap-path-scope.html Documentation (cargo): https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#profile-trim-paths-option Cargo tracking issue: https://github.com/rust-lang/cargo/issues/12137

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Unresolved Questions

Implementation history

weihanglo commented 10 months ago

I was trying to remove OSO symbols on macOS. Did a research and summarized in this issue: https://github.com/rust-lang/rust/issues/116948#issuecomment-1793617018. Now, I can see that there is no way for trim-paths to really trim all paths unless rustc does some dirty trick for older macOS platforms. Would it be a blocker if we want to stabilize this feature, say, in 2024 edition?

cc @Urgau you might be interested.

michaelwoerister commented 9 months ago

What concerns are there regarding the use of relative paths causing issues with backtraces and debuggers?

I talked about this with @danielframpton the other day and he noticed that the current [package name]-[version] rule for remapping might make things unnecessarily complicated for debuggers. For making the debugging experience easy to set up, it would be beneficial to require as little configuration as possible. The RFC does not explicitly make file system paths (as used by Cargo) match with how things are renamed, so in the general case the debugger needs a separate source file search path for each crate involved.

However, if we prescribed that each crate from was remapped to the relative path Cargo uses inside its $CARGO_HOME/registry/src and $CARGO_HOME/git/checkouts directories, then we have a fixed number of source directories a debugger needs to know about. The renaming would then be [registry]/[package name]-[version] for registry crates and [package name]-[sha] for git dependencies.

The current [package name]-[version] rule is already pretty close, but that seems to be by accident. It would be good to make this something that can be relied on.

This might also help with backtraces?

I don't know if there is something similarly useful we can do for path dependencies.

Should we treat the current package the same as other packages?

This also came up. Unconditionally stripping the path to the current package might be too inflexible in some cases. E.g. when building a staticlib, one might want to make the paths coming from the current package identifiable in some way. If they all start with src/.. then they might clash with the paths of the codebase that uses the staticlib as a dependency.

weihanglo commented 9 months ago

However, if we prescribed that each crate from was remapped to the relative path Cargo uses inside its $CARGO_HOME/registry/src and $CARGO_HOME/git/checkouts directories, then we have a fixed number of source directories a debugger needs to know about. The renaming would then be [registry]/[package name]-[version] for registry crates and [package name]-[sha] for git dependencies.

This plan sounds really tenable to me! Will do a change on cargo side for this, and see how people think about it.

For path dependencies, maybe we could have a common prefix for them, like cargo_deps@local/<pkg>-<version>. Might not be useful as others dep kinds, but at least people can have a single location for path dependencies?

then they might clash with the paths of the codebase that uses the staticlib as a dependency.

Sorry I don't fully understand. Could you give a more concrete example of this?

michaelwoerister commented 9 months ago

For path dependencies, maybe we could have a common prefix for them, like cargo_deps@local/-. Might not be useful as others dep kinds, but at least people can have a single location for path dependencies?

Maybe, yes. It's definitely worth thinking about this some more (and collecting ideas from other people).

Sorry I don't fully understand. Could you give a more concrete example of this?

As I understand it, the RFC states that paths from the workspace being compiled are trimmed by simply making them relative to the workspace root, that is, all file paths from workspace crates would look like src/foo/bar.rs or some_crate_in_the_workspace/src/foo/bar.rs. Those paths don't contain anything that would allow disambiguating them from other paths that are purely relative. One example where this is sub-optimal is when one has multiple Rust projects, each of which gets compiled into a staticlib that is then consumed within another, non-Rust project. Each of these Rust projects would have debuginfo file paths that just start with src/ and the paths for some of the files might clash , e.g. src/lib.rs. The debugger would have no way to know which src/lib.rs file to pick, even if both are in its source search path.

Therefore it would be good, if there was some way to avoid this problem. Allowing to prepend a prefix to the relative paths might be an option.

weihanglo commented 9 months ago

@michaelwoerister. Thank you for your feedback! I just opened a new issue https://github.com/rust-lang/cargo/issues/13171 for further discussion about remap rules in Cargo.

michaelwoerister commented 7 months ago

Hi everyone, while discussing @weihanglo's PR that tries to fix some debuginfo related issues, it started to look like there might be some limits to how accurate the debuginfo related scopes can be implemented. It seems like we'll have a hard time controlling what ends up in split- vs in unsplit-debuginfo.

I'm wondering: do we really need three debuginfo-related scopes? Or can we just have a single debuginfo scope? Do we have a concrete use case that requires the fine granularity of three scopes?

(cc @cbeuw)

cbeuw commented 7 months ago

There were requests from people who want to debug crash dumps in shipped binaries or profile them: https://github.com/rust-lang/rfcs/pull/3127#issuecomment-850387079 and https://github.com/rust-lang/rfcs/pull/3127#issuecomment-850491441.

Before settling on two debuginfo scopes, the original design to address this had a single debuginfo scope in rustc and made it Cargo's job to emit it (under the release profile) when debuginfo splitting is off, and to omit it when debuginfo splitting is on. The rationale behind having separate scopes (https://github.com/rust-lang/rfcs/pull/3127#discussion_r857881753) I believe was to reduce the need for special casing on Cargo's side.

Seeing that the complexity we gain in rustc is overwhelming the complexity we would've saved in Cargo, I think it's fine to go back to the previous approach which is making debuginfo remapping all-or-nothing.

However, under this design, rustc may be invoked with debuginfo splitting and no debuginfo path remapping. Therefore LLVM must not emit debuginfo paths anywhere in distributable files, otherwise we'd fail privacy and reproducibility. Is this possible? With separate scopes, at least rustc can know that unsplit debuginfo needs to be remapped and can babysit LLVM somewhat. rustc won't know this with a single debuginfo scope.

michaelwoerister commented 7 months ago

Thanks for the pointers, @cbeuw! I can see the rationale behind all of these concerns. It might be a tricky constellation of tradeoffs.

Taking a step back, I'm not sure if the scopes as they are defined now are that useful in practice. Right now, they are defined in terms of the kind of data that belongs to them (e.g. diagnostics, debuginfo, file!() macro expansions). But in practice, I think, it would make sense to define them in terms of which output artifacts they refer to. In other words, there would be three main scopes:

That would make it much easier for a user to decide what they need.

I'm also not quite clear on whether it is possible to have a completely sanitized binary together with an unsanitized separate debuginfo file. That is, does that even solve the usability problem? Can tools find the debuginfo if they get a sanitized binary? I think that question needs to be answered per tool and per platform, because the answer could be different for each combination. It would be good to have concrete use cases that we can test our implementation against.

michaelwoerister commented 7 months ago

I did some reading and some testing and it looks to me like split-dwarf cannot be used to sanitize binaries while keeping an unsanitized, separate debuginfo file for a good out-of-the-box debugging experience. The binary still contains the .debug_line section which contains the paths to all the source files (see e.g. 7.3.2.1 First Partition (with Skeleton Unit) in the DWARF 5 standard). So, if we want a sanitized binary, we have to sanitize these paths. But that breaks the out-of-the-box debugging experience. At that point the distinction between the split-debuginfo and unsplit-debuginfo scopes does not make much practical difference anymore, I think.

There is another way for separating out debuginfo via postprocessing:

objcopy --only-keep-debug my_binary my_binary.debug
strip -S my_binary -o my_binary.stripped
objcopy --add-gnu-debuglink=my_binary.debug my_binary.stripped

That approach really can sanitize the binary while keeping debuginfo intact. But we would have to make that the default for release builds in order to preserve the out-of-the-box debugging experience. I'm not sure how viable that is.

(NOTE: this is only relevant for Linux. I don't know the exact situation on Windows and macOS)

michaelwoerister commented 7 months ago

I think it is a good idea to revisit the question of whether we really want the default for --release builds to be that the binary is sanitized but separate debuginfo files aren't. It seems to me that the whole point of sanitizing by default is to avoid unpleasant surprises when shipping --release artifacts. Yet, only sanitizing half of the artifacts seems to undermine that goal of not causing any surprises. My suggestion would be to either sanitize everything or nothing by default.

An additional reason not go the mixed-sanitization route is that it does not actually solve the problem on Linux because with split-dwarf the relevant parts stay in the binary (at least as far as I can tell).

Regarding the debugging experience: There is an ongoing discussion one how to make that as seamless as possible even when paths are trimmed. We haven't landed on a perfect solution yet.

Possible solutions I see are:

cc'ing some folks who commented on not trimming separate debuginfo files in the RFC thread: @joshtriplett, @BurntSushi, @kornelski

kornelski commented 7 months ago

For me this is very profile-dependent, so I'd be happy if Cargo had good profile-dependent defaults.

bjorn3 commented 7 months ago

For cargo install I wouldn't want trim paths. And ditto for precompiled opensource binaries as created using eg cargo-dist and downloaded using cargo-binstall.

ssokolow commented 7 months ago

As a data point, I tend to strip my cargo install-ed binaries... and that's with my using a desktop machine where it was possible and cost-effective to upgrade my root/home drive from 500GB to 1TB to 2TB.

Heck, it might be time to reconsider UPXing them, now that I have a new CPU with plenty of cycles to burn on ensuring that perceived startup time isn't affected.

I'm not sure un-trimmed paths would be that beneficial to me in cargo install-ed binaries and they'd take up more space.

bjorn3 commented 7 months ago

Every once in a while I have a binary that panics or hangs. I want to be able to find out why without having to recompile it from scratch.

ssokolow commented 7 months ago

Whereas, in my case, I'm operating on the same "Why pessimize the 99.9% to save time on the 0.1% situations?" logic as the suggestion to use debug = 0 and strip = "debuginfo" in the dev/debug profile to speed up link times at https://davidlattimore.github.io/working-on-rust-iteration-time.html

As long as all it takes is adding a command-line argument or one line to a config file and then running a command in the terminal, I much prefer things to be small and fast by default and just let it recompile while I go to fix a snack in the cases where I actually need these sorts of things.

Hell, Flatpak makes it more complicated than Cargo does, because you have to manually install the .Debug package, then try to flatpak run --devel, and then manually install the SDK package it complains about not having.

ssokolow commented 7 months ago

...plus, as a developer, I'd prefer that users be required to re-cargo install to go into a debugging configuration so that, if I've made a new release since whatever they're running, they'll have an opportunity to encounter any potentially relevant bugfixes I've already done.

(As a user, I always give cargo install a chance to update whatever it is before reporting bugs out of a sense of "do unto others as you'd have done to you", so having debugging aids hanging around in all cargo install binaries all the time is also useless to me on those grounds.)

michaelwoerister commented 7 months ago

For cargo install I wouldn't want trim paths. And ditto for precompiled opensource binaries as created using eg cargo-dist and downloaded using cargo-binstall.

To clarify: trim-paths will still give file paths of the form some_crate-1.0.1/src/lib.rs, so the information in a backtrace is still rather useful. If debuginfo is enabled (which currently is not the default for release builds), then the debugger should still be able to find the symbols, it just won't be able to find the source files currently.

EDIT: @bjorn3, would that be sufficient for your use case?

michaelwoerister commented 6 months ago

Here is a table that shows the various combinations of the trim-paths and split-debuginfo Cargo options and (to the right side of the ) what rustc options these map to and what the consequences of this are.

trim-paths split-debuginfo --remap-path-scope Binary Separate Debuginfo Surprising? Useful? Remarks
none off   unsanitzed     Yes  
macro off macro semi-sanitized     ?  
object off macro + split-debuginfo + split-debuginfo-path + unsplit-debuginfo sanitized   Kind of ? Must sanitize debuginfo too, otherwise "object" won't be sanitized
all off macro + split-debuginfo + split-debuginfo-path + unsplit-debuginfo sanitized     Yes  
trim-paths split-debuginfo --remap-path-scope Binary Separate Debuginfo Surprising? Useful? Remarks
none packed   unsanitzed unsanitzed   Yes  
macro packed macro semi-sanitized unsanitzed   ?  
object packed macro + unsplit-debuginfo + split-debuginfo-path sanitized unsanitzed Yes ? Debugger won't find source files
all packed all sanitized sanitized   Yes (?) Debugger won't find source files
trim-paths split-debuginfo --remap-path-scope Binary Separate Debuginfo Surprising? Useful? Remarks
none unpacked   unsanitzed unsanitzed   Yes Fast builds with debuginfo
macro unpacked macro semi-sanitized unsanitzed   ?  
object unpacked macro + unsplit-debuginfo + split-debuginfo-path sanitized unsanitzed Yes ? Debugger won't find source files AND dwos
all unpacked all sanitized sanitized   ? Debugger won't find source files AND dwos
trim-paths split-debuginfo --remap-path-scope Binary Separate Debuginfo Surprising? Useful? Remarks
none post-link   unsanitzed unsanitzed   Kind of Not much use in splitting
macro post-link macro sanitized unsanitzed A little Kind of  
object post-link macro + unsplit-debuginfo + split-debuginfo-path + split-debuginfo sanitized unsanitzed   Kind of Might be useful when only distributing binary?
all post-link all sanitized sanitized   Yes  

Observations:

My conclusion:

Zalathar commented 5 months ago

Via #122450, it’s come to my attention that this will affect coverage instrumentation, because the coverage mappings embedded in the binary usually contain at least one absolute path, and any adjustment to those paths is constrained by the capabilities of the llvm-cov tool.

michaelwoerister commented 5 months ago

For reference, Clang allows to separately control path remapping for code coverage: https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fcoverage-prefix-map

michaelwoerister commented 5 months ago

FYI, @Urgau's PR https://github.com/rust-lang/rust/pull/122450 will merge all debuginfo scopes into a single one, since splitting them did not actually solve the problem it was intended to solve (see https://github.com/rust-lang/rust/issues/111540#issuecomment-1994010274).

I'll wait for a few days before approving the PR, as to give everyone here a chance to speak up in case they disagree with the change.

redactedontop commented 1 month ago

...updates?

liciana24 commented 1 month ago

any movement here? still waiting on this and hoping scope didnt change from stripping all in release mode BY DEFAULT.

weihanglo commented 1 month ago

Not entirely sure on rustc side of this, but I feel like it is waiting for Cargo to report back the integration status. See also: https://github.com/rust-lang/cargo/issues/12137#issuecomment-2149647027

estebank commented 1 month ago

Random question: are these flags meant to affect the output of diagnostics as well? One could argue that they should.

Urgau commented 1 month ago

Yes, that's the diagnostics scope, see the --remap-path-scope flag for more details.