Open mstange opened 1 year ago
The main difference between the "traditional" objcopy --only-keep-debug
path and the split-DWARF -> DWP
path is whether or not the linker is required to process (most of) the debug data. There are codebases out there for which running all the debug info through the linker can be burdensome or even cause the linker to OOM.
I haven't tested this myself but there are no theoretical obstacles to being able to objcopy --only-keep-debug
the skeleton units into a separate debug file with a debuglink. That debug file could then be placed with the DWP or even combined with the DWP (since the DWP sections don't overlap with the "regular" debug sections) allowing you to achieve a binary with no debug information and potentially even a single debug file. It's merely a question of whether the tooling does or can be configured to look in the right places for the information.
Folding the DWP sections into the debug file is an excellent idea and would solve all the problems I listed.
I don't have strong feelings on whether this would be a better default for packed mode or not - however, I'm not sure this is something we can change now that Split DWARF is available on stable.
I tried using split-debuginfo=packed
and it doesn't seem to actually split all of the debug info. See #106592.
I'm not sure this is something we can change now that Split DWARF is available on stable.
I don't see why this would be true? We stabilized the existence of the option, but not how it's implemented.
For -Csplit-debuginfo=unpacked the actual implementation is not specified, but for -Csplit-debuging=packed I think the intention was to specify it. For example cargo needs to know how the split debuginfo format works (file extension, file or directory, ...) to be able to uplift the split debuginfo files.
I don't see why this would be true? We stabilized the existence of the option, but not how it's implemented.
I'm not sure what the consensus is here, I don't have strong feelings personally, but I find it a little surprising that we would guarantee "this option will pack debuginfo" and no more - that you couldn't even rely on the file it creates continuing to be created at all?
that you couldn't even rely on the file it creates continuing to be created at all?
That seems fine? I don't think anyone should be hard-coding filenames personally, if you need to be resilient across multiple versions you should use --message-format json which tells you the path without having to guess.
cargo --message-format json
gets that information through a combination of hardcoded information and the output of rustc --print file-names
. rustc --print file-names
doesn't list the name of debuginfo files, so cargo has to hard code it. Changing both rustc and cargo together is not an option. Cargo has to work with newer rustc and rustc up to 2 releases older (stable rustc in nightly cargo). Furthermore for build systems that don't use cargo --message-format json
doesn't solve anything.
rustc --print file-names doesn't list the name of debuginfo files, so cargo has to hard code it.
That seems easy to fix? We can add it to the list of names rustc prints.
Cargo has to work with newer rustc and rustc up to 2 releases older (stable rustc in nightly cargo).
Why?
That seems easy to fix? We can add it to the list of names rustc prints.
That will break parsing of the output. The expected format is that for each --crate-type
argument it shows exactly one filename, which is the main linked artifact. It doesn't show .o
for --emit obj
or .rmeta
for --emit metadata
either. In addition it can be combined with other --print
and in fact cargo does so. For example echo | rustc --print sysroot --print file-names --crate-name ____ - --crate-type bin --crate-type lib --emit obj,metadata
shows:
/home/gh-bjorn3/.rustup/toolchains/nightly-2023-02-06-aarch64-unknown-linux-gnu
____
lib____.rlib
Why?
For working with older rustc as I understand it, this is because development of cargo happens using stable rustc rather than the associated nightly. For working with newer rustc, this is because there is resistance to allowing stable cargo to depend on nightly rustc features afaik.
Realistically DWPs were practically unusable before rust-lang/cargo#11572 so I don't think it's unreasonable to assume the meaning of -Csplit-debuginfo=packed could be changed if it's done soon.
That said, if it's desirable for the behavior in the opening comment to be built into the toolchain (and I'm not convinced that it is) I don't see any reason it can't be done under a new name (i.e. -Csplit-debuginfo=bikeshed here).
Because the primary mechanism for choosing your split-debuginfo mode is seemingly a Cargo profile, it's a tad suboptimal for the "right choice" to have different names.
I don't see the value of a generic name like "packed". I can see the value of using "unpacked" to mean that you want the fastest possible compile/link on any platform, but the reason to choose "packed" is because you then want to do something with the resulting packed file, and so you must already have platform specific knowledge of what that packed file is, so why not specify that unambiguously. The only reason I can see to use a generic name like "packed" is because Cargo doesn't let you specify this option per platform, but the correct fix for that is to remove that limitation of Cargo. There's no fundamental reason why using "packed" for one platform means you should be using "packed" for all platforms. This has bitten me when I have wanted them to be different, see the hack in https://github.com/gimli-rs/addr2line/pull/258.
I agree, the UX is a real mess right now. I've been trying to work out "good shippable build settings" to pick for users in cargo-dist and one of my big goals was to help people produce proper debuginfo files. Ideally I could pick some good defaults, write it to the Cargo.toml, and let the user tweak them "normally" if they disagree with my choices...
But as of this morning I've run into enough messed up issues with the debuginfo flags that I think I need to just shelve that stuff until further notice. Like I could potentially do a bunch of really custom env-var stuff like the above gimli hack, but that feels... wrong? Ideally a "good build configuration" should just be something you can put in your Cargo.toml.
(Unfortunately this is already untrue because of rfc 1721 (crt-static), so there's a lot of work that needs to happen for that to be a viable goal.)
In https://github.com/rust-lang/compiler-team/issues/721 it has been approved to put objcopy based split debuginfo under a new -Csplit-debuginfo=post-link
flag. This is not implemented yet.
@davidtwco @bjorn3 @philipc @Gankra @khuey
Hi all, I was reading up on split dwarf and DWP files and came away rather confused.
What is a situation in which using split dwarf + DWP is preferable over the traditional way of splitting an ELF file into a binary and a debug file with the help of
objcopy --only-keep-debug
?It seems to me that DWP files don't address either of these situations:
On macOS and Windows,
split-debuginfo=packed
addresses the second case. The binary is small because all debug information has been removed from it. And the binary contains an ID (and on Windows also a file path) which allows looking up the debug file (i.e. the PDB file / dSYM bundle). I can drop the debug file in a big directory with the debug files from all my other releases, and it's easy to find the right one for a given binary based on its ID; on Windows I can even put the PDB file on a symbol server and my debugger will find it based on the information in the binary. Also, if I want to look up symbol information for an address, for example for a crash stack, I can get it from the debug file and don't need the binary.DWP files do not appear to address the second use case at all:
gdb
expects the DWP file to be located in the same directory as the binary, with the binary's filename plus a.dwp
extension. If I have a binary and a corresponding DWP file, at least I can validate that it's the correct DWP file by checking for matching DWO IDs, I think.Please correct me if any of the above is incorrect. I'm basing these statements on my read of the DebugFissionDWP document and on the fact that gimli's dwarfdump example requires the
--dwp
argument to be paired with a--dwo-parent
argument. I haven't read the DWARF 5 spec yet, and I might also be mixing up GNU extension dwp with DWARF 5 dwp.Anyway, all of this leaves me wondering whether it would have a better choice to make
split-debuginfo=packed
behave in the "Linux distro packager" way: Compile everything without split dwarf, and then run the commands from https://github.com/rust-lang/rust/issues/34651#issuecomment-1253008705 to split the resulting file into a binary and a.debug
file. This would satisfy all the use cases I listed above:Also, just to clarify, I think split dwarf is perfectly fine for
split-debuginfo=unpacked
. My issue is only withsplit-debuginfo=packed
.Thoughts?