Are dwp files the right choice for split-debuginfo=packed on Linux?

mstange commented 1 year ago

@davidtwco @bjorn3 @philipc @Gankra @khuey

Hi all, I was reading up on split dwarf and DWP files and came away rather confused.

What is a situation in which using split dwarf + DWP is preferable over the traditional way of splitting an ELF file into a binary and a debug file with the help of objcopy --only-keep-debug?

It seems to me that DWP files don't address either of these situations:

For local development I want fast builds, and I don't mind keeping around a whole lot of intermediate files in order to have a working debugger.
For releases I want a small binary and a single redistributable debug file.

On macOS and Windows, split-debuginfo=packed addresses the second case. The binary is small because all debug information has been removed from it. And the binary contains an ID (and on Windows also a file path) which allows looking up the debug file (i.e. the PDB file / dSYM bundle). I can drop the debug file in a big directory with the debug files from all my other releases, and it's easy to find the right one for a given binary based on its ID; on Windows I can even put the PDB file on a symbol server and my debugger will find it based on the information in the binary. Also, if I want to look up symbol information for an address, for example for a crash stack, I can get it from the debug file and don't need the binary.

DWP files do not appear to address the second use case at all:

The binary still contains a small amount of debug information (e.g. "skeleton units"). This information is needed to make sense of the information in the DWP file. So the binary is not as small as it could be. It also contains the debug information for the Rust standard library, which is not moved into the DWP file. If I want to strip off the debug information from the binary, I end up with three files: The stripped binary, the file with the skeleton + std debug information, and the DWP file.
The binary does not contain a pointer to the DWP file. It contains DWO IDs and paths to the original DWO files, but the point of the DWP file is that I shouldn't need those DWO files. So those paths are not useful. It sounds like gdb expects the DWP file to be located in the same directory as the binary, with the binary's filename plus a .dwp extension. If I have a binary and a corresponding DWP file, at least I can validate that it's the correct DWP file by checking for matching DWO IDs, I think.
Because the DWP file is identified by being in the same directory as the binary, I cannot drop it into a big directory with my other symbol files.
I cannot use a DWP file on a symbol server, for example with debuginfod, because afaik the DWP file has no build ID.
To resolve the symbol / debug info for an address, the DWP file is not sufficient on its own; I have to consult both the binary with the skeleton units + std debug info, as well as the DWP file.

Please correct me if any of the above is incorrect. I'm basing these statements on my read of the DebugFissionDWP document and on the fact that gimli's dwarfdump example requires the --dwp argument to be paired with a --dwo-parent argument. I haven't read the DWARF 5 spec yet, and I might also be mixing up GNU extension dwp with DWARF 5 dwp.

Anyway, all of this leaves me wondering whether it would have a better choice to make split-debuginfo=packed behave in the "Linux distro packager" way: Compile everything without split dwarf, and then run the commands from https://github.com/rust-lang/rust/issues/34651#issuecomment-1253008705 to split the resulting file into a binary and a .debug file. This would satisfy all the use cases I listed above:

The binary contains no debug information.
The binary and the debug file have the same ELF build ID.
The binary contains a "debuglink" with the filename of the debug file.
I can put the debug file on a debuginfod server, and it is found by gdb / perf / etc. by its build ID.
To symbolicate an address, I only need the debug file.

Also, just to clarify, I think split dwarf is perfectly fine for split-debuginfo=unpacked. My issue is only with split-debuginfo=packed.

Thoughts?

khuey commented 1 year ago

The main difference between the "traditional" objcopy --only-keep-debug path and the split-DWARF -> DWP path is whether or not the linker is required to process (most of) the debug data. There are codebases out there for which running all the debug info through the linker can be burdensome or even cause the linker to OOM.

I haven't tested this myself but there are no theoretical obstacles to being able to objcopy --only-keep-debug the skeleton units into a separate debug file with a debuglink. That debug file could then be placed with the DWP or even combined with the DWP (since the DWP sections don't overlap with the "regular" debug sections) allowing you to achieve a binary with no debug information and potentially even a single debug file. It's merely a question of whether the tooling does or can be configured to look in the right places for the information.

mstange commented 1 year ago

Folding the DWP sections into the debug file is an excellent idea and would solve all the problems I listed.

davidtwco commented 1 year ago

I don't have strong feelings on whether this would be a better default for packed mode or not - however, I'm not sure this is something we can change now that Split DWARF is available on stable.

khuey commented 1 year ago

I tried using split-debuginfo=packed and it doesn't seem to actually split all of the debug info. See #106592.

jyn514 commented 1 year ago

I'm not sure this is something we can change now that Split DWARF is available on stable.

I don't see why this would be true? We stabilized the existence of the option, but not how it's implemented.

bjorn3 commented 1 year ago

For -Csplit-debuginfo=unpacked the actual implementation is not specified, but for -Csplit-debuging=packed I think the intention was to specify it. For example cargo needs to know how the split debuginfo format works (file extension, file or directory, ...) to be able to uplift the split debuginfo files.

davidtwco commented 1 year ago

I don't see why this would be true? We stabilized the existence of the option, but not how it's implemented.

I'm not sure what the consensus is here, I don't have strong feelings personally, but I find it a little surprising that we would guarantee "this option will pack debuginfo" and no more - that you couldn't even rely on the file it creates continuing to be created at all?

jyn514 commented 1 year ago

that you couldn't even rely on the file it creates continuing to be created at all?

That seems fine? I don't think anyone should be hard-coding filenames personally, if you need to be resilient across multiple versions you should use --message-format json which tells you the path without having to guess.

bjorn3 commented 1 year ago

cargo --message-format json gets that information through a combination of hardcoded information and the output of rustc --print file-names. rustc --print file-names doesn't list the name of debuginfo files, so cargo has to hard code it. Changing both rustc and cargo together is not an option. Cargo has to work with newer rustc and rustc up to 2 releases older (stable rustc in nightly cargo). Furthermore for build systems that don't use cargo --message-format json doesn't solve anything.

jyn514 commented 1 year ago

rustc --print file-names doesn't list the name of debuginfo files, so cargo has to hard code it.

That seems easy to fix? We can add it to the list of names rustc prints.

Cargo has to work with newer rustc and rustc up to 2 releases older (stable rustc in nightly cargo).

Why?

bjorn3 commented 1 year ago

That seems easy to fix? We can add it to the list of names rustc prints.

That will break parsing of the output. The expected format is that for each --crate-type argument it shows exactly one filename, which is the main linked artifact. It doesn't show .o for --emit obj or .rmeta for --emit metadata either. In addition it can be combined with other --print and in fact cargo does so. For example echo | rustc --print sysroot --print file-names --crate-name ____ - --crate-type bin --crate-type lib --emit obj,metadata shows:

/home/gh-bjorn3/.rustup/toolchains/nightly-2023-02-06-aarch64-unknown-linux-gnu
____
lib____.rlib

Why?

For working with older rustc as I understand it, this is because development of cargo happens using stable rustc rather than the associated nightly. For working with newer rustc, this is because there is resistance to allowing stable cargo to depend on nightly rustc features afaik.

khuey commented 1 year ago

Realistically DWPs were practically unusable before rust-lang/cargo#11572 so I don't think it's unreasonable to assume the meaning of -Csplit-debuginfo=packed could be changed if it's done soon.

That said, if it's desirable for the behavior in the opening comment to be built into the toolchain (and I'm not convinced that it is) I don't see any reason it can't be done under a new name (i.e. -Csplit-debuginfo=bikeshed here).

Gankra commented 1 year ago

Because the primary mechanism for choosing your split-debuginfo mode is seemingly a Cargo profile, it's a tad suboptimal for the "right choice" to have different names.

philipc commented 1 year ago

I don't see the value of a generic name like "packed". I can see the value of using "unpacked" to mean that you want the fastest possible compile/link on any platform, but the reason to choose "packed" is because you then want to do something with the resulting packed file, and so you must already have platform specific knowledge of what that packed file is, so why not specify that unambiguously. The only reason I can see to use a generic name like "packed" is because Cargo doesn't let you specify this option per platform, but the correct fix for that is to remove that limitation of Cargo. There's no fundamental reason why using "packed" for one platform means you should be using "packed" for all platforms. This has bitten me when I have wanted them to be different, see the hack in https://github.com/gimli-rs/addr2line/pull/258.

Gankra commented 1 year ago

I agree, the UX is a real mess right now. I've been trying to work out "good shippable build settings" to pick for users in cargo-dist and one of my big goals was to help people produce proper debuginfo files. Ideally I could pick some good defaults, write it to the Cargo.toml, and let the user tweak them "normally" if they disagree with my choices...

But as of this morning I've run into enough messed up issues with the debuginfo flags that I think I need to just shelve that stuff until further notice. Like I could potentially do a bunch of really custom env-var stuff like the above gimli hack, but that feels... wrong? Ideally a "good build configuration" should just be something you can put in your Cargo.toml.

(Unfortunately this is already untrue because of rfc 1721 (crt-static), so there's a lot of work that needs to happen for that to be a viable goal.)

bjorn3 commented 8 months ago

In https://github.com/rust-lang/compiler-team/issues/721 it has been approved to put objcopy based split debuginfo under a new -Csplit-debuginfo=post-link flag. This is not implemented yet.

rust-lang / rust

Are dwp files the right choice for split-debuginfo=packed on Linux? #105991