rust-lang / rust

Undocumented use of DT_GNU_HASH descriptor #100859

Open jreiser opened 2 years ago

jreiser commented 2 years ago

Some Rust executables in ELF format for Linux, often those linked against the musl C library, use undocumented "features" of the DT_GNU_HASH descriptor, which carries run-time symbol information in an ELF file. GNU binutils does not document the DT_GNU_HASH descriptor itself, so Rust's "corner case" usage compounds an existing problem: with no agreed or communicated meaning, inconsistent private interpretations proliferate and the software environment deteriorates. This GitHub Issue documents the problematic occurrences that I have seen, and requests that Rust contribute to creating documentation for DT_GNU_HASH, including an explanation of the "corner cases" seen so far.

The cases appeared when users of UPX (https://github.com/upx/upx) reported that UPX did not understand various Rust-compiled binary executable programs. UPX needs to understand compiled binary executables in order to compress and decompress them, and a program's use of specific global symbols can affect UPX; so UPX must look up those symbols, which are expected to be encoded using a DT_HASH and/or DT_GNU_HASH descriptor. The specific DT_GNU_HASH descriptors produced by Rust+musl have presented unusual encodings.
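For readers unfamiliar with the lookup, the symbol-name hash that DT_GNU_HASH consumers are generally understood to use can be sketched as below. This is a minimal illustration based on the function found in the glibc and binutils sources, not on any formal specification (none exists, which is the point of this issue); the function name `gnu_hash` is chosen here for illustration.

```python
def gnu_hash(name: str) -> int:
    """Hash a symbol name the way glibc's dynamic loader does for DT_GNU_HASH.

    Assumption: 32-bit unsigned arithmetic, starting value 5381,
    multiplier 33, applied to each byte of the name.
    """
    h = 5381
    for byte in name.encode("utf-8"):
        # Multiply by 33, add the byte, and wrap to 32 bits.
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    for symbol in ("", "printf"):
        print(f"{symbol!r}: {gnu_hash(symbol):#010x}")
```

A tool such as UPX would reduce this value modulo n_bucket to pick a bucket, which is one reason a table with n_bucket equal to 0 cannot be searched in the usual way.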

  1. https://github.com/upx/upx/issues/568 DT_GNU_HASH descriptor with no symbol information (0 == n_bucket), in a dynamically-linked executable main program that does contain global symbols. How should this be interpreted? Is it an attempt to save space, and force a fall-back to linear search of DT_SYMTAB? If so, then why not save even more space by omitting DT_GNU_HASH entirely?
  2. https://github.com/upx/upx/issues/525 An earlier case of DT_GNU_HASH descriptor with no symbol information, but encoded as (1 == n_bucket && 0 == buckets[0] && 1 == n_bitmask && 0 == bitmask[0]). Again, omitting DT_GNU_HASH entirely would save even more space.
  3. https://github.com/upx/upx/issues/476 First-seen case of DT_GNU_HASH descriptor with no symbol information (0 == n_bucket), in a statically-linked main program with no symbols. Again, omitting DT_GNU_HASH entirely would save even more space.
  4. https://github.com/upx/upx/issues/369 hash_array[j] is not present if buckets[k] is zero for all k > j. So the tail of a complete hash_array[] (which should have n_bucket entries, just like buckets[]) would overlap the following table (typically DT_SYMTAB).
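The first three cases above can be told apart mechanically. The sketch below assumes the layout commonly inferred from the binutils and glibc sources: four 32-bit header words, then n_bitmask bitmask words of the ELF class size, then n_bucket 32-bit bucket entries. The field names `symbias` and `shift`, the class `GnuHashHeader`, and the functions `parse_gnu_hash` and `classify` are hypothetical names used only for this illustration, and this is not UPX's actual code.

```python
import struct
from dataclasses import dataclass


@dataclass
class GnuHashHeader:
    n_bucket: int    # number of hash buckets
    symbias: int     # index of first hashed symbol (assumed meaning)
    n_bitmask: int   # number of bitmask (Bloom filter) words
    shift: int       # second-hash shift count (assumed meaning)
    bitmask: tuple   # bitmask words
    buckets: tuple   # bucket entries


def parse_gnu_hash(data: bytes, is_64bit: bool = True,
                   little_endian: bool = True) -> GnuHashHeader:
    """Decode the start of a DT_GNU_HASH table under the assumed layout."""
    end = "<" if little_endian else ">"
    n_bucket, symbias, n_bitmask, shift = struct.unpack_from(end + "4I", data, 0)
    offset = 16
    word = "Q" if is_64bit else "I"
    bitmask = struct.unpack_from(f"{end}{n_bitmask}{word}", data, offset)
    offset += n_bitmask * (8 if is_64bit else 4)
    buckets = struct.unpack_from(f"{end}{n_bucket}I", data, offset)
    return GnuHashHeader(n_bucket, symbias, n_bitmask, shift, bitmask, buckets)


def classify(h: GnuHashHeader) -> str:
    """Label the corner cases described in the list above."""
    if h.n_bucket == 0:
        return "no-buckets"          # cases 1 and 3
    if not any(h.buckets) and not any(h.bitmask):
        return "empty"               # case 2: all buckets and bitmask words zero
    return "populated"


if __name__ == "__main__":
    # Synthetic table shaped like case 2: one zero bucket, one zero bitmask word.
    raw = struct.pack("<4I", 1, 1, 1, 6) + struct.pack("<Q", 0) + struct.pack("<I", 0)
    print(classify(parse_gnu_hash(raw)))
```

Whether a consumer should accept, reject, or fall back to a linear search for each label is exactly the question that documentation would need to settle.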

The (0 == n_bucket) cases suggest that the input to UPX might have been compromised by malware clobbering the first word of DT_GNU_HASH (and then using the remaining original space as payload for malware); otherwise, why was the trivially shorter encoding of "no DT_GNU_HASH at all" not used? The last case (hash_array[] truncated by the start of DT_SYMTAB) also casts doubt on the integrity of the file. These space-saving optimizations could well be blessed, if documented and explained. Please lend your weight to getting DT_GNU_HASH documented and annotated.

JMLX42 commented 4 hours ago

I have a similar error:

$ upx target/release/gltf_live_server
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2024
UPX 4.2.4       Markus Oberhumer, Laszlo Molnar & John Reiser    May 9th 2024

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
upx: target/release/gltf_live_server: CantPackException: bad DT_GNU_HASH n_bucket=0x1  n_bitmask=0x1  len=0x20  r=6

Packed 0 files.

In this case: n_bucket=0x1 n_bitmask=0x1 len=0x20 r=6.

But I am not using musl.

Update: I was using mold as the linker. Without mold, upx works as expected.