tux3 closed this 2 years ago
I think there are two bugs:

1. string table parsing fails when an entry is not valid UTF-8
2. when `e_shstrndx == SHN_XINDEX`, the index of the section header string table should be obtained from the `sh_link` field of the section header at index 0

I suspect that for the file you are parsing, the first bug only occurs as a result of the second. That is, we're using the wrong section index (65535) for the string table, and as a result we're trying to parse the contents of a `.group` section as UTF-8 strings. So if you fix the second bug, UTF-8 parsing will no longer be a problem for this file (but it might be a problem for other files).
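To make the second bug concrete, here is a minimal sketch of the lookup, not goblin's actual code. The `SectionHeader` struct and the `resolve_shstrndx` function are hypothetical names invented for illustration; only `SHN_XINDEX`, `e_shstrndx` and `sh_link` come from the ELF specification.

```rust
/// Escape value: the real index did not fit in the 16-bit header field.
const SHN_XINDEX: u16 = 0xffff;
/// Undefined section; in `e_shstrndx` it means "no section name string table".
const SHN_UNDEF: u16 = 0;

/// Only the field this sketch needs (hypothetical, not goblin's type).
#[derive(Debug, Clone, Copy, Default)]
struct SectionHeader {
    sh_link: u32,
}

/// Resolve the real index of the section header string table.
fn resolve_shstrndx(e_shstrndx: u16, sections: &[SectionHeader]) -> Option<usize> {
    let index = match e_shstrndx {
        SHN_UNDEF => return None,
        // The real index is stored in `sh_link` of section header 0.
        SHN_XINDEX => sections.first()?.sh_link as usize,
        other => other as usize,
    };
    // Reject indices that point past the section header table.
    if index < sections.len() {
        Some(index)
    } else {
        None
    }
}
```

For the file in this issue, `e_shstrndx` is 65535 and `readelf` reports the resolved index as 274068, which is the value such a lookup would be expected to read from section header 0.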
The code that needs fixing: https://github.com/m4b/goblin/blob/3f5f70e0e68243559f6449bd9ad3517be2c206d0/src/elf/mod.rs#L291-L292
There's possibly other code that needs fixing to handle large numbers of sections too (e.g. `e_shnum` and `st_shndx` can overflow).
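As a rough sketch of what handling those two overflows could look like: the ELF specification stores the real section count in `sh_size` of section header 0 when `e_shnum` is zero, and stores oversized symbol section indices in a separate `SHT_SYMTAB_SHNDX` table. The function names and parameters below are hypothetical and are not part of goblin's API.

```rust
/// Escape value: the real index is stored in the extended index table.
const SHN_XINDEX: u16 = 0xffff;

/// Resolve the real number of section headers.
/// `section0_sh_size` is the `sh_size` field of section header 0.
fn resolve_shnum(e_shnum: u16, e_shoff: u64, section0_sh_size: u64) -> u64 {
    if e_shoff == 0 {
        // No section header table at all.
        return 0;
    }
    if e_shnum == 0 {
        // The count did not fit in 16 bits; it lives in section header 0.
        section0_sh_size
    } else {
        u64::from(e_shnum)
    }
}

/// Resolve the section index of the symbol at `symbol_index`.
/// `shndx_table` holds the decoded words of the SHT_SYMTAB_SHNDX section
/// associated with the symbol table (empty if there is none).
/// Other reserved values (such as SHN_ABS) are passed through unchanged.
fn resolve_st_shndx(st_shndx: u16, symbol_index: usize, shndx_table: &[u32]) -> Option<u32> {
    if st_shndx == SHN_XINDEX {
        shndx_table.get(symbol_index).copied()
    } else {
        Some(u32::from(st_shndx))
    }
}
```

With the header values reported in this issue (`e_shnum` of 0 and a non-zero section header offset), `resolve_shnum` returns whatever section header 0 carries in `sh_size`, which `readelf` prints as 274069.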
Here's an example of better `e_shstrndx` parsing: https://github.com/gimli-rs/object/blob/c4760714aa9ca6f73cd5e76991463ed1e3497589/src/read/elf/file.rs#L582-L600
Starting with goblin 0.4.2, `Strtab::parse` does the following:

However, it appears that the contents of the strtab in a valid ELF file are NOT always valid UTF-8.
Some of the strtab entries in my ELF object look like `[F4, 65, 02, 00]` or `[82, 66, 02, 00]`. This causes `get_str` to fail, and the ELF file cannot be parsed.

The ELF file in question was created by GNU ld; it's a relocatable object.
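The following sketch, using only the standard library, checks the two byte sequences quoted above. It shows that neither is valid UTF-8, and that reading each 4-byte group as a little-endian 32-bit word gives a number below the file's section count, which is what one would expect from section indices rather than text. The helper names `entry_bytes` and `le_word` are made up for this example and do not exist in goblin.

```rust
use std::str;

/// Return the bytes of the NUL-terminated entry starting at `offset`,
/// without assuming any text encoding.
fn entry_bytes(table: &[u8], offset: usize) -> Option<&[u8]> {
    let tail = table.get(offset..)?;
    let len = tail.iter().position(|&b| b == 0)?;
    Some(&tail[..len])
}

/// Decode four bytes as a little-endian 32-bit word.
fn le_word(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

/// True if the entry at `offset` exists and is valid UTF-8.
fn entry_is_utf8(table: &[u8], offset: usize) -> bool {
    entry_bytes(table, offset)
        .map(|bytes| str::from_utf8(bytes).is_ok())
        .unwrap_or(false)
}
```

Here `0xF4` starts a 4-byte UTF-8 sequence but is followed by `0x65`, which is not a continuation byte, and `0x82` is a continuation byte with no lead byte, so both entries are rejected. A parser that hands back raw bytes (as `entry_bytes` does) and defers UTF-8 validation to the caller would not fail on such data.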
Here is the `readelf -h` output for the object. Note the number of section headers.
```
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - GNU
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          600695752 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         0 (274069)
  Section header string table index: 65535 (274068)
```

The first strtab entry (F4, 65, 02) that goblin fails to parse is at this offset into the file:
000bdc20: f465 0200 81d4 0300 54f5 0200 db28 0400 .e......T....(..
This corresponds to the following offset starting at number 65535 in the `readelf --sections` output:

`readelf -s` shows the following:

I don't know this particular corner of the ELF spec very well at all, but I believe something special must be happening when there are more than 2^16 section headers, and the strtab contents may not always be valid UTF-8 strings.
Sometimes this bug fails to reproduce, even though the number of section headers is far above 65k. I believe this is because the 3 bytes of binary data in the strtab entries can accidentally happen to be valid UTF-8. This may be why I didn't run into this bug before today.
I can upload the object file if that would help (though it is from a large statically linked binary compiled in debug and 590M big).