m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License
1.17k stars 156 forks source link

ELF Notes should use 4-byte alignment #364

Closed Swatinem closed 1 year ago

Swatinem commented 1 year ago

The ELF manpage is pretty clear on this using a 4 byte alignment:

https://www.mankier.com/5/elf#Description-Notes_(Nhdr)

Each note is followed by the name field (whose length is defined in n_namesz) and then by the descriptor field (whose length is defined in n_descsz) and whose starting address has a 4 byte alignment.

The example given there is also hardcoding a 4 byte alignment even though the example is using Elf64_Nhdr.

However, goblin uses the alignment as defined in the program headers to align the description after reading the name here:

https://github.com/m4b/goblin/blob/87821fd7b85cbcaa0ce9ae542809313b18dec2f2/src/elf/note.rs#L222

I am running into this problem with a NOTE that has a 16-byte name, thus 12-bytes for the Note header + 16-bytes for the name are not 8 byte aligned, so it erroneously shifts the beginning offset of the description by 4 bytes, which then leads to an error parsing parsing the description bytes.

philipc commented 1 year ago

I wouldn't rely solely on that man page. As noted in #90, other programs such as readelf sometimes use an alignment of 8, so I don't think changing to always use 4 is a correct solution.

Can you provide a file to test with, or steps to create such a file?

Swatinem commented 1 year ago

Maybe I was a bit overeager and creates a PR with a test right away: https://github.com/m4b/goblin/pull/365

Unfortunately, I can’t share the file I am looking at as it contains sensitive customer data.

philipc commented 1 year ago

Can you determine which compiler was used to create the file?

Swatinem commented 1 year ago

The file is not an executable, but a core file from a certain proprietary ELF-based platform that puts a bunch of custom info into various NOTEs. I’m trying to intentionally keep this as vague as possible :-D

philipc commented 1 year ago

A possibility is that the proprietary ELF-based platform is incompatible with other platforms.

I'll see if I can find a file that does require 8 byte alignment for notes.

Swatinem commented 1 year ago

In case this is indeed incompatible with other ELF platforms, I can also parse the NOTEs manually instead of going through iter_note_headers, that is perfectly fine for my use cases :-)

Swatinem commented 1 year ago

Ah okay, I found out the confusion here.

Both LLVM and binutils align only namesz, and not the whole pointer offset:

https://github.com/llvm/llvm-project/blob/2f2c76ddb73e6176bed558c7d61faa50aabb9e9e/llvm/include/llvm/Object/ELFTypes.h#L643

https://github.com/bminor/binutils-gdb/blob/1ed7ccc6cf46f1ec60538875cf0f68890f5ae6ec/binutils/readelf.c#L16177

Whereas goblin aligns the whole pointer.

in my case namesz == 16, so it is already 8-byte aligned, and no change is being made in readelf. But goblin sees the 12-byte NOTE header + 16 byte namesz and aligns that. (Or rather, the file-based offset of that.) That is where my 4-byte offset is coming from presumably.

Swatinem commented 1 year ago

I’m still so confused how this whole thing could potentially have ever worked reading theNT_GNU_BUILD_ID, which has a 4-byte namesz.

Swatinem commented 1 year ago

Okay no, wait a second, that alignment is taking into account the offsetof:

https://github.com/bminor/binutils-gdb/blob/1ed7ccc6cf46f1ec60538875cf0f68890f5ae6ec/include/elf/external.h#L193-L194

So it does align taking into account the whole struct, which is why GNU\0 aligns correctly.

For my use-case, readelf seems to align and read things correctly, but goblin does not. Maybe it is simply using a different alignment value for some reason, even though the program headers state that the NOTEs are supposed to be 8-byte aligned.

philipc commented 1 year ago

The only files I've found on my system that have alignment 8 are the ".note.gnu.property" sections, which happen to be readable using an alignment of both 4 and 8.

For the test you added in #365, have you kept the same length for the name and descriptor?

If I write the data from that test to a file with section alignment 8, then readelf fails to read it:

$ readelf -n note8.o

Displaying notes found in: .note
  Owner                Data size    Description
readelf: Warning: note with invalid namesz and/or descsz found at offset 0x0
readelf: Warning:  type: 0x1, namesize: 0x00000008, descsize: 0x00000014, alignment: 8

llvm-readobj does succeed because it always uses alignment 4. In my opinion this is an LLVM bug, and it has had similar bugs before (https://reviews.llvm.org/D70962).

The ELF generic ABI states that padding is 4 or 8 depending on ELF class: https://www.sco.com/developers/gabi/latest/ch5.pheader.html#note_section. OS and CPU ABIs can override that, but in any case I would expect the section alignment to match the alignment that is used.

Swatinem commented 1 year ago

Thanks for helping me deep dive into all this. I was indeed using llvm-readobj / llvm-readelf as a reference, which was working just fine. It looks like the core dump that I am dealing with might just advertise a wrong alignment. But I should be able to work around that on my end. Thanks for all the help!