m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License

Elf: match imported functions to libraries? #363

Open rjzak opened 1 year ago

rjzak commented 1 year ago

It seems that the Elf parser can list the libraries the binary loads at runtime, via the elf.libraries Vec. After playing around with the library, it seems that the elf.dynstrtab field contains the names of the functions imported from the various libraries. But how do you figure out which library each function is imported from?
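In ELF terms, an imported function is a dynamic symbol whose section index is SHN_UNDEF (0), and its name is stored as a byte offset into the dynamic string table. The sketch below shows that relationship with a hypothetical, hand-built DynSym type rather than goblin's own types (in goblin the same data lives in elf.dynsyms and elf.dynstrtab). Note that nothing in these symbol records names a library.

```rust
/// Section index meaning "not defined in this object", i.e. imported.
const SHN_UNDEF: u16 = 0;

/// Hypothetical minimal dynamic symbol: only the two ELF fields used here.
#[derive(Debug, Clone, Copy)]
struct DynSym {
    st_name: usize, // byte offset of the name inside the dynamic string table
    st_shndx: u16,  // section index; SHN_UNDEF means the symbol is imported
}

/// Reads the NUL-terminated string that starts at `offset` in `strtab`.
fn name_at(strtab: &[u8], offset: usize) -> Option<&str> {
    let rest = strtab.get(offset..)?;
    let end = rest.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&rest[..end]).ok()
}

/// Returns the names of undefined (imported) dynamic symbols, skipping
/// unnamed entries such as the null symbol at index 0.
fn imported_names<'a>(syms: &[DynSym], strtab: &'a [u8]) -> Vec<&'a str> {
    syms.iter()
        .filter(|s| s.st_shndx == SHN_UNDEF)
        .filter_map(|s| name_at(strtab, s.st_name))
        .filter(|name| !name.is_empty())
        .collect()
}
```

For the string table b"\0puts\0malloc\0", a symbol with st_name 1 and st_shndx 0 is reported as the import puts, while a symbol at offset 6 with a non-zero section index (malloc defined locally) is skipped.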

In Go, debug/elf/file.go:ImportedSymbols() returns a slice of structs, each holding a function name and the name of its library. How could similar functionality be implemented using Goblin?

From what I've learned about the ELF format so far, the elf.versym field might provide the link between the two, but I'm clearly missing something.

I also don't know whether this is what was being discussed in https://github.com/m4b/goblin/issues/282. That issue doesn't mention function names, but maybe there is some structure that connects versions, functions, and libraries. Again, I'm still learning about the nitty-gritty of ELFs, and yesterday was the first time I tried anything with Goblin.

Ultimately, I'd like to have a way to do import hashing for malware analysis with Rust in my MalwareDB project, which I had done previously in Go.
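Since ELF imports are not tied to libraries, one possible import-hash scheme is to hash the normalized set of imported symbol names. This is a hypothetical sketch, not MalwareDB's or any existing tool's algorithm; FNV-1a stands in for MD5 only so the example needs nothing beyond the standard library.

```rust
/// 64-bit FNV-1a, used here as a dependency-free stand-in for MD5.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Hypothetical ELF import hash: lowercase, sort, and dedup the imported
/// symbol names, join them with commas, and hash the joined string.
fn import_hash(imports: &[&str]) -> String {
    let mut names: Vec<String> = imports.iter().map(|s| s.to_lowercase()).collect();
    names.sort();
    names.dedup();
    format!("{:016x}", fnv1a64(names.join(",").as_bytes()))
}
```

Sorting makes the result independent of symbol-table order, unlike PE imphash, which preserves import order; whether that trade-off suits malware clustering is a design choice to evaluate.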

m4b commented 1 year ago

Unlike e.g., PE or mach-o, elf binaries that "import" a symbol do not have an explicit mapping from symbol -> library providing it (the term for this in mach-o is "two-level namespaces" iirc). In other words, the namespace for symbol lookup is "flat", and afaik, the dynamic linker will:

  1. iterate over each dynamic library
  2. (usually) using a bloom filter, check whether the symbol is in that library
  3. if yes, return that symbol (the address, basically), and in most cases fix up the PLT so that all future calls jump to this function
  4. else continue on to the next dynamic library that is listed as a dependency
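The four steps above amount to a first-match search over the dependency list. A minimal sketch, assuming a hypothetical Library type whose HashSet of exported names stands in for the bloom filter and hash table a real dynamic linker consults; it ignores symbol versions, weak symbols, and LD_PRELOAD.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a loaded shared object. A real dynamic linker
/// consults the object's .gnu.hash/.hash table (bloom filter first) instead
/// of a HashSet.
struct Library {
    name: &'static str,
    exports: HashSet<&'static str>,
}

/// Walks the dependencies in search order and returns the name of the
/// first library that defines `symbol`, or None if no library does.
fn resolve(search_order: &[Library], symbol: &str) -> Option<&'static str> {
    for lib in search_order {
        if lib.exports.contains(symbol) {
            return Some(lib.name); // first match wins; later libraries are never checked
        }
    }
    None // not defined by any library in the search order
}
```

If libfoo.so and libbar.so both export foo, resolve returns whichever appears first in the search order, so a static tool has to reproduce the loader's ordering to get the same answer.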

consequently, if you have two libraries, libfoo.so and libbar.so and they both "export" a function named foo, and you link against both, and call foo, it's unclear which one you will get (usually it's link order, etc., but behavior will just depend on the dynamic linker).

versym might be (ab)used to implement something like two-level namespaces in the elf format, but it's mostly for versioning the same symbol within a particular shared library.
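For symbols that do carry a version, the version index in .gnu.version (the data behind elf.versym) can be followed into the version-needed table (.gnu.version_r), where each entry names the file the version was required from. As far as I can tell, this is how Go's ImportedSymbols fills in its Library field, which therefore stays empty for unversioned imports. The sketch assumes hypothetical, already-decoded Verneed records rather than goblin's section types, and it only reports what the link editor recorded; the runtime lookup is still flat.

```rust
/// Hypothetical, already-decoded .gnu.version_r entry (Elf64_Verneed)
/// together with its Elf64_Vernaux records.
struct Verneed {
    file: &'static str,            // vn_file: library the versions are needed from
    aux: Vec<(u16, &'static str)>, // (vna_other, vna_name): version index and version name
}

/// Maps an undefined symbol's .gnu.version entry to (library, version).
/// Returns None for index 0 (local) and 1 (global, unversioned), or when
/// no Vernaux record carries the index.
fn library_for(versym: u16, verneed: &[Verneed]) -> Option<(&'static str, &'static str)> {
    let index = versym & 0x7fff; // bit 15 is the "hidden" flag, not part of the index
    if index < 2 {
        return None;
    }
    verneed.iter().find_map(|need| {
        need.aux
            .iter()
            .find(|entry| entry.0 == index)
            .map(|entry| (need.file, entry.1))
    })
}
```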

it's possible to write such a function, but one would have to effectively implement symbol searching like a dynamic linker does. there isn't anything stopping the user from doing that, but it's probably outside the scope of goblin imho.

m4b commented 1 year ago

You could perhaps use GnuHash (although I see its only constructor is unsafe), https://github.com/m4b/goblin/blob/d7e8e29646157a8c0f9b04337f729be06273b33b/src/elf/gnu_hash.rs#L199, to do fast symbol lookups in order to find which library exports a symbol that another library imports. There might be a better way to do this; I'm not sure, I would have to dig into it :)
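A sketch of how a .gnu.hash table answers "does this object define symbol X?": hash the name, reject early via the bloom filter, then walk the bucket's chain. It uses a hypothetical, already-decoded GnuHashTable for a 64-bit object (the build helper exists only to produce test data) and does not use goblin's GnuHash type; first_exporter, also hypothetical, applies the lookup to libraries in search order.

```rust
/// GNU symbol hash used by .gnu.hash: h = h * 33 + c, starting at 5381.
fn gnu_hash(name: &str) -> u32 {
    name.bytes()
        .fold(5381u32, |h, c| h.wrapping_mul(33).wrapping_add(c as u32))
}

/// Two bloom bits per hash, as in the 64-bit .gnu.hash layout.
fn bloom_mask(h: u32, shift: u32) -> u64 {
    (1u64 << (h % 64)) | (1u64 << ((h >> shift) % 64))
}

/// Hypothetical decoded .gnu.hash section; names[i] is .dynsym entry i.
struct GnuHashTable {
    symoffset: usize, // first .dynsym index covered by the hash table
    bloom_shift: u32,
    bloom: Vec<u64>,
    buckets: Vec<u32>,
    chain: Vec<u32>,
    names: Vec<String>,
}

impl GnuHashTable {
    /// Test-data helper (a linker builds the real table): index 0 is the null
    /// symbol, and hashed symbols are grouped by bucket so chains are contiguous.
    fn build(exports: &[&str], nbuckets: usize, bloom_size: usize, bloom_shift: u32) -> Self {
        let symoffset = 1;
        let mut sorted: Vec<&str> = exports.to_vec();
        sorted.sort_by_key(|n| gnu_hash(n) as usize % nbuckets);
        let mut t = GnuHashTable {
            symoffset,
            bloom_shift,
            bloom: vec![0; bloom_size],
            buckets: vec![0; nbuckets],
            chain: Vec::new(),
            names: vec![String::new()],
        };
        for (i, name) in sorted.iter().enumerate() {
            let h = gnu_hash(name);
            let b = h as usize % nbuckets;
            if t.buckets[b] == 0 {
                t.buckets[b] = (symoffset + i) as u32; // first symbol of this bucket
            }
            // Low bit set marks the last symbol in this bucket's chain.
            let last = sorted.get(i + 1).map_or(true, |n| gnu_hash(n) as usize % nbuckets != b);
            t.chain.push(if last { h | 1 } else { h & !1 });
            t.bloom[(h as usize / 64) % bloom_size] |= bloom_mask(h, bloom_shift);
            t.names.push(name.to_string());
        }
        t
    }

    /// Returns the .dynsym index of `name`, or None if this object does not define it.
    fn lookup(&self, name: &str) -> Option<usize> {
        let h = gnu_hash(name);
        // Bloom filter: a missing bit proves absence; set bits only suggest presence.
        let word = self.bloom[(h as usize / 64) % self.bloom.len()];
        let mask = bloom_mask(h, self.bloom_shift);
        if word & mask != mask {
            return None;
        }
        let mut idx = self.buckets[h as usize % self.buckets.len()] as usize;
        if idx < self.symoffset {
            return None; // empty bucket
        }
        loop {
            let h2 = self.chain[idx - self.symoffset];
            if (h | 1) == (h2 | 1) && self.names[idx] == name {
                return Some(idx);
            }
            if h2 & 1 != 0 {
                return None; // reached the end of this bucket's chain
            }
            idx += 1;
        }
    }
}

/// Attributes `symbol` to the first library, in search order, whose table defines it.
fn first_exporter<'a>(libs: &'a [(&'a str, GnuHashTable)], symbol: &str) -> Option<&'a str> {
    libs.iter().find(|(_, t)| t.lookup(symbol).is_some()).map(|(name, _)| *name)
}
```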