m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License
1.2k stars 160 forks source link

Add Arm extensions #242

Open willglynn opened 4 years ago

willglynn commented 4 years ago

There exist Arm extensions for ELF. This PR aims to add support to goblin.

goblin::elf::arm defines some contents and some extension traits which add Arm-specific functionality to existing goblin types. In particular, elf::Header::arm() returns an Option<ArmElfHeader> which (for Arm executables) can distinguish hard-float from soft-float, and distinguish Arm entrypoints from Thumb entrypoints. elf::Section::arm_special_section() returns an Option<ArmSpecialSection> which among other things can identify an ArmSpecialSection::BuildAttributes section.

goblin::elf::arm::build_attributes is the largest single part of this PR. It adds a bunch of types for parsing through the Arm build attributes hierarchy, mostly in (data type, fallible iterator, iterator error) sets. It also defines a particularly gory macro for use by build_attributes submodules. You see, Arm build attributes are organized into vendor-specific sections, and each vendor is free to define attributes as they see fit. Modeling this in Rust in a way that preserves goblin API stability is challenging.

The most interesting build attribute vendor is the Arm embedded ABI (aeabi) vendor, captured here in goblin::elf::arm::build_attributes::aeabi. Every Arm tool produces and consumes aeabi build attributes. The original spec describes attributes like:

Tag_WMMX_arch, (=11), uleb128
   0  The user did not permit this entity to use WMMX
   1  The user permitted this entity to use WMMX v1
   2  The user permitted this entity to use WMMX v2

That macro lets me write this as:

arm_build_attribute_vendor!(
/// A set of Arm embedded ABI properties.
Aeabi {
//…
    /// Are WMMX instructions permitted?
    [wmmx_arch, (=11), WmmxArch {
        /// The user did not permit this entity to use WMMX
        0 => NotPermitted,
        /// The user permitted this entity to use WMMX v1
        1 => V1,
        /// The user permitted this entity to use WMMX v2
        2 => V2,
    }]
//…

The macro expands to:

#[derive(Debug, Copy, Clone, Eq, PartialEq, Default)]
#[non_exhaustive]
pub struct Aeabi<'a> {
    pub cpu_raw_name: Option<&'a [u8]>,
    pub cpu_name: Option<&'a [u8]>,
    pub cpu_arch: Option<CpuArch>,
    pub cpu_arch_profile: Option<CpuArchProfile>,
    pub arm_isa_use: Option<ArmIsaUse>,
//…
    pub wmmx_arch: Option<WmmxArch>,
//…
}

//…

#[derive(Debug, Copy, Clone, Eq, PartialEq)]
#[non_exhaustive]
pub enum WmmxArch {
    NotPermitted,
    V1,
    V2,
    Unknown(UnknownValue),
}

//…

…along with all the boring, duplicative mapping and conversion code.

This all needs considerably better test coverage before it's ready to merge. I'm opening this as a draft PR for feedback and discussion.

willglynn commented 4 years ago

Checking binutils readelf, it appears that various other machine types use an identical build attributes scheme. I'll hoist goblin::elf::arm::build_attributes to goblin::elf::build_attributes.

I don't like the duplication and control flow inversion between ParameterType::{Ntbs,Uleb128} and Attribute::{Ntbs,Uleb128}. I'm pretty sure there's a trait to extract here which provides the attribute control over its parsing and its runtime representation, such that AttributesIter would call T::parse(tag_number) to produce a T and then package it into an Attribute<T> or something. Besides being generally better, this refactor would particularly benefit Arm's Tag_compatibility and Tag_also_compatible_with, which are both a Uleb128 followed by an Ntbs.

I'm also toying with the idea that the NUL-terminated byte string attributes should all be separate types. The argument is that while cpu_name might truly be a &[u8], it is a specific kind of &[u8]. Maybe it's better modeled as an arm::CpuName<'a> which is AsRef<[u8]> and From<&'a [u8]>/Into<&'a [u8]> and so on. This has benefits with localizing associated constants (e.g. CpuName could hold the list of CPU names) and other field-specific tricks. This might also tie into the trait I sense I'm missing: &'a [u8] is just a bag of bytes, whereas arm::CpuName<'a> could know that its own tag number is 5. If every field knew its own tag number, then perhaps Attribute wouldn't need to exist at all.

:thinking: Maybe: Remove Attribute and AttributesIter. Make all the attributes separate types which can parse themselves out of a &[u8]. Make Aeabi::parse(attrs: Attributes) own the loop which reads a tag number, dispatches to the appropriate attribute's parse function, and assigns the field. No weird separation/inversion of attribute parsing concerns. Hmm…

willglynn commented 4 years ago

That feels better. pub struct Attributes<'a> represents a bag of bytes containing zero or more unparsed attributes, and pub(crate) struct SemiParsedAttribute<'a> represents a parsed tag number and a bag of bytes containing both the attribute's data and subsequent unparsed attributes. Both types have fallible pub(crate) methods which consume the value and return its counterpart. This keeps the knowledge of build_attributes layout strictly in build_attributes while allowing the vendor section to drive the parsing.

The consuming code in the vendor section looks like:

pub struct Vendor {
  field: Option<Field>,
}

impl Vendor {
  fn parse(&mut self, mut attrs: Attributes) -> Result<(), AttributeParseError> {
    while let Some(attr) = attrs.next()? {
      match attr.tag_number() {
        1 => {
          let (value, rest) = Field::parse(attr)?;
          self.field = Some(value);
          attrs = rest;
        },
        _ => return Err(attr.unrecognized()),
      }
    }
    Ok(())
  }
}

pub enum Field {
  /* … */
}

impl Field {
  fn parse<'a>(attr: SemiParsedAttribute<'a>) -> Result<(Self, Attributes<'a>), AttributeParseError> {
    let (value, rest) = attr.parse_uleb128()?;
    Ok((value.into(), rest)
  }
}

…which the build_attributes! macro now generates.

m4b commented 4 years ago

Let me know when you’d like to me to review :)

m4b commented 3 years ago

@willglynn so this is marked a WIP, is this ready for review, or what should happen here? Interested in getting this merged/in :) Let me know whenever you have the time, thanks!

willglynn commented 3 years ago

Well, I'm not happy with the test coverage but I got sidetracked before improving them and haven't returned to it yet. It seems to work though.