m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License
1.19k stars 158 forks source link

Is there any reason why no enums are provided for the Header fields? #276

Open CohenArthur opened 3 years ago

CohenArthur commented 3 years ago

I am trying to use goblin in a project to check the e_type of a binary file. I noticed that there are no type methods on the Header struct, but that the field was publicly accessible as an u16. I haven't been able to find any "abstractions" over this peculiar field, but noticed that some were available such as the container and endianness methods with associated enums. Is there any reason as to why more enums have not been introduced?

Something like a Type which could be #[repr(u8)] and contain the Dyn, Exec, None, etc etc. I know that the constants are available, but it would feel more "rusty" to match against an enum variant created from the underlying e_type field rather than matching against raw constants.

philipc commented 3 years ago

I didn't write the code, but here's my thoughts.

container and endianness are different because they don't directly correspond to header fields (they aren't #[repr()]), and they are concepts that aren't specific to ELF. They can be exhaustive enums because support for more variants is unlikely and necessarily requires code changes anyway.

As I see it, the advantage of using enums is that rust can use types to check that you are using a valid constant. However, using an enum is awkward because most of the enums wouldn't be exhaustive, so you need a variant such as Other(u8), which means

An alternative way to get type checking is to define the constants as newtypes instead. This would be a large breaking change though.

CohenArthur commented 3 years ago

They can be exhaustive enums because support for more variants is unlikely and necessarily requires code changes anyway. Isn't the same true for e_type? I don't think that many new variants will be added|


e_type This member of the structure identifies the object file type:
          ET_NONE         An unknown type.
          ET_REL          A relocatable file.
          ET_EXEC         An executable file.
          ET_DYN          A shared object.
          ET_CORE         A core file.

> As I see it, the advantage of using enums is that rust can use types to check that you are using a valid constant. However, using an enum is awkward because most of the enums wouldn't be exhaustive, so you need a variant such as `Other(u8)`, which means

That's true, but I am simply talking about having an abstraction on top of the header, not using it in the header structure. I also meant simply about those exhaustive enums such as, correct me if I'm wrong, `e_type`. I was simply thinking about something like.

```rust
enum Type {
    None,
    Rel,
    Exec,
    Dyn,
    Core,
}

and having an associated method:

pub fn type(&self) -> Result<Type> {
    ...
}

Just to be clear: I'd love to implement it and create a pull request, but am wondering if there was any necessary reasons for not including it already.

Also, I am faaaaar from an expert on ELF hahaha so feel free to correct me! Thanks a lot for taking the time to answer

philipc commented 3 years ago

Isn't the same true for e_type? I don't think that many new variants will be added

There's also:

/// OS-specific range start.
pub const ET_LOOS: u16 = 0xfe00;
/// OS-specific range end.
pub const ET_HIOS: u16 = 0xfeff;
/// Processor-specific range start.
pub const ET_LOPROC: u16 = 0xff00;
/// Processor-specific range end.
pub const ET_HIPROC: u16 = 0xffff;

I don't know if anything actually uses these. So maybe it is feasible for e_type. I'm not sure that there are many more fields like this though. For example, e_machine will get more variants, and e_flags depends on the machine. Segment types, section types and relocation types will all get new variants.

The xmas-elf crate defines both enums and newtypes, and includes the extra variant for the enums, e.g. Type. (I did use xmas-elf briefly and wasn't fond of the enums, but that's subjective. It was partly because they use 'rusty' variant names instead of the names from the constants, which makes it hard to correlate them. All of the literature uses the names of the constants.)

That's true, but I am simply talking about having an abstraction on top of the header, not using it in the header structure.

I see, yes I did misunderstand. What's your reason for suggesting #[repr(u8)]?

Just to be clear: I'd love to implement it and create a pull request, but am wondering if there was any necessary reasons for not including it already.

Probably need to wait for a reply from @m4b for that.

CohenArthur commented 3 years ago

I see, yes I did misunderstand. What's your reason for suggesting #[repr(u8)]?

Ah right, sorry, that wasn't very clear! I thought about using repr(u8) (or rather, u16) in order to not duplicate code:

#[repr(u16)]
enum Type {
    None = ET_NONE,
    Rel = ET_REL,
    Exec = ET_EXEC, // etc
    Dyn,
    Core,
}

// Or the other way around
#[repr(u16)]
enum Type {
    None = 0,
    Rel = 1,
    Exec = 2, // etc
    Dyn,
    Core,
}

pub const ET_NONE: u16 = Type::None as u16; // etc

But I had forgotten about OS specific types, so it does not apply anymore.

CohenArthur commented 3 years ago

In the end the enum would be exactly as you mention:

enum Type {
    None,
    Rel,
    Exec,
    Dyn,
    Core,
    OsSpecific(u16),
}

(And this cannot be made repr(u16), obviously).