Open m4b opened 6 years ago
I haven't used goblin::Object
(or if I have I've forgotten where).
So far I've only used goblin with memmap input, and I think the existing Object::parse
would work for that.
The proposed parse_and_fill
makes sense to me. Why would it be a breaking change to add it? I tend to prefer ordering the return value as Result<Option<Object>>
though.
It could be a breaking change because i want to remove the Unknown variant since the peek ostensibly makes it redundant, or forces an unreachable clause (which is a sign of bad design to me)
What if peek
returned an Option<Hint>
?
is_lsb
little endian?let goblin_loader = goblin::load(&mut fd);
let endianness = goblin_loader.peek().endian(); // Option<goblin::Endian>
let architecture = goblin_loader.peek().architecture(); // Option<goblin::Architecture>
let elf = if goblin_loader.is_elf() {
goblin_loader.elf() // elf = Option<goblin::elf::Elf>
}
else {
panic!("Not an elf")
}
let elf = match goblin_loader.executable() {
match goblin::loader::Elf(elf) => elf, // goblin::elf::Elf
_ => panic!("Not an elf")
};
Not exactly, but you kind of get the idea.
So to be clear the pattern I don't like is this:
if let Hint::Unknown(magic) = peek {
println!("unknown magic: {:#x}", magic)
} else {
let bytes = { let mut v = Vec::new(); fd.read_to_end(&mut v)?; v };
let object = Object::parse(&bytes)?;
match object {
Object::Elf(elf) => {
println!("elf: {:#?}", &elf);
},
Object::PE(pe) => {
println!("pe: {:#?}", &pe);
},
Object::Mach(mach) => {
println!("mach: {:#?}", &mach);
},
Object::Archive(archive) => {
println!("archive: {:#?}", &archive);
},
Object::Unknown(magic) => { unreachable!() }
}
That last unreachable
in Object::Unknown
is, to use an unpalatable term, "has code smellies" to me ;)
I actually don't think there is way to fix this, short of adding dependent types to rust :thinking:
So there's a couple of problems:
fn parse<R: Read>(r: R) -> Object
(it actually used to be like this prior to move to zero-copy)To be precisely clear, I wanted to remove the Object::Unknown
variant, but I don't think this is possible if we have an api like fn parse(bytes: &[u8]) -> Object
since the bytes could have a bad magic.
Alternatively, we could return a Error::BadMagic
variant when the magic doesn't match our list of acceptable magicks. But this would then require the client to examine the error for the specific BadMagic variant, instead of the Unknown variant, which I don't like, as it seems to "overload" the error type in some sense.
Of course this could be solved by also returning a new Object variant without the unknown case, and only provided by a function like fn peek_and_parse
which returns the special variant iff the peek does not fail, and a None otherwise, e.g. something like:
pub enum DefinitelyABinary {
Mach(mach),
Elf(elf),
PE(pe),
}
pub fn peek_and_parse<R: Read>(fd: R, bytes: &mut Vec<u8>) -> Result<Option<DefinitelyABinary>> {
match peek(fd) {
Unknown(_) => None // we lose magic...
Peek::Elf(_) => fd.read_to_end(bytes); DefinitelyABinary(elf::Elf::parse(&bytes)
Peek::Mach(_) => // same but parse mach
Peek::PE(_) => // same but parse pe
}
}
but this isn't very satisfying to me either. Maybe it should be. I dunno?
I'm not sure there's a perfect solution here :/
EDIT
To preserve the magic information, we probably want a signature like:
type UnknownMagic = usize;
Result<Result<DefinitelyABinary, UnknownMagic>, goblin::Error>
But the double result will very likely just get ignored via ? Maybe that's ok?
let object = Object::peek_and_parse(fd, &mut bytes)??;
It looks we're angrily confused about the goblin library and this function call :rofl:
How about this?
let peek = goblin::peek(&mut fd)?;
match peek {
None => println!("unknown magic: {:#x}", magic),
Some(magic) => {
let bytes = { let mut v = Vec::new(); fd.read_to_end(&mut v)?; v };
let object = Object::parse(magic, &bytes)?;
match object {
Object::Elf(elf) => {
println!("elf: {:#?}", &elf);
},
Object::PE(pe) => {
println!("pe: {:#?}", &pe);
},
Object::Mach(mach) => {
println!("mach: {:#?}", &mach);
},
Object::Archive(archive) => {
println!("archive: {:#?}", &archive);
},
}
}
}
goblin::parse
would determine the filetype from the magic parameter.
What if you pass an invalid magic? It will have either error with bad magic (same problem I mentioned above) or return variant of Unknown, or still have signature like Result<Option<Object>>
; which at that point may as well push all the fd + peek logic into a single function
I'm suggesting peek return an Option
fn peek(fd: &mut File) -> Result<Option<Hint>>;
fn parse(hint: Hint, bytes: &[u8]) -> Result<Object>;
Right, but what would the internal logic of parse(magic: usize, bytes: &[u8])
look like?
It has to still check the magic is valid - either it will return Option, Result with an Unknown variant, or panic, no?
Oh, I see what you're saying, I thought the peek was a usize:
fn parse(hint: Hint, bytes: &[u8]) -> Result<Object> {
match hint {
Elf => // parse elf,
Mach => // parse mach,
PE => // parse pe,
Archive => // parse pe,
// exhausted
}
}
Yea I like that idea; it makes using goblin from raw bytes slightly more complicated though.
Using your proposal I think I like this api:
fn peek(fd: &mut File) -> Result<Option<Hint>>;
fn parse_with(hint: Hint, bytes: &[u8]) -> Result<Object>;
fn parse(fd: &mut File, bytes: &mut [u8]) -> Result<Option<Object>>;
For clients who already have the bytes but no fd
it sort of makes goblin a pita to use though. I'm not sure how common this would be though. probably could be very common.
One thought in reply to this:
if we peek, the if condition guarantees us that the Unknown variant cannot occur, hence its unreachable. But this is runtime information (hence the semi flippant dependent types remark)
There is no such guarantee. Files can change between reads, and mmap
'd files can even change mid-read. Or, plain old user error: the user can pass one buffer into peek()
and another into parse()
.
Even if you peek()
successfully, parse()
has to handle the invalid magic case.
@willglynn yea that's a good point.
However @sunfishcode 's option proposal I think somewhat sidesteps this issue.
It forces the user to pass a peek. They either got the peek from somewhere or fabricated it.
In either case, the parse will error; it'll interpret the bad bytes as an elf binary and will still return a scroll BadMagic error when attempting to parse the elf header.
The problem of the file changing, mmap bytes getting messed up is still a problem for current lib - it will just return BadMagic for elf if the header doesn't match and you try to parse it.
The advantage is that making peek an option is that it removes the unreachable case, or at least, moves it into the separate binary parsing backend errors.
@m4b Sure. I just wanted to tamp down expectations. A peek()
that finds a valid magic makes it highly likely that parse()
will see the same thing, it just doesn't guarantee anything. parse()
's body and its API needs to accommodate that.
@willglynn yea i probably overstated things when I said it's guaranteed :laughing:
As a bonus I think I can also have my angry double ??
version using the option peek as well:
// or perhaps vexing_parse for our c++ brethren
pub fn angry_parse(bytes: &[u8]) -> Result<Result<Object, goblin::Error>, UnknownMagic> {
match Hint::peek(bytes) {
Some(hint) => // parse accordingly
None => Err(UnknownMagic::from(bytes))
}
}
I think my last suggestion is actually a good idea, or dan's; I think pre-release 1.0 one of them should be chosen.
This deeply annoys me:
I think there's an architectural problem and an ergonomics problem here.
Unknown
variant - we already peeked to make sure the magic is good! - everything else is just a parse error for that respective file formatSo, what I want is my cake and eat it too:
I want the peek to ensure the magic is correct, and route to the correct binary parser, and return this result, without the
Unknown
variant, without temporarily allocations (or the full fd read is passed through).@philipc @endeav0r you seem to be using
goblin::Object
as clients, does this bother you?Anyone who happens to be watching/reading this, I'm open to proposals how to fix this make it nicer.
Afaics, being flexible w.r.t. the bytes + reading is going to be tricky; first thing that comes to my mind is some kind of closure style or an inout, like:
This would be a breaking change, but I think its important to get right sooner rather than later