evaluate LIEF

lunixbochs commented 7 years ago

https://github.com/lief-project/LIEF

Initially: I don't think it's powerful enough, and it adds a binary dependency and C code that runs on untrusted binaries.

romainthomas commented 7 years ago

If you target the ELF format, I think you will be interested in:

add_section
add_segment
insert_content // for injection in libraries

Moreover, these are the tests:

lunixbochs commented 7 years ago

Looking at insert_content, I don't like that it relies on sections.

I may look into using it for initial PE/MachO support, but I think my ELF emitter is more powerful at this point.

romainthomas commented 7 years ago

I agree. I plan to remove this dependency.

romainthomas commented 7 years ago

> I may look into using it for initial PE/MachO support, but I think my ELF emitter is more powerful at this point.

Even for libraries?

lunixbochs commented 7 years ago

My capability test right now is to link a dynamic binary into an existing executable (which exercises all features required for runtime linking). Patchkit can almost do it, but it doesn't look like LIEF is close yet. I do still need to add GOT/PLT support, so LIEF is ahead there, but that shouldn't be a huge task.

For example, parsing and re-emitting the PT_DYNAMIC segment:

https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L953
https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L1237
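To make "parsing and re-emitting" concrete, here is a minimal sketch of a PT_DYNAMIC round trip. It is not the elffile code linked above. It assumes a little-endian ELF64 file, where each entry is an Elf64_Dyn pair (d_tag, d_val) and the array ends at DT_NULL. The function names parse_dynamic and emit_dynamic are made up for illustration.

```python
import struct

DT_NULL, DT_NEEDED, DT_RPATH = 0, 1, 15

# Elf64_Dyn: signed 64-bit d_tag followed by unsigned 64-bit d_val/d_ptr.
# Assumption: little-endian ELF64; ELF32 would use '<iI' instead.
DYN64 = struct.Struct('<qQ')


def parse_dynamic(data):
    """Return a list of (tag, value) pairs, stopping after DT_NULL."""
    entries = []
    for off in range(0, len(data) - DYN64.size + 1, DYN64.size):
        tag, val = DYN64.unpack_from(data, off)
        entries.append((tag, val))
        if tag == DT_NULL:
            break
    return entries


def emit_dynamic(entries):
    """Serialize (tag, value) pairs, appending DT_NULL if it is missing."""
    if not entries or entries[-1][0] != DT_NULL:
        entries = list(entries) + [(DT_NULL, 0)]
    return b''.join(DYN64.pack(tag, val) for tag, val in entries)


# Round trip: emit two entries, parse them back, emit again.
raw = emit_dynamic([(DT_NEEDED, 0x10), (DT_RPATH, 0x20)])
entries = parse_dynamic(raw)
```

Because the entries are plain (tag, value) pairs, editing something like RPATH is a list operation followed by a re-emit. The string values themselves live in the dynamic string table, which this sketch does not touch.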

romainthomas commented 7 years ago

Thanks for the examples, I will look at them.

lunixbochs commented 7 years ago

Otherwise, my general ELF requirements (before trying to muck with dynamic linking) are as follows (with checkboxes by where I think LIEF is equivalent):

[x] read/write virtual address space
[x] inject a new segment
[ ] create a new ELF file from scratch
[ ] extend/shrink/arbitrarily edit a segment
[ ] change any segment attributes arbitrarily (in patchkit/elffile, the serializable object is exposed and anything can be edited).
[ ] delete/rearrange sections/segments
[ ] parse/modify binaries without relying on sections
[ ] don't crash or throw exceptions in any case where the OS loader/linker would work, and even for some malformed binaries where it might fail

And as of my dynamic branch:

[ ] Parse symbols without relying on sections.
[ ] Expose all PT_DYNAMIC data as modifiable attributes (like INIT, RPATH, symtab, etc)

I'm trying to do these with APIs that are as high-level and Pythonic as possible, so for example address space access is Elf.read(addr, size), Elf.write(addr, data).
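A rough sketch of how a read(addr, size) / write(addr, data) pair can sit on top of loadable segments, translating a virtual address into an offset inside the segment that maps it. This is not the patchkit implementation. The Segment and AddressSpace classes are hypothetical stand-ins, and the sketch assumes a range never spans two segments.

```python
class Segment:
    """Hypothetical stand-in for a PT_LOAD program header plus its bytes."""

    def __init__(self, vaddr, data):
        self.vaddr = vaddr
        self.data = bytearray(data)


class AddressSpace:
    """Hypothetical wrapper exposing read/write by virtual address."""

    def __init__(self, segments):
        self.segments = segments

    def _find(self, addr, size):
        # Locate the single segment that fully contains [addr, addr + size).
        for seg in self.segments:
            off = addr - seg.vaddr
            if off >= 0 and off + size <= len(seg.data):
                return seg, off
        raise ValueError('range 0x%x+%d is not mapped by a single segment'
                         % (addr, size))

    def read(self, addr, size):
        seg, off = self._find(addr, size)
        return bytearray(seg.data[off:off + size])

    def write(self, addr, data):
        seg, off = self._find(addr, len(data))
        seg.data[off:off + len(data)] = data


space = AddressSpace([
    Segment(0x400000, b'\x7fELF' + b'\0' * 12),
    Segment(0x600000, b'data'),
])
magic = space.read(0x400000, 4)
space.write(0x600001, b'AT')
patched = space.read(0x600000, 4)
```

Nothing here depends on section headers, which is the point of the requirement: only the segment table is consulted.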

Sections/segments are exposed and iterable. Some examples of patchkit/elffile ELF functionality:

e = elffile.open('filename')
e.progs.append(e.PH())
e.progs.pop(0)

for ph in e.progs:
    if ph.type == 'PT_LOAD': # (this is an enum type with __eq__ overridden to support strings)
        ph.flags |= elffile.PF.X
    # this will adjust memsz, filesz
    ph.data += b'0' * 1000

for i in range(1000):
    # no limits on program header table
    e.progs.append(e.progs[0])

e.save('filename')
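The ph.type == 'PT_LOAD' comparison above relies on an enum type whose __eq__ accepts strings. A minimal sketch of that idea using the standard enum module follows. The class name PT is an assumption for illustration, and elffile's own enum type is implemented differently.

```python
import enum


class PT(enum.Enum):
    """Hypothetical program header type enum that also compares to strings."""
    PT_NULL = 0
    PT_LOAD = 1
    PT_DYNAMIC = 2

    def __eq__(self, other):
        if isinstance(other, PT):
            return self is other
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, int):
            return self.value == other
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ drops the inherited __hash__, so restore one.
        # Note: a member equals its name string but does not hash like it,
        # so do not mix members and strings as keys in the same dict.
        return hash(self.value)


is_load = PT.PT_LOAD == 'PT_LOAD'
is_null = PT.PT_LOAD == 'PT_NULL'
```

The tradeoff is convenience at call sites versus a looser equality contract. That is acceptable for comparisons like the loop above, but worth keeping in mind for sets and dict keys.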

romainthomas commented 7 years ago

Actually:

create a new ELF file from scratch isn't (yet) possible with LIEF
change segment protection arbitrarily is possible
parse binaries without relying on sections: yes; modify: no
Parse symbols without relying on sections. yes except for the number of symbols. (WIP)

But I will take a deep look at your project :)

lunixbochs commented 7 years ago

> change segment protection arbitrarily is possible

I amended this a bit.

> Parse symbols without relying on sections. yes except for the number of symbols. (WIP)

You can derive from the elffile code for this and relicense to the Apache license if you like:

DYN parsing: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L987-L998
DT_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L577-L581
DT_GNU_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L611-L629
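For readers who don't want to follow the links, this is a sketch of the usual way to count dynamic symbols from the hash tables alone, without section headers. It is written from the table layouts rather than copied from elffile. It assumes little-endian data and 32-bit hash words, which does not hold on every architecture, and the function names are made up.

```python
import struct


def count_dt_hash(table):
    """DT_HASH: header is (nbucket, nchain); nchain equals the symbol count."""
    nbucket, nchain = struct.unpack_from('<II', table, 0)
    return nchain


def count_dt_gnu_hash(table, elf64=True):
    """DT_GNU_HASH: take the highest bucket start, then walk that chain."""
    nbuckets, symoffset, bloom_size, bloom_shift = struct.unpack_from(
        '<IIII', table, 0)
    # Bloom filter words are 8 bytes on ELF64 and 4 bytes on ELF32.
    buckets_off = 16 + bloom_size * (8 if elf64 else 4)
    buckets = struct.unpack_from('<%dI' % nbuckets, table, buckets_off)
    chains_off = buckets_off + 4 * nbuckets

    last = max(buckets) if buckets else 0
    if last < symoffset:
        # Every bucket is empty: only the unhashed symbols exist.
        return symoffset
    while True:
        (entry,) = struct.unpack_from(
            '<I', table, chains_off + 4 * (last - symoffset))
        if entry & 1:  # low bit marks the last symbol of a chain
            break
        last += 1
    return last + 1


# Synthetic table: 5 symbols, symbol 0 unhashed, two chains of two symbols.
gnu_table = (
    struct.pack('<IIII', 2, 1, 1, 6)                # header
    + struct.pack('<Q', 0)                          # one bloom word
    + struct.pack('<II', 1, 3)                      # bucket start indices
    + struct.pack('<IIII', 0x10, 0x21, 0x30, 0x41)  # chain hash values
)
gnu_count = count_dt_gnu_hash(gnu_table)
sysv_count = count_dt_hash(struct.pack('<II', 3, 5))
```

The GNU variant has no explicit count field, which is why it needs the chain walk. A malformed table with no terminating bit would run past the buffer and raise struct.error, so real code should bound the loop.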

romainthomas commented 7 years ago

> DYN parsing: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L987-L998
> DT_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L577-L581
> DT_GNU_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L611-L629

Yep: https://github.com/lief-project/LIEF/blob/master/src/ELF/Parser.tcc#L593

lunixbochs commented 7 years ago

My next step for dynamic is parsing/emitting VERNEED stuff, but I've been procrastinating reading the GNU source for that. These formats need to all be a little better documented.

romainthomas commented 7 years ago

I confirm it's not trivial.

lunixbochs commented 7 years ago

Also, I understand it's a tradeoff between having unified APIs and having good language-specific APIs, but I wish the LIEF Python API was a little more Pythonic.

Some examples:

LIEF:

# why isn't arg0 named path or filename?
lief.ELF.parse(arg0: str)
# open raw data (why a list of ints? where is name used?)
lief.ELF.parse_from_raw(raw: List[int], name: str='')

# very long function name
# what's arg0, arg1? I assume addr, size, but the docs aren't clear on that
# is there a particular reason to return List[int] instead of using the Python bytearray type?
ELF.Binary.get_content_from_virtual_address(self: lief.Binary, arg0: int, arg1: int) → List[int]

patchkit/elffile equivalents:

# admittedly, I should rename name to path and block to raw
# (I inherited this code from the elffile project)
elffile.open(name=str)
elffile.open(obj=fileobj)
elffile.open(block=raw)

elf.read(addr, size) -> bytearray()

romainthomas commented 7 years ago

Actually, arg0: str comes from the default pybind11 documentation, but you are right, I have to give an explicit name. Same for list vs bytearray and for the long method names. Thanks for the suggestion.

PS: As I have to deal with 3 formats and 3 APIs, it's a long task. (documentation vs features)

lunixbochs / patchkit

evaluate LIEF #15