lunixbochs / patchkit

binary patching from Python

evaluate LIEF #15

Open lunixbochs opened 7 years ago

lunixbochs commented 7 years ago

https://github.com/lief-project/LIEF

initially: I don't think it's powerful enough, adds a binary dependency, and adds C code to run on untrusted binaries.

romainthomas commented 7 years ago

If you target ELF format, I think you will be interested in:

Moreover, these are the tests:

lunixbochs commented 7 years ago

Looking at insert_content, I don't like that it relies on sections.

I may look into using it for initial PE/MachO support, but I think my ELF emitter is more powerful at this point.

romainthomas commented 7 years ago

I agree. I plan to remove this dependency.

romainthomas commented 7 years ago

> I may look into using it for initial PE/MachO support, but I think my ELF emitter is more powerful at this point.

Even for libraries?

lunixbochs commented 7 years ago

My capability test right now is to link a dynamic binary into an existing executable (which exercises all features required for runtime linking). Patchkit can almost do it, but it doesn't look like LIEF is close yet. I do still need to add GOT/PLT support, so LIEF is ahead there, but that shouldn't be a huge task.

For example, parsing and re-emitting the PT_DYNAMIC segment:

https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L953
https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L1237
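As a rough illustration of what parsing and re-emitting PT_DYNAMIC involves, here is a minimal sketch. It is not taken from patchkit or LIEF: it assumes a little-endian ELF64 file, where each dynamic entry is a signed 64-bit tag followed by an unsigned 64-bit value and the array ends at DT_NULL. The names `parse_dynamic` and `emit_dynamic` are hypothetical.

```python
import struct

DT_NULL = 0    # terminator tag
DT_NEEDED = 1  # string-table offset of a required library name

# Elf64_Dyn on a little-endian target: d_tag (signed), d_val/d_ptr (unsigned).
DYN_ENTRY = struct.Struct('<qQ')


def parse_dynamic(data):
    """Decode raw PT_DYNAMIC bytes into (tag, value) pairs, stopping at DT_NULL."""
    entries = []
    for off in range(0, len(data) - DYN_ENTRY.size + 1, DYN_ENTRY.size):
        tag, value = DYN_ENTRY.unpack_from(data, off)
        if tag == DT_NULL:
            break
        entries.append((tag, value))
    return entries


def emit_dynamic(entries):
    """Re-encode (tag, value) pairs and append the DT_NULL terminator."""
    out = bytearray()
    for tag, value in entries:
        out += DYN_ENTRY.pack(tag, value)
    out += DYN_ENTRY.pack(DT_NULL, 0)
    return bytes(out)
```

Because the entries come back as an ordinary list, adding a dependency is an append of a `(DT_NEEDED, offset)` pair before re-emitting. A real emitter also has to place the new string in the dynamic string table and make room for the larger segment.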

romainthomas commented 7 years ago

Thanks for the examples, I will look at them.

lunixbochs commented 7 years ago

Otherwise, my general ELF requirements (before trying to muck with dynamic linking) are as follows (with checkboxes by where I think LIEF is equivalent):

And as of my dynamic branch:

I'm trying to do these with APIs that are as high-level and Pythonic as possible, so for example the address space is exposed as Elf.read(addr, size) and Elf.write(addr, data).
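To make the address-space idea concrete, here is a minimal sketch of how a read/write-by-virtual-address API can be layered on loadable segments. It is an illustration only, not patchkit's implementation: `Segment` and `Image` are hypothetical stand-ins, and the sketch only handles ranges that are fully file-backed by a single segment.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a PT_LOAD program header.
Segment = namedtuple('Segment', 'vaddr offset filesz')


class Image:
    """Maps virtual addresses onto file offsets through a list of segments."""

    def __init__(self, raw, segments):
        self.raw = bytearray(raw)
        self.segments = segments

    def _file_offset(self, addr, size):
        # Find the segment whose file-backed range contains [addr, addr + size).
        for seg in self.segments:
            if seg.vaddr <= addr and addr + size <= seg.vaddr + seg.filesz:
                return seg.offset + (addr - seg.vaddr)
        raise ValueError('range 0x%x+%d is not file-backed' % (addr, size))

    def read(self, addr, size):
        off = self._file_offset(addr, size)
        return bytearray(self.raw[off:off + size])

    def write(self, addr, data):
        off = self._file_offset(addr, len(data))
        self.raw[off:off + len(data)] = data
```

With this shape, `img.read(0x400006, 5)` and `img.write(0x400000, b'HELLO')` work without the caller ever computing a file offset, which is the point of the high-level API described above.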

Sections/segments are exposed and iterable. Some examples of patchkit/elffile ELF functionality:

```python
e = elffile.open('filename')
e.progs.append(e.PH())
e.progs.pop(0)

for ph in e.progs:
    if ph.type == 'PT_LOAD': # (this is an enum type with __eq__ overridden to support strings)
        ph.flags |= elffile.PF.X
    # this will adjust memsz, filesz
    ph.data += b'0' * 1000

# range works on both Python 2 and Python 3 (the original used xrange)
for i in range(1000):
    # no limits on program header table
    e.progs.append(e.progs[0])

e.save('filename')
```
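The string comparison in `ph.type == 'PT_LOAD'` can be sketched as an integer constant that also compares equal to its symbolic name. This is an assumption about the general technique, not the actual elffile code; `NamedConst` is a hypothetical name.

```python
class NamedConst(int):
    """An int constant that also compares equal to its symbolic name."""

    def __new__(cls, value, name):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __eq__(self, other):
        if isinstance(other, str):
            return self.name == other
        return int.__eq__(self, other)

    def __ne__(self, other):
        # int defines its own __ne__, so it must be overridden as well.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ clears the inherited hash; restore it.
    __hash__ = int.__hash__

    def __repr__(self):
        return self.name


PT_LOAD = NamedConst(1, 'PT_LOAD')
```

Because the constant is still an `int`, it packs into a struct field unchanged, while comparisons such as `PT_LOAD == 'PT_LOAD'` and `PT_LOAD == 1` both hold.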

romainthomas commented 7 years ago

Actually

But I will take a deep look at your project :)

lunixbochs commented 7 years ago

> change segment protection arbitrarily is possible

I amended this a bit.

> Parse symbols without relying on sections. yes except for the number of symbols. (WIP)

You can derive from the elffile code for this and relicense to the Apache license if you like:

DYN parsing: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L987-L998
DT_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L577-L581
DT_GNU_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L611-L629
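For readers unfamiliar with the counting trick, here is a minimal sketch of deriving the dynamic symbol count from the hash tables instead of from section headers. It is written from the table layouts rather than copied from the linked code, and it assumes little-endian data; `word_size` is 8 for ELF64 and 4 for ELF32. Both function names are hypothetical.

```python
import struct


def count_from_hash(table):
    """DT_HASH starts with nbucket, nchain; nchain equals the symbol count."""
    _nbucket, nchain = struct.unpack_from('<II', table, 0)
    return nchain


def count_from_gnu_hash(table, word_size=8):
    """DT_GNU_HASH stores no count: walk the chain of the highest bucket."""
    nbuckets, symoffset, bloom_size, _shift = struct.unpack_from('<IIII', table, 0)
    buckets_off = 16 + bloom_size * word_size
    buckets = struct.unpack_from('<%dI' % nbuckets, table, buckets_off)
    chains_off = buckets_off + 4 * nbuckets

    last = max(buckets) if buckets else 0
    if last < symoffset:
        # No hashed symbols: only the unhashed ones before symoffset exist.
        return symoffset
    while True:
        # Chain entries start at symbol index symoffset; bit 0 marks end of chain.
        (entry,) = struct.unpack_from('<I', table, chains_off + 4 * (last - symoffset))
        if entry & 1:
            return last + 1
        last += 1
```

The GNU variant works because hashed symbols are sorted by bucket, so the chain that starts at the largest bucket value ends at the last symbol in the table.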

romainthomas commented 7 years ago

> DYN parsing: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L987-L998
> DT_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L577-L581
> DT_GNU_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L611-L629

Yep: https://github.com/lief-project/LIEF/blob/master/src/ELF/Parser.tcc#L593

lunixbochs commented 7 years ago

My next step for dynamic is parsing/emitting VERNEED stuff, but I've been procrastinating reading the GNU source for that. These formats need to all be a little better documented.

romainthomas commented 7 years ago

I confirm it's not trivial.

lunixbochs commented 7 years ago

Also, I understand it's a tradeoff between having unified APIs and having good language-specific APIs, but I wish the LIEF Python API was a little more Pythonic.

Some examples:

LIEF:

```python
# why isn't arg0 named path or filename?
lief.ELF.parse(arg0: str)
# open raw data (why a list of ints? where is name used?)
lief.ELF.parse_from_raw(raw: List[int], name: str='')

# very long function name
# what's arg0, arg1? I assume addr, size, but the docs aren't clear on that
# is there a particular reason to return List[int] instead of using the Python bytearray type?
ELF.Binary.get_content_from_virtual_address(self: lief.Binary, arg0: int, arg1: int) → List[int]
```

patchkit/elffile equivalents:

```python
# admittedly, I should rename name to path and block to raw
# (I inherited this code from the elffile project)
elffile.open(name=str)
elffile.open(obj=fileobj)
elffile.open(block=raw)

elf.read(addr, size) -> bytearray()
```

romainthomas commented 7 years ago

Actually, arg0: str comes from the default Pybind11 documentation, but you are right, I have to give it an explicit name. The same goes for list vs bytearray and for the long method names. Thanks for the suggestion.

PS: As I have to deal with 3 formats and 3 APIs, it's a long task (documentation vs features).