Open lunixbochs opened 7 years ago
If you target ELF format, I think you will be interested in:
Moreover, these are the tests:
Looking at insert_content, I don't like that it relies on sections.
I may look into using it for initial PE/MachO support, but I think my ELF emitter is more powerful at this point.
I agree; I plan to remove this dependency.
I may look into using it for initial PE/MachO support, but I think my ELF emitter is more powerful at this point.
Even for libraries?
My capability test right now is to link a dynamic binary into an existing executable (which exercises all features required for runtime linking). Patchkit can almost do it, but it doesn't look like LIEF is close yet. I do still need to add GOT/PLT support, so LIEF is ahead there, but that shouldn't be a huge task.
For example, parsing and re-emitting the PT_DYNAMIC segment:
https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L953
https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L1237
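As a rough illustration of what parsing and re-emitting PT_DYNAMIC involves, here is a minimal sketch. It is not the patchkit code linked above. It assumes a little-endian ELF64 file, where each dynamic entry is a signed 64-bit tag followed by an unsigned 64-bit value, and the function names parse_dynamic and emit_dynamic are made up for this example.

```python
import struct

# Assumption: ELF64, little-endian. Elf64_Dyn is d_tag (signed) + d_val/d_ptr (unsigned).
DYN_FMT = "<qQ"
DYN_SIZE = struct.calcsize(DYN_FMT)  # 16 bytes per entry
DT_NULL = 0


def parse_dynamic(data):
    """Return (tag, value) pairs up to and including the DT_NULL terminator."""
    entries = []
    for off in range(0, len(data) - DYN_SIZE + 1, DYN_SIZE):
        tag, val = struct.unpack_from(DYN_FMT, data, off)
        entries.append((tag, val))
        if tag == DT_NULL:
            break  # anything after the terminator is padding
    return entries


def emit_dynamic(entries):
    """Serialize (tag, value) pairs, appending DT_NULL if it is missing."""
    if not entries or entries[-1][0] != DT_NULL:
        entries = list(entries) + [(DT_NULL, 0)]
    return b"".join(struct.pack(DYN_FMT, tag, val) for tag, val in entries)
```

A round trip such as emit_dynamic(parse_dynamic(raw)) drops any padding after the terminator, so a real emitter would also have to update the segment's filesz and memsz.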
Thanks for the examples, I will look at them.
Otherwise, my general ELF requirements (before trying to muck with dynamic linking) are as follows (with checkboxes by where I think LIEF is equivalent):
And as of my dynamic branch:
I'm trying to do these with APIs that are as high-level and Pythonic as possible, so for example address space access is Elf.read(addr, size), Elf.write(addr, data).
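To make the address-space idea concrete, here is a minimal sketch of read/write by virtual address. It is an illustration only, not how elffile implements it: Segment and AddressSpace are hypothetical names, and the sketch assumes each access falls entirely inside one loaded segment.

```python
class Segment:
    """Hypothetical stand-in for a PT_LOAD segment: a base address plus its bytes."""

    def __init__(self, vaddr, data):
        self.vaddr = vaddr
        self.data = bytearray(data)


class AddressSpace:
    """Hypothetical read/write access by virtual address across segments."""

    def __init__(self, segments):
        self.segments = segments

    def _find(self, addr, size):
        # Locate the segment that fully contains [addr, addr + size).
        for seg in self.segments:
            if seg.vaddr <= addr and addr + size <= seg.vaddr + len(seg.data):
                return seg, addr - seg.vaddr
        raise ValueError("range 0x%x+%d is not mapped by a single segment" % (addr, size))

    def read(self, addr, size):
        seg, off = self._find(addr, size)
        return bytearray(seg.data[off:off + size])

    def write(self, addr, data):
        seg, off = self._find(addr, len(data))
        seg.data[off:off + len(data)] = data
```

Returning a bytearray keeps the result mutable and sliceable, which is the behavior the thread asks for.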
Sections/segments are exposed and iterable. Some examples of patchkit/elffile ELF functionality:
from util import elffile  # assumes you are running from the patchkit repo root

e = elffile.open('filename')
e.progs.append(e.PH())
e.progs.pop(0)
for ph in e.progs:
    if ph.type == 'PT_LOAD':  # (this is an enum type with __eq__ overridden to support strings)
        ph.flags |= elffile.PF.X
        # this will adjust memsz, filesz
        ph.data += b'0' * 1000
for i in range(1000):
    # no limits on program header table
    e.progs.append(e.progs[0])
e.save('filename')
Actually:
create a new ELF file from scratch: isn't (yet) possible with LIEF
change segment protection arbitrarily: is possible
parse binaries without relying on sections: yes; modify: no
Parse symbols without relying on sections: yes, except for the number of symbols (WIP)
But I will take a deep look at your project :)
change segment protection arbitrarily: is possible
I amended this a bit.
Parse symbols without relying on sections: yes, except for the number of symbols (WIP)
You can derive from the elffile code for this and relicense to the Apache license if you like:
DYN parsing: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L987-L998
DT_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L577-L581
DT_GNU_HASH counting: https://github.com/lunixbochs/patchkit/blob/dyn/util/elffile.py#L611-L629
Yep: https://github.com/lief-project/LIEF/blob/master/src/ELF/Parser.tcc#L593
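For readers who do not want to follow the links, here is a minimal sketch of counting dynamic symbols from the hash tables alone, without section headers. It is not copied from elffile or LIEF. It assumes a little-endian file with 32-bit hash words (the common case), a DT_GNU_HASH Bloom filter made of 8-byte words as in ELF64, and the function names are made up for this example.

```python
import struct


def count_from_hash(table):
    """DT_HASH starts with nbucket, nchain; nchain equals the symbol count."""
    _nbucket, nchain = struct.unpack_from("<II", table, 0)
    return nchain


def count_from_gnu_hash(table, word_size=8):
    """DT_GNU_HASH: follow the chain of the highest bucket to its end marker."""
    nbuckets, symoffset, bloom_size, _shift = struct.unpack_from("<IIII", table, 0)
    buckets_off = 16 + bloom_size * word_size  # skip header and Bloom filter
    chains_off = buckets_off + nbuckets * 4
    buckets = struct.unpack_from("<%dI" % nbuckets, table, buckets_off)
    last = max(buckets) if buckets else 0
    if last < symoffset:
        # No hashed symbols: only the unhashed ones before symoffset exist.
        return symoffset
    while True:
        (h,) = struct.unpack_from("<I", table, chains_off + (last - symoffset) * 4)
        if h & 1:  # the low bit marks the last symbol of a chain
            return last + 1
        last += 1
```

DT_GNU_HASH does not store a count directly, which is why the walk to the end of the last chain is needed.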
My next step for dynamic is parsing/emitting VERNEED stuff, but I've been procrastinating reading the GNU source for that. These formats all need to be a little better documented.
I confirm it's not trivial.
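As a starting point for the VERNEED side, here is a minimal parsing sketch. It is not taken from patchkit or LIEF. It assumes a little-endian file and the Verneed/Vernaux record layouts from glibc's elf.h (16 bytes each, the same for 32-bit and 64-bit ELF), and that the record count comes from DT_VERNEEDNUM. The function name and the dict keys are made up for this example.

```python
import struct

# Assumed layouts (glibc elf.h), little-endian:
VERNEED = struct.Struct("<HHIII")  # vn_version, vn_cnt, vn_file, vn_aux, vn_next
VERNAUX = struct.Struct("<IHHII")  # vna_hash, vna_flags, vna_other, vna_name, vna_next


def parse_verneed(data, count):
    """Walk `count` Verneed records and their Vernaux children."""
    needs = []
    off = 0
    for _ in range(count):
        _version, cnt, file_off, aux, nxt = VERNEED.unpack_from(data, off)
        aux_off = off + aux  # vn_aux is relative to this Verneed record
        auxes = []
        for _aux in range(cnt):
            vhash, flags, other, name_off, aux_next = VERNAUX.unpack_from(data, aux_off)
            auxes.append({"hash": vhash, "flags": flags, "other": other, "name": name_off})
            aux_off += aux_next  # vna_next is relative to this Vernaux record
        needs.append({"file": file_off, "aux": auxes})
        if nxt == 0:
            break
        off += nxt  # vn_next is relative to this Verneed record
    return needs
```

The file and name fields are offsets into the dynamic string table, so resolving them to library and version names needs DT_STRTAB as well.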
Also, I understand it's a tradeoff between having unified APIs and having good language-specific APIs, but I wish the LIEF Python API was a little more Pythonic.
Some examples:
LIEF:
# why isn't arg0 named path or filename?
lief.ELF.parse(arg0: str)
# open raw data (why a list of ints? where is name used?)
lief.ELF.parse_from_raw(raw: List[int], name: str='')
# very long function name
# what's arg0, arg1? I assume addr, size, but the docs aren't clear on that
# is there a particular reason to return List[int] instead of using the Python bytearray type?
ELF.Binary.get_content_from_virtual_address(self: lief.Binary, arg0: int, arg1: int) → List[int]
patchkit/elffile equivalents:
# admittedly, I should rename name to path and block to raw
# (I inherited this code from the elffile project)
elffile.open(name=str)
elffile.open(obj=fileobj)
elffile.open(block=raw)
elf.read(addr, size) -> bytearray()
Actually, arg0: str comes from the pybind11 default documentation, but you are right, I have to give an explicit name. Same for list vs bytearray and for long method names. Thanks for the suggestion.
PS: As I have to deal with 3 formats and 3 APIs, it's a long task (documentation vs. features).
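Until the bindings change, the list-of-ints return value can be hidden behind a thin wrapper. This is only a sketch: the helper name read is made up, and it assumes the binary object exposes get_content_from_virtual_address(addr, size) returning a list of ints, as in the signature quoted earlier in the thread.

```python
def read(binary, addr, size):
    """Hypothetical helper: return a bytearray instead of a list of ints."""
    # Assumes the method returns an iterable of ints in range(256).
    return bytearray(binary.get_content_from_virtual_address(addr, size))
```

A caller would then write read(binary, addr, size), which mirrors the elf.read(addr, size) shape from the patchkit examples.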
https://github.com/lief-project/LIEF
initially: I don't think it's powerful enough, adds a binary dependency, and adds C code to run on untrusted binaries.