shellphish / driller

Driller: augmenting AFL with symbolic execution!

Out of paths #24

Closed: anon8675309 closed this issue 6 years ago

anon8675309 commented 7 years ago

I'm not sure if this issue belongs here or in the angr repo, but here's the error I'm getting:

Traceback (most recent call last):
  File "./demo.py", line 17, in <module>
    solutions = d.drill()
  File "/home/adam/.virtualenvs/driller/local/lib/python2.7/site-packages/driller/driller.py", line 110, in drill
    list(self._drill_input())
  File "/home/adam/.virtualenvs/driller/local/lib/python2.7/site-packages/driller/driller.py", line 138, in _drill_input
    self._set_concretizations(t)
  File "/home/adam/.virtualenvs/driller/local/lib/python2.7/site-packages/driller/driller.py", line 244, in _set_concretizations
    state = t.path_group.one_active.state
  File "build/bdist.linux-x86_64/egg/angr/path_group.py", line 381, in __getattr__
    return self.stashes[k[4:]][0]
IndexError: list index out of range

Here's the demo program that I'm using to create this error:

#!/usr/bin/env python
from driller.driller import Driller
from logging import DEBUG, getLogger
from os import environ
from traceback import print_exc

getLogger("tracer.Tracer").setLevel(DEBUG)
getLogger("angr.path_group").setLevel(DEBUG)

target = "/home/grimm/targets/xz-5.2.3/src/xz/.libs/xz"
afl_input_filename = "a.xz"
with open(afl_input_filename, "rb") as f:
    afl_input = f.read()
environ['LD_PRELOAD'] = "/home/grimm/targets/xz-5.2.3/src/liblzma/.libs/liblzma.so.5"
d = Driller(target, afl_input)
try:
    solutions = d.drill()
except Exception as e:
    print_exc()

The debugging output from path_group is extremely helpful in telling us that there are no active paths. This explains why we get the unhandled exception.

<snip>
DEBUG   | 2017-03-20 16:56:43,073 | angr.path_group | Round 72: stepping <PathGroup with 39 unsat, 1 active, 2 errored>
DEBUG   | 2017-03-20 16:56:43,074 | angr.path_group | Out of paths in stash active
<Traceback>

It seems PathGroup's __getattr__ assumes that if the caller is accessing one_active, then self.stashes['active'] must have length > 0.

I feel like the "right" thing to do in __getattr__ is to return None in this case, but I'm not familiar enough with the code base to know whether that's a reasonable way to handle it. More importantly, I don't understand why the path exploration can't find its way to the entry point! Presumably it's related to the LD_PRELOAD, but I'm not sure how. Whatever ingests the executable should be getting a snapshot of memory with the library loaded...
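For example, a guard along these lines before the failing access would at least turn the IndexError into a clearer message (just a sketch against the PathGroup API from the traceback above, not the actual driller code):

# Hypothetical guard around the failing access in _set_concretizations (sketch only):
# fail loudly with context instead of letting one_active raise IndexError.
active = t.path_group.stashes.get("active", [])
if not active:
    raise RuntimeError("tracer ran out of active paths; errored stash: %r"
                       % t.path_group.stashes.get("errored", []))
state = active[0].state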

You guys have been giving me great support. Any pointers on this one?

zardus commented 7 years ago

The core problem here is that the state didn't get created properly. Probably on the 72nd step, some error occurred that killed it. Whether or not PathGroup behaves elegantly when trying to access it is orthogonal to the issue :-)

Could you check the path group and see if there are any paths in the errored stash? If there are, could you get the backtrace with a path.retry() or path.debug() and post it here?
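Something like this should do it, by the way (a rough sketch against the PathGroup API shown above; adjust it to wherever you keep the tracer object):

# Sketch: re-run the failing step for every path in the errored stash and
# print the resulting Python traceback (retry() reproduces the original error).
import traceback

for p in t.path_group.stashes["errored"]:
    try:
        p.retry()
    except Exception:
        traceback.print_exc()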

anon8675309 commented 7 years ago

All three path groups have the same error:

t.path_group.stashes["errored"][2].retry()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/angr/path.py", line 769, in retry
  File "build/bdist.linux-x86_64/egg/angr/factory.py", line 77, in successors
  File "/home/adam/.virtualenvs/driller/lib/python2.7/site-packages/simuvex-6.7.1.31-py2.7.egg/simuvex/engines/vex/engine.py", line 101, in process
    opt_level=opt_level)
  File "/home/adam/.virtualenvs/driller/lib/python2.7/site-packages/simuvex-6.7.1.31-py2.7.egg/simuvex/engines/engine.py", line 44, in process
    self._process(new_state, successors, *args, **kwargs)
  File "/home/adam/.virtualenvs/driller/lib/python2.7/site-packages/simuvex-6.7.1.31-py2.7.egg/simuvex/engines/vex/engine.py", line 125, in _process
    opt_level=opt_level)
  File "/home/adam/.virtualenvs/driller/lib/python2.7/site-packages/simuvex-6.7.1.31-py2.7.egg/simuvex/engines/vex/engine.py", line 453, in lift
    raise SimEngineError("No bytes in memory for block starting at %#x." % addr)
simuvex.s_errors.SimEngineError: No bytes in memory for block starting at 0x0.

When I tried running .debug() I found that buff was an empty string, ergo the error:

t.path_group.stashes["errored"][0].debug()
> /home/adam/.virtualenvs/driller/lib/python2.7/site-packages/simuvex-6.7.1.31-py2.7.egg/simuvex/engines/vex/engine.py(453)lift()
    452         if not buff or size == 0:
--> 453             raise SimEngineError("No bytes in memory for block starting at %#x." % addr)
    454 

ipdb> buff
''
ipdb>

How can I get a stacktrace of the binary (as opposed to a stacktrace in Python)? I poked around in the "state" and "state.gdb" variables after running .debug() but I didn't see anything obvious there.

rhelmot commented 7 years ago

path.call_stack (?) is the call stack of the emulated program.


salls commented 7 years ago

In cases like this I usually look at the path.addr_trace to see which blocks it last executed

anon8675309 commented 7 years ago

Apparently ErroredPath objects don't have call_stack, but addr_trace exists. There's a slight problem with the addr_trace though... none of the addresses are mapped to anything when I debug in gdb.

>>> t.path_group.stashes["errored"][2].call_stack
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'ErroredPath' object has no attribute 'call_stack'
>>> type(t.path_group.stashes["errored"][2])
<class 'angr.path.ErroredPath'>
>>> print("\n".join(["0x%x" % x for x in t.path_group.stashes["errored"][2].addr_trace.hardcopy]))
0x7000250
0x1001fd8
0x1001fed
0x7000290
0x1002180
0x10021b0
0x1002100
0x1002128
0x7000290
0x2005580
0x2005d00
0x80006d0
0x2005d22
0x8000760
0x2005d8a
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e05
0x2005e08
0x20113a0
0x20113b3
0x20115c9
0x2011479
0x8000068
0x201148e
0x201149d
0x2011579
0x2005e21
0x20113a0
0x20113b3
0x20115c9
0x2011479
0x8000068
0x201148e
0x201149d
0x2011579
0x2005e47
0x8000070
0x2005e6d
0x2005970
0x5013370
0x2005e7e
0x2005e8d
0x20055d0
0x30fbd70
0x8000308
0x30fbd77
0x30fbd7f
0x2005ed1
0x2005ed5
0x2005ee0
0x2005efd
0x2005f5e
0x2005f78
0x2005f94
0x2005a10
0x3143170
0x2005f99

I'd like to poke around in memory and see the instructions at these addresses, but I'm not sure how to do that. I did, however, find that t.path_group.stashes["errored"][2].state.memory.load(0x7000250) was zero, so that seems to match up with the null pointer.

I see rhelmot responded to issue #25 with a bunch of debugging tips, so I'm going to switch back to that and get familiar with debugging, then come back to this one once I'm more proficient. As always, thanks for the tips. I'll be sure to submit pull requests if I'm able to fix anything!

rhelmot commented 7 years ago

It looks like it's actually callstack, not call_stack. :) The (?) was an indication that I wasn't sure and you should have used tab-autocomplete to find the actual value!

rhelmot commented 7 years ago

Also, regarding the formatting in that list comprehension you posted, you may be intrigued to learn about the following two things:

anon8675309 commented 7 years ago

Yeah, I don't use that fancy IPython shell, I use the plain one, which doesn't have tab-completion (but help() and dir() work just fine most of the time... except when I overlook things...). At any rate, the .callstack object was most helpful. All three stacktraces are the same, end with a null pointer dereference, and consist entirely of addresses that are unmapped when I debug in gdb (which, I presume, means they're angr internals).

print(t.path_group.stashes["errored"][2].callstack.dbg_repr())
0 | 0x2005f99 -> 0x0, returning to 0x2005fa9
1 | 0x2005580 -> 0x2005d00, returning to 0x2005589
2 | 0x7000290 -> 0x2005580, returning to 0x7000290
3 | None -> 0x7000250, returning to -0x1

Looking at the callstack was much more pleasant than looking through the addr_trace, but I could imagine cases where the latter would be what I want. The %#x was a new one for me, but I think that will have a solid place in my future. Unfortunately I need to switch back to another task which doesn't involve driller, but hopefully I'll be able to ret back here soon.
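(For anyone else who hasn't run into it, %#x just adds the 0x prefix for you:)

>>> "%#x" % 0x2005f99
'0x2005f99'
>>> "0x%x" % 0x2005f99   # the spelled-out equivalent
'0x2005f99'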

zardus commented 7 years ago

Unlike normal loaders (the behavior of which varies pretty widely anyway), angr maps its libraries at increments of 0x1000000. So 0x2000000 is the second library mapped, and so forth. That's why it doesn't line up with gdb, unfortunately.
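If you want to line them up, one way is something like the sketch below. It assumes you load the same binary into a fresh angr.Project and uses the current CLE loader API (find_object_containing; older releases named this differently), and the offsets only match your driller run if the libraries end up mapped at the same bases:

# Sketch: resolve an angr-side address back to "object + offset" and dump the
# basic block there, so it can be compared with what gdb shows.
import angr

proj = angr.Project("/home/grimm/targets/xz-5.2.3/src/xz/.libs/xz")

addr = 0x2005f99  # one of the addresses from the callstack above
obj = proj.loader.find_object_containing(addr)
if obj is not None:
    print("%#x is %s + %#x" % (addr, obj.binary, addr - obj.min_addr))
    proj.factory.block(addr).pp()  # disassemble the block at that address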

anon8675309 commented 6 years ago

This issue was fixed with https://github.com/angr/angr/pull/558