anon8675309 opened this issue 7 years ago
Update: I tried including the patch to exclude angr internals from path exploration (https://github.com/shellphish/driller/pull/8), hoping it would avoid the 0x2000000-range addresses, but it made no difference.
Hey, so if you're gonna start using angr on real-world programs, you will run into the second-most serious problem endemic to every symbolic execution engine out there: environment support. We've put a hell of a lot of effort into making angr's execution match a real machine's, but there are innumerable ways this can fall apart, including errors or unsupported elements in the loading process, inconsistencies in syscall emulation, edge cases in file handling, cpuid (god forbid), and the list goes on and on. If you want to apply angr to real programs, the unfortunate state of things right now is that you really need to be able to debug angr and its internals to figure out where the inconsistencies start cropping up. In this case, I don't think the internal addresses are what's tripping you up (those are mapped by project._extern_obj and project._syscall_obj, which are custom CLE backends mapped into the program's memory space to provide addresses for SimProcedure hooks and syscalls); it's probably some more fundamental emulation inconsistency.
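To make those internal addresses a little more concrete, here is a toy model of what an extern-style object does: it reserves an address range and hands each hooked symbol its own slot, so a call to that address can be routed to a Python handler instead of real machine code. Everything here (the ExternRegion class, the 0x5000000 base, the 8-byte slot size) is a hypothetical illustration, not CLE's actual implementation.

```python
# Toy model (hypothetical, NOT CLE's real extern object): reserve an address
# range and give every hooked symbol its own slot, so that a call to that
# address can be dispatched to a Python handler instead of machine code.
from typing import Callable, Dict


class ExternRegion:
    def __init__(self, base: int, slot_size: int = 8) -> None:
        self.base = base              # start of the reserved range (arbitrary here)
        self.slot_size = slot_size    # bytes reserved per hooked symbol
        self.next_addr = base         # next free slot
        self.addr_of: Dict[str, int] = {}
        self.handler_at: Dict[int, Callable[..., object]] = {}

    def hook(self, name: str, handler: Callable[..., object]) -> int:
        """Assign (or reuse) a slot for `name` and bind `handler` to it."""
        if name not in self.addr_of:
            self.addr_of[name] = self.next_addr
            self.next_addr += self.slot_size
        addr = self.addr_of[name]
        self.handler_at[addr] = handler
        return addr

    def contains(self, addr: int) -> bool:
        """True if `addr` falls inside a slot handed out so far."""
        return self.base <= addr < self.next_addr

    def call(self, addr: int, *args: object) -> object:
        """Run the Python handler bound to `addr`."""
        return self.handler_at[addr](*args)


externs = ExternRegion(base=0x5000000)
strlen_addr = externs.hook("strlen", lambda s: len(s))
print(hex(strlen_addr), externs.call(strlen_addr, "abc"))  # 0x5000000 3
```

The practical takeaway in this model: such addresses are perfectly valid from the emulator's point of view even though a real process has nothing mapped there, so seeing them in an angr trace or call stack is expected.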
My toolkit for debugging misfollows usually looks something like this:
Finally, I notice that this particular misfollow is near the time() function, which has always been a sticking point. If you dump the symbols from libc and look at time, you'll see that it's not a normal function type; it's an IFUNC, a special symbol that actually points to a resolver function which determines the correct pointer for that function at runtime (usually based on cpuid) and returns it. This requires some cooperation from the dynamic linker, and angr/CLE cooperate as best they can, but it's a bit of a hack. The code that deals with this is in angr/simos.py. It looks like this code in particular was written before project.hook_symbol was introduced... Maybe switching it to that would make things easier.
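The IFUNC mechanism itself can be modeled in a few lines. This is a conceptual sketch, not glibc's or CLE's code: the implementation names and the "fast_path" feature flag are made up. It shows how an emulator whose cpuid answers differ from the real CPU's can end up running a different implementation of the same libc function than the real process does.

```python
# Conceptual model of IFUNC resolution (simplified; not glibc's or CLE's code).
# The IFUNC symbol points at a resolver. At relocation time the dynamic linker
# calls the resolver once and stores the returned pointer in the GOT, so every
# later call goes to whichever implementation the resolver picked.
from typing import Callable, Dict, Set

Impl = Callable[[], str]


def time_generic() -> str:
    return "generic time()"


def time_optimized() -> str:
    return "optimized time()"


def time_resolver(cpu_features: Set[str]) -> Impl:
    # Real resolvers typically look at cpuid-derived flags; "fast_path" is a
    # made-up flag used only for this illustration.
    return time_optimized if "fast_path" in cpu_features else time_generic


class ToyLinker:
    def __init__(self, cpu_features: Set[str]) -> None:
        self.cpu_features = cpu_features
        self.got: Dict[str, Impl] = {}

    def bind_ifunc(self, name: str, resolver: Callable[[Set[str]], Impl]) -> None:
        # Run the resolver once and cache its answer in the GOT slot.
        self.got[name] = resolver(self.cpu_features)

    def call(self, name: str) -> str:
        return self.got[name]()


real_cpu = ToyLinker({"fast_path"})
emulated_cpu = ToyLinker(set())
for linker in (real_cpu, emulated_cpu):
    linker.bind_ifunc("time", time_resolver)
print(real_cpu.call("time"), "|", emulated_cpu.call("time"))
# optimized time() | generic time()
```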
project.loader.whats_at() is the best. The 0x200xxxx addresses turned out to be libc.so, and I also found angr syscalls and angr externs in there.
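A lookup like whats_at() essentially boils down to finding which loaded object's address range contains a given address. The sketch below is a hypothetical stand-in: the AddressMap class and all of the ranges are invented for illustration and are not taken from this binary. It keeps regions sorted and uses bisect, so each lookup is O(log n).

```python
# Hypothetical stand-in for an address -> object lookup such as
# project.loader.whats_at(); the object names and ranges are invented.
import bisect
from typing import List, Optional, Tuple

Region = Tuple[int, int, str]  # (min_addr, max_addr inclusive, object name)


class AddressMap:
    def __init__(self, regions: List[Region]) -> None:
        self.regions = sorted(regions)
        self.starts = [lo for lo, _, _ in self.regions]

    def whats_at(self, addr: int) -> Optional[str]:
        # Find the last region starting at or below addr, then check its end.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0:
            lo, hi, name = self.regions[i]
            if addr <= hi:
                return f"{name}+{addr - lo:#x}"
        return None  # unmapped


amap = AddressMap([
    (0x400000, 0x41ffff, "main binary"),
    (0x2000000, 0x21fffff, "libc.so.6"),
    (0x5000000, 0x5000fff, "angr externs"),
])
for addr in (0x409e50, 0x2079a40, 0x5000350, 0x3000000):
    print(hex(addr), "->", amap.whats_at(addr))
```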
What's a reasonable way to drop into a Python console when path exploration reaches a particular address in the target binary? Right now I'm doing something unreasonable: I'm hacking an if current.addr == 0x409ec7: raise Exception("Horrible hack to pop a debugger") into tracer.next_branch() and then running ipython -i tcpdump.py (tcpdump.py being a hacked-together file used only for debugging this one problem). Can I accomplish this somehow with hooks? I'll still want to hand-jam Python code to poke around, but having everything in a file makes it easy to start over and get repeatable runs.
Finally, I'm seeing some weird stuff here.
In [22]: ["%#x" % x for x in current.addr_trace.hardcopy]
Out[22]: ['0x409e50']
In [23]: print(current.callstack.dbg_repr())
0 | 0x4057d6 -> 0x409e50, returning to 0x4057dd
1 | 0x5000080 -> 0x404db0, returning to 0x5000080
2 | 0x406900 -> 0x4029a0, returning to 0x406929
3 | None -> 0x5000350, returning to -0x1
The call stack all looks fine: angr externs -> _start -> __libc_start_main -> main -> gmt2local (the function I care about). However, what's going on with the addr_trace? Shouldn't it show all the basic blocks that were executed on the way to where we are now? If not, where can I get that info?
I think I'm starting to get the hang of tracking these things down, so hopefully I'll get to the bottom of this and we can get this issue closed out.
If you want to launch a debug shell using the hack method you described, import ipdb; ipdb.set_trace() is the standard. ipdb is an extension of pdb, which is a gdb-style debug shell for Python.
The non-hacky way is state.inspect.b('instruction', instruction=0x1234, action=what), where what can be a function to call or the strings "ipython" or "ipdb" to launch either of those shells. Read about the SimInspect breakpoint stuff here.
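The idea behind an instruction breakpoint is simply "when execution reaches address X, run action Y". The toy version below illustrates that pattern with the standard library's pdb as the interactive action. It is not how SimInspect is implemented: the run_with_breakpoints and drop_into_debugger names are hypothetical, and it walks a plain list of addresses rather than real angr states.

```python
# Toy model of address breakpoints (not angr's SimInspect): walk a sequence
# of executed addresses and run a callback whenever a watched one shows up.
import pdb
from typing import Callable, Dict, Iterable, List

Action = Callable[[int], None]


def run_with_breakpoints(trace: Iterable[int], breakpoints: Dict[int, Action]) -> List[int]:
    """Return the addresses whose breakpoint fired, in execution order."""
    hits: List[int] = []
    for addr in trace:
        action = breakpoints.get(addr)
        if action is not None:
            hits.append(addr)
            action(addr)
    return hits


def drop_into_debugger(addr: int) -> None:
    """Interactive action: stop in pdb (swap in ipdb.set_trace if installed)."""
    print(f"breakpoint hit at {addr:#x}")
    pdb.set_trace()


# Non-interactive usage: record hits instead of stopping in a debugger.
seen: List[int] = []
hits = run_with_breakpoints([0x409e50, 0x409ec7, 0x409e64], {0x409ec7: seen.append})
print([hex(a) for a in hits])  # ['0x409ec7']
```

Passing {0x409ec7: drop_into_debugger} instead would pause at that address in an interactive session, while keeping the whole setup in a script for repeatable runs.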
I have no idea what's going on with your address trace, it definitely should be showing more addresses than that.
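For reference, one common way for a symbolic executor to keep per-path block history cheaply is a chain of parent-linked nodes: forked states share their common prefix, and the full trace is rebuilt by walking back to the root. The sketch below is a generic illustration with made-up addresses, not angr's addr_trace implementation.

```python
# Generic sketch (not angr's implementation): per-path block history stored as
# parent-linked nodes. Forked paths share their common prefix, and the full
# trace is rebuilt by walking parent pointers back to the root.
from typing import List, Optional


class HistoryNode:
    def __init__(self, addr: int, parent: Optional["HistoryNode"] = None) -> None:
        self.addr = addr      # basic block executed at this step
        self.parent = parent  # previous step, shared between forks

    def step(self, addr: int) -> "HistoryNode":
        return HistoryNode(addr, self)

    def full_trace(self) -> List[int]:
        addrs: List[int] = []
        node: Optional[HistoryNode] = self
        while node is not None:
            addrs.append(node.addr)
            node = node.parent
        addrs.reverse()
        return addrs


root = HistoryNode(0x401000)
prefix = root.step(0x401020).step(0x401040)
branch_a = prefix.step(0x401100)  # fork: branch taken
branch_b = prefix.step(0x401060)  # fork: fall-through
print([hex(a) for a in branch_a.full_trace()])
# ['0x401000', '0x401020', '0x401040', '0x401100']
```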
I'm back, and this time I'm drilling a binary which doesn't need any LD_PRELOAD junk, it's just a normal executable. This time I eventually get what looks to be a type confusion bug (b/c it looks like "name" should be a string, more specifically a filename), however it looks like the real problem might have happened quite a bit earlier...
I've included the error, which is where we seem to start going off the rails. I say this because my base address is 0x400000 and all the shared objects are mapped above that, but 0x20bd070 is unmapped memory (according to gdb when I manually run and debug this binary outside the whole Driller ecosystem). So I took a closer look at the instruction where this transition error occurred. Here's the relevant code:
When I put a breakpoint on *0x409e50 and run the app in gdb, rdi is zero, so the jump should be taken and we should land at 0x409ec0. This is where gdb goes. The dynamic trace, however, appears to go to 0x409e64 (it doesn't take the jump), which seems wrong. This breakpoint is only hit once, and the backtrace shows it came from tcpdump.c:1533, which is
timezone_offset = gmt2local(0);
. Ergo we know that this will never go to 0x409e64. Furthermore, this is the only place in the source where gmt2local() is ever called, which means that if we got here and rdi isn't zero, something has gone terribly wrong. On the symbolic side, it went to 0x409ec7, which is in the same neighborhood as 0x409ec0 but not quite right either. For context, here's the jump target:
So I think we have some important questions about the dynamic and symbolic targets, however the biggest question of all is: How on Earth did we get to 0x2079a40?!
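Questions like this usually come down to finding the first point where the concrete and symbolic runs disagree. Here's a minimal sketch, assuming both runs are available as lists of basic-block addresses (for example, one collected from gdb or QEMU and one from the symbolic side); the first_divergence name and the sample traces are hypothetical, loosely shaped after the gmt2local branch described above.

```python
# Minimal sketch: report where two basic-block address traces first disagree.
# The sample traces are made up for illustration.
from typing import Optional, Sequence, Tuple

Divergence = Tuple[int, Optional[int], Optional[int]]  # (index, concrete, symbolic)


def first_divergence(concrete: Sequence[int], symbolic: Sequence[int]) -> Optional[Divergence]:
    for i, (c, s) in enumerate(zip(concrete, symbolic)):
        if c != s:
            return i, c, s
    if len(concrete) != len(symbolic):
        # One trace is a strict prefix of the other; the shorter one ran out here.
        i = min(len(concrete), len(symbolic))
        c = concrete[i] if i < len(concrete) else None
        s = symbolic[i] if i < len(symbolic) else None
        return i, c, s
    return None  # identical traces


concrete = [0x4057d6, 0x409e50, 0x409ec0]
symbolic = [0x4057d6, 0x409e50, 0x409ec7]
idx, c, s = first_divergence(concrete, symbolic)
print(f"step {idx}: after {concrete[idx - 1]:#x}, concrete -> {c:#x}, symbolic -> {s:#x}")
# step 2: after 0x409e50, concrete -> 0x409ec0, symbolic -> 0x409ec7
```

The block just before the reported index is the one whose branch condition is worth inspecting first.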
After scanning back through the logs a bit, I noticed that this is not the first time we've been in the 0x2000000 range. Here's the first time I start seeing that address. So maybe this is some expected artifact of how Driller does the dynamic execution...?
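Finding where a trace first enters a suspicious range can also be automated: scan for the first address inside the range and report the block that jumped there. A hypothetical helper, assuming the trace is already a list of integer addresses; the sample trace is made up. Feeding the culprit addresses to project.loader.whats_at() would then tell you which object they belong to.

```python
# Hypothetical helper: find the first time a trace enters [lo, hi) and which
# address it came from. The sample trace below is made up.
from typing import Iterable, Optional, Tuple


def first_entry_into_range(
    trace: Iterable[int], lo: int, hi: int
) -> Optional[Tuple[int, Optional[int], int]]:
    """Return (index, previous address or None, address) for the first hit."""
    prev: Optional[int] = None
    for i, addr in enumerate(trace):
        if lo <= addr < hi:
            return i, prev, addr
        prev = addr
    return None


trace = [0x402a00, 0x4057d6, 0x2079a40, 0x409e50]
hit = first_entry_into_range(trace, 0x2000000, 0x3000000)
if hit is not None:
    i, prev, addr = hit
    print(f"step {i}: {prev:#x} -> {addr:#x}")  # step 2: 0x4057d6 -> 0x2079a40
```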
In any case, I could use some help with some context here so I can get to the bottom of this and get the issue fixed, whatever it might be.