howardjack / distorm

Automatically exported from code.google.com/p/distorm
GNU General Public License v3.0

Unexpected disassembly results on OpenBSD #48

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I am giving distorm a test run. I hacked together a Python script that attempts
to locate a function (DWARF info found using pyelftools) and disassemble the
first 10 instructions therein.

I made a simple test program in C:

---8<---
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int
do_stuff()
{
        return (errno);
}

int
main(void)
{
        printf("This is a test: %d %d\n", getpid(), do_stuff());

        return (0);
}
---8<---

We compile this with gcc and enable debug symbols so we can easily find main(). 
The binary is called "a".

Manual inspection with radare2 (http://radare.org/y/) shows the beginning of 
main looks like this:

---8<---
       0x00400a0d  sym.main:
       0x00400a0d     55               push rbp
       0x00400a0e     4889e5           mov rbp, rsp
       0x00400a11     53               push rbx
       0x00400a12     4883ec08         sub rsp, 0x8
       0x00400a16     b800000000       mov eax, 0x0 
       0x00400a1b     e8e0ffffff       call dword sym.do_stuff 
---8<---

This code *has* been relocated by radare2, so all addresses are virtual.

Now using distorm:

---8<---
import distorm3

def decode(filename, vaddr):
    print("Decoding %s at vaddr %s" % (filename, hex(vaddr)))
    with open(filename, "rb") as f:
        # The whole file is handed to the decoder; vaddr is passed as its offset argument.
        iterable = distorm3.DecodeGenerator(vaddr, f.read(), distorm3.Decode64Bits)
        # Print only the first 10 decoded instructions.
        for i, (offset, size, instruction, hexdump) in enumerate(iterable):
            if i == 10:
                break
            print("%.8x: %-32s %s" % (offset, hexdump, instruction))
---8<---

The vaddr parameter is provided by pyelftools. The address of main matches
what radare2 is reporting, so I believe the vaddr of main to be correct.

I get the following output:

---8<---
Decoding a at vaddr 0x400a0d
00400a0d: 7f45                             JG 0x400a54
00400a0f: 4c460201                         ADD R8B, [RCX]
00400a13: 0100                             ADD [RAX], EAX
00400a15: 0000                             ADD [RAX], AL
00400a17: 0000                             ADD [RAX], AL
00400a19: 0000                             ADD [RAX], AL
00400a1b: 0000                             ADD [RAX], AL
00400a1d: 0200                             ADD AL, [RAX]
00400a1f: 3e0001                           ADD [RCX], AL
00400a22: 0000                             ADD [RAX], AL
---8<---

I am not sure if this is a case of PEBCAK, but the instruction stream distorm 
is decoding does not appear to be correct.  I wondered if it was decoding file 
offsets rather than virtual addresses, but it can't be this, as the instruction 
stream there is [5e, 0a, 50, 00, ... ], which does not match either.

Odd.

I am attaching the binary in case it helps.

Cheers

---- TEMPLATE QUESTIONS ----
In what mode did you try to disassemble (16/32/64)?
64

What is the input buffer (binary stream) you used to reproduce the problem?
Not really related, see below.

What is the expected output (or what instruction)?
See above

Which tool did you use to see the expected output?
radare2

What do you see instead?
See above

What version of diStorm are you using? On what platform (Python/EXE/other)?
Python2.7 and distorm-3.2 on OpenBSD/amd64

Original issue reported on code.google.com by vex...@gmail.com on 17 May 2012 at 11:26

GoogleCodeExporter commented 9 years ago
OK, I think I have some insight on this now.

Distorm's decoder starts decoding from the start of the code you give it, which
is assumed to be located at the virtual address you pass to it via the
'offset' parameter.

It does *NOT* relocate the code I give it and seek to the vaddr I pass.
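
The output in the original report seems consistent with this reading. Joining the hexdump column of the first few decoded instructions gives 7f 45 4c 46 02 01 ..., which is the ELF identification header (the magic "\x7fELF" followed by the 64-bit class byte and the little-endian data byte), so the decoder apparently started at byte 0 of the file. A small check in Python 3; the helper name looks_like_elf_header is made up for illustration:

```python
ELF_MAGIC = b"\x7fELF"


def looks_like_elf_header(hexdumps):
    """Hypothetical helper: join hexdump strings and test for the ELF magic."""
    raw = bytes.fromhex("".join(hexdumps))
    return raw[:4] == ELF_MAGIC


# Hexdump column of the first four instructions in the reported distorm output.
reported = ["7f45", "4c460201", "0100", "0000"]
print(looks_like_elf_header(reported))

# Bytes radare2 shows at the start of main, for comparison.
expected = ["55", "4889e5", "53", "4883ec08"]
print(looks_like_elf_header(expected))
```
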

Can you confirm this?

I suppose this is what is meant by "Note: The first argument offset is the 
virtual address of the code block. It is not an offset inside code! It is 
similar to the [org] directive of Assemblers.", but I read this as, distorm 
expects a virtual address, not a file offset.
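
In practice this means the caller has to locate the bytes first. A minimal sketch in Python 3, assuming a little-endian ELF64 file whose program headers say where each loadable segment sits in the file: translate the vaddr to a file offset, slice the buffer there, and pass the vaddr to distorm only as the label for the first instruction. The helper names load_segments, vaddr_to_file_offset and decode_at_vaddr are hypothetical and not part of distorm or pyelftools; only distorm3.DecodeGenerator and distorm3.Decode64Bits come from the library.

```python
import struct

PT_LOAD = 1  # program header type of a loadable segment


def load_segments(data):
    """Return (p_vaddr, p_offset, p_filesz) for each PT_LOAD segment.

    Assumes `data` holds a little-endian ELF64 image.
    """
    if data[:4] != b"\x7fELF" or data[4:5] != b"\x02" or data[5:6] != b"\x01":
        raise ValueError("not a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        # Leading fields of Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz.
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_offset, p_filesz))
    return segments


def vaddr_to_file_offset(segments, vaddr):
    """Map a virtual address to the file offset that backs it."""
    for p_vaddr, p_offset, p_filesz in segments:
        if p_vaddr <= vaddr < p_vaddr + p_filesz:
            return p_offset + (vaddr - p_vaddr)
    raise ValueError("vaddr 0x%x is not backed by file data" % vaddr)


def decode_at_vaddr(filename, vaddr, count=10):
    """Print the first `count` instructions found at `vaddr`."""
    import distorm3  # imported lazily so the helpers above work without it

    with open(filename, "rb") as f:
        data = f.read()
    start = vaddr_to_file_offset(load_segments(data), vaddr)
    # The buffer begins at the function; vaddr only labels the output addresses.
    iterable = distorm3.DecodeGenerator(vaddr, data[start:], distorm3.Decode64Bits)
    for i, (offset, size, instruction, hexdump) in enumerate(iterable):
        if i == count:
            break
        print("%.8x: %-32s %s" % (offset, hexdump, instruction))
```

For the binary in this report the call would be decode_at_vaddr("a", 0x400a0d). Since pyelftools is already in use, the segment lookup could probably be done through it instead of the hand-rolled struct parsing shown here.
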

In general I find this quite confusing. If you look at the ELF documentation
(www.skyfree.org/linux/references/ELF_Format.pdf), throughout this document the
term "offset" refers to a file offset, that is, a position in an ELF file, whereas an
"address" is a virtual address in a relocated memory image.

This is why I find this sentence confusing: "The first argument offset is the 
virtual address of the code block".

Anyway, have I answered my own question? Cheers

Original comment by vex...@gmail.com on 17 May 2012 at 2:33

GoogleCodeExporter commented 9 years ago
You answered your own question.
How should I rephrase it?
I think people still confuse 'offset' with 'address'. I tried to clarify,
and yet I failed.
Thanks

Original comment by distorm@gmail.com on 17 May 2012 at 7:23

GoogleCodeExporter commented 9 years ago

Original comment by distorm@gmail.com on 19 May 2012 at 2:54