Incorrect instruction operands from DecomposeGenerator

GoogleCodeExporter commented 9 years ago

When decomposing a 64-bit binary, the instruction:
mov    rax, 0xffffffffffffffff

seems to give an operand[1] of '0x-1', yet other immediate operands that would 
be negative if interpreted as signed show the correct outcome. I would expect all 
operands to be given independent of sign, i.e. as raw hex, right?

For example, attached is a test case showing the problem. The test case uses a 
DecomposeGenerator to decompose:

mov   rax,0x7fffffffffffffff           
mov   rax,0xffffffffffffffff           
mov   rax,0x8000000000000000        

I then print what distorm thought the instruction was, and then the operands 
from the Instruction instance.

% python bug.py
MOV RAX, 0x7fffffffffffffff
  operands: 1: RAX      2: 0x7fffffffffffffff
MOV RAX, 0xffffffffffffffff
  operands: 1: RAX      2: -0x1
MOV RAX, 0x8000000000000000
  operands: 1: RAX      2: 0x8000000000000000

Operand 2 of the second instruction is incorrect(?).

I am using today's svn on OpenBSD-current with Python-2.7. The bug is reproducible 
on i386 and amd64.

A patch to fix this would be appreciated.

Thanks

Original issue reported on code.google.com by vex...@gmail.com on 22 Jul 2012 at 12:50

Attachments:

bug.py

GoogleCodeExporter commented 9 years ago

Whoops, a typo. The instruction in question seems to give an operand[1] of 
'-0x1', not '0x-1'. Sorry. The test case output shows this.

Original comment by vex...@gmail.com on 22 Jul 2012 at 12:57

GoogleCodeExporter commented 9 years ago

After speaking to a colleague, we think that the second mov was interpreted as 
signed due to the variant of mov that was used; specifically, it is a 
sign-extended variant.

Original comment by vex...@gmail.com on 23 Jul 2012 at 11:32

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Looking at the sample you sent, it seems you're right.

Original comment by distorm@gmail.com on 23 Jul 2012 at 9:43

GoogleCodeExporter commented 9 years ago

It's because the source operand is IMM32, which gets sign-extended to IMM64 in 
64-bit mode.
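
A minimal sketch of that sign extension, in plain Python 3 and without distorm. The byte sequences are the standard x86-64 encodings (`REX.W C7 /0` with a 32-bit immediate, and `REX.W B8` with a full 64-bit immediate); that the assembler picked the short form for the second mov is an assumption based on the output above, and `sign_extend` is a hypothetical helper, not a distorm function.

```python
import struct

def sign_extend(value, from_bits, to_bits):
    """Sign-extend an unsigned from_bits-wide value; result is the unsigned to_bits pattern."""
    sign_bit = 1 << (from_bits - 1)
    signed = (value & (sign_bit - 1)) - (value & sign_bit)
    return signed & ((1 << to_bits) - 1)

# mov r/m64, imm32: the 4-byte immediate is sign-extended to 64 bits by the CPU.
short_form = b"\x48\xc7\xc0" + struct.pack("<i", -1)
# mov r64, imm64: the full 8-byte immediate is stored as-is.
long_form = b"\x48\xb8" + struct.pack("<Q", 0x8000000000000000)

# Pull the raw 32-bit immediate out of the short encoding.
imm32 = struct.unpack("<I", short_form[3:])[0]

print(hex(imm32))                       # 0xffffffff
print(hex(sign_extend(imm32, 32, 64)))  # 0xffffffffffffffff
```

The short form stores only 0xffffffff, so a decoder that reports the immediate as a signed 32-bit value would naturally print -0x1, while the two other movs need the 64-bit form and carry their immediates unchanged.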

Original comment by distorm@gmail.com on 24 Jul 2012 at 4:56

Changed state: Invalid

GoogleCodeExporter commented 9 years ago

Cheers

So to clarify, there is no canonical representation of immediate operands in 
distorm? Number representation is context specific, depending on whether the 
operation implies signedness?

For example, 0x8000000000000000 is a large positive number if it is an operand 
to an opcode implying an unsigned context; otherwise it is a large negative 
number (the most negative 64-bit value).

I was expecting to see the operands in a "context insensitive" manner, as just 
raw hex bytes, that's all.
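
For anyone wanting that context-insensitive view on the caller's side, here is a rough sketch in Python. It assumes the immediate is available as an ordinary Python integer (as the printed -0x1 suggests) and that the operand width in bits is known; `raw_hex` and `as_signed` are hypothetical helper names, not part of distorm.

```python
def raw_hex(value, bits=64):
    """Render an immediate as its two's-complement bit pattern, regardless of sign."""
    return "0x%x" % (value & ((1 << bits) - 1))

def as_signed(value, bits=64):
    """Interpret an unsigned bit pattern as a signed two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

print(raw_hex(-0x1))                  # 0xffffffffffffffff
print(raw_hex(0x7fffffffffffffff))    # 0x7fffffffffffffff
print(as_signed(0x8000000000000000))  # -9223372036854775808
```

Masking with the operand width gives the same hex string whether the decoder handed back a signed or an unsigned value, so the three movs in the test case would all print as they were written.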

Original comment by vex...@gmail.com on 24 Jul 2012 at 5:28

ohio813 / distorm

Incorrect instruction operands from DecomposeGenerator #50