zyantific / zydis

Fast and lightweight x86/x86-64 disassembler and code generation library
https://zydis.re
MIT License
3.28k stars 427 forks source link

Implicit memory operands that pushes values to stack should have displacement #510

Open NaC-L opened 3 weeks ago

NaC-L commented 3 weeks ago

if we do push rsp operand info tells us that we push rsp to [rsp] however, it should be [rsp-8(size)].

https://www.felixcloutier.com/x86/push

IA-32 Architecture Compatibility ¶
For IA-32 processors from the Intel 286 on, the PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. (This is also true for Intel 64 architecture, real-address and virtual-8086 modes of IA-32 architecture.) For the Intel® 8086 processor, the PUSH SP instruction pushes the new value of the SP register (that is the value after it has been decremented by 2).
mov rsp, 0x1008
push rsp 

is [0x1000] = 0x1008 and not [0x1000] = 0x1000

ZydisInfo.exe -64 -64 56
== [    BASIC ] ============================================================================================
   MNEMONIC: push [ENC: DEFAULT, MAP: DEFAULT, OPC: 0x56]
     LENGTH:  1
        SSZ: 64
       EOSZ: 64
       EASZ: 64
   CATEGORY: PUSH
    ISA-SET: I86
    ISA-EXT: BASE
 EXCEPTIONS: NONE
  OPTIMIZED: 56

== [ OPERANDS ] ============================================================================================
##       TYPE  VISIBILITY  ACTION      ENCODING   SIZE  NELEM  ELEMSZ  ELEMTYPE                        VALUE
--  ---------  ----------  ------  ------------   ----  -----  ------  --------  ---------------------------
 0   REGISTER    EXPLICIT       R        OPCODE     64      1      64       INT                          rsi
 1   REGISTER      HIDDEN      RW          NONE     64      1      64       INT                          rsp
 2     MEMORY      HIDDEN       W          NONE     64      1      64       INT  TYPE  =                 MEM
                                                                                 SEG   =                  ss
                                                                                 BASE  =                 rsp
                                                                                 INDEX =                none
                                                                                 SCALE =                   0
                                                                                 DISP  =  0x0000000000000000
--  ---------  ----------  ------  ------------   ----  -----  ------  --------  ---------------------------

== [      ATT ] ============================================================================================
   ABSOLUTE: push %rsi
   RELATIVE: push %rsi

== [    INTEL ] ============================================================================================
   ABSOLUTE: push rsi
   RELATIVE: push rsi

== [ SEGMENTS ] ============================================================================================
56
:..OPCODE
flobernd commented 2 weeks ago

Hi @NaC-L,

we do not provide full semantics in most cases (sometimes it's completely impossible without a rich context; e.g. for certain R/E/FLAGS bits). The generic [RSP] operand is mainly there to indicate, that "some kind" of memory access happens. This is useful to detect dirty registers and as well allowed me to indicate different R/W/RW for the action on the register itself and the action on the memory access.

However, I agree that we could relatively easily improve that case as it's a static condition. We just have to check a bunch of other instructions as well to check, if they behave the same (like CALL, PUSH2, ...). If it's too much effort to implement, I would rather not do it at the moment.

NaC-L commented 2 weeks ago

Behavior is the same with CALL, PUSHA/PUSHAD PUSHF/PUSHFD/PUSHFQ as far as I know, no other instruction behaves like this. I'm working on a solution by modifying OperandDefinitions.inc.

ZehMatt commented 2 weeks ago

I think it is correct as it is, the value is written to [rsp] after rsp is modified and not before, also check the pseudo code https://www.felixcloutier.com/x86/push#operation.

NaC-L commented 2 weeks ago

Hey @ZehMatt,

while psuedocode states that rsp is decremented first then value is written to the address, few lines above its mentioned:

The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. If a PUSH instruction uses a memory operand in which the ESP register is used for computing the operand address, the address of the operand is computed before the ESP register is decremented.

and For IA-32 processors from the Intel 286 on, the PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. (This is also true for Intel 64 architecture, real-address and virtual-8086 modes of IA-32 architecture.) For the Intel® 8086 processor, the PUSH SP instruction pushes the new value of the SP register (that is the value after it has been decremented by 2).

as my original post states

(you wouldnt move a value to rsp and push but assume you would for demonstration sake)
mov rsp, 0x1008
push rsp 

would result in [0x1000] = 0x1008 and not [0x1000] = 0x1000, you can confirm this by executing push rsp in a debugger and observing the values, it would push the value of rsp as it existed before executing the instruction.

its the intel manual being unclear again

ZehMatt commented 2 weeks ago

I think you are assuming that SRC is just an alias for RSP in the pseudo code, the operand has to be read first, if we go by read/write order then it will work out, also as @flobernd previously mentioned Zydis doesn't have the full semantics but its good enough as it is if you respect the read/write order, like this example:

  1. SRC = Operand[0] (R)
  2. RSP = RSP - 8 (RW)
  3. [RSP] = SRC (W)