zyantific / zydis

Fast and lightweight x86/x86-64 disassembler and code generation library
https://zydis.re
MIT License

ATT: Missing parenthesis for absolute memory operands #454

Closed: fljmc closed this issue 1 year ago

fljmc commented 1 year ago

The following instructions:

   201168: 48 8b 05 19 10 00 00    movq 0x1019(%rip), %rax
   20116f: 8b 0d 1b 10 00 00       movl 0x101b(%rip), %ecx

are disassembled incorrectly as:

   mov 0x0000000000202188, %rax
   mov 0x0000000000202190, %ecx

For the first mov, for example, this output reads as if the instruction used the address 0x202188 itself as the value, whereas the real instruction loads the value stored at address 0x202188.

flobernd commented 1 year ago

Hi there!

I don't see a problem here. These instructions are RIP-relative, which means that the effective address is:

RIP + displacement

where RIP is the address of the next instruction after your mov instruction (i.e. the address of the mov plus its length).

fljmc commented 1 year ago

Hi!

Yes, so why doesn't "48 8b 05 19 10 00 00" disassemble to "movq 0x1019(%rip), %rax" instead of "mov 0x0000000000202188, %rax"?

flobernd commented 1 year ago

The absolute address is what you are usually interested in during static analysis, and you don't want to have to calculate it yourself every time 🙂

However, this is just our default. You can override this behavior by setting the ZYDIS_FORMATTER_PROP_FORCE_RELATIVE_RIPREL flag in your formatter instance:

https://github.com/zyantific/zydis/blob/460570fec89584d80a6a062fa49782c22cc296a1/include/Zydis/Formatter.h#L144

fljmc commented 1 year ago

Here is a sample dump with more details. It was built from the following C++ code:

   const char* Str = "abcde";
   int Tmp = 0xaabbccdd;
   int main() { return Str[Tmp - 0xaabbccda]; }

   ./main: file format elf64-x86-64

   Disassembly of section .rodata:

   0000000000200158 <.rodata>:
     200158: 61
     200159: 62 63 64 65 00

   Disassembly of section .text:

   0000000000201160 :
     201160: c7 44 24 fc 00 00 00 00    movl $0x0, -0x4(%rsp)
     201168: 48 8b 05 19 10 00 00       movq 0x1019(%rip), %rax    # 0x202188
     20116f: 8b 0d 1b 10 00 00          movl 0x101b(%rip), %ecx    # 0x202190
     201175: 81 e9 da cc bb aa          subl $0xaabbccda, %ecx     # imm = 0xAABBCCDA
     20117b: 89 c9                      movl %ecx, %ecx
     20117d: 0f be 04 08                movsbl (%rax,%rcx), %eax
     201181: c3                         retq

   Disassembly of section .data:

   0000000000202188 :
     202188: 58                         popq %rax
     202189: 01 20                      addl %esp, (%rax)
     20218b: 00 00                      addb %al, (%rax)
     20218d: 00 00                      addb %al, (%rax)
     20218f: 00 dd                      addb %bl, %ch

   0000000000202190 :
     202190: dd cc
     202192: bb
     202193: aa                         stosb %al, %es:(%rdi)

So, as you can see, "201168: 48 8b 05 19 10 00 00 movq 0x1019(%rip), %rax" should load 0x200158 into rax, i.e. the address stored at location 0x202188 (the bytes 58 01 20 00 00 00 00 00, read little-endian). So when the output is just "mov 0x0000000000202188, %rax", I believe it is incorrect. It also doesn't match the llvm-objdump output.

fljmc commented 1 year ago

Oh, I see. I'll take a look. Thanks for the info!

fljmc commented 1 year ago

I'm closing this, since the behavior is intentional and therefore not the issue I thought it was. Thanks again for the help!

flobernd commented 1 year ago

I might have misunderstood you here.

The RIP-relative form should be correct, but you are saying that the absolute form is missing the parentheses that denote a memory dereference, right?

Technically you are correct. Let's reopen this issue and I'll try to remember why Zydis prints the absolute address without () in ATT syntax. It's definitely confusing, I have to admit.

In Intel syntax it seems correct:

== [      ATT ] ============================================================================================
   ABSOLUTE: mov 0x0000000000001020, %rax
   RELATIVE: mov 0x1019(%rip), %rax

== [    INTEL ] ============================================================================================
   ABSOLUTE: mov rax, qword ptr ds:[0x0000000000001020]
   RELATIVE: mov rax, qword ptr ds:[rip+0x1019]

flobernd commented 1 year ago

Hi @fljmc, I checked this again and came to the conclusion that this is not a bug.

Immediate (literal) values in AT&T syntax require the $ prefix, which lets us clearly distinguish an absolute memory address from a numeric literal.

AT&T syntax is also a bit special in that there is no single source of truth: every assembler and disassembler seems to implement it slightly differently. For example, during my investigation I've seen these forms:

cc @athre0z