fljmc closed this issue 1 year ago.
Hi there!
I don't see a problem here. These instructions are RIP-relative, which means that the effective address is:
RIP + displacement
where RIP is the address of the next instruction after your mov instruction (i.e. the mov's own address + instr.length).
Hi!
Yes, so why doesn't "48 8b 05 19 10 00 00" disassemble to "movq 0x1019(%rip), %rax" instead of "mov 0x0000000000202188, %rax"?
The resolved absolute address is what's interesting during static analysis, and you don't want to have to calculate it yourself every time 🙂
However, this is just our default. You can override this behavior by setting the ZYDIS_FORMATTER_PROP_FORCE_RELATIVE_RIPREL property on your formatter instance (via ZydisFormatterSetProperty).
Here is a sample dump with more details. It was built from the following C++ code:
```cpp
const char* Str = "abcde";
int Tmp = 0xaabbccdd;
int main() { return Str[Tmp - 0xaabbccda]; }
```
```
./main: file format elf64-x86-64
Disassembly of section .rodata:
0000000000200158 <.rodata>:
200158: 61
Disassembly of section .text:
0000000000201160
Disassembly of section .data:
0000000000202188
0000000000202190
```
So, as you can see, "201168: 48 8b 05 19 10 00 00 movq 0x1019(%rip), %rax" should load 0x200158 into rax, i.e. the address stored at location 0x202188. So when the output is just "mov 0x0000000000202188, %rax", I believe it is incorrect. It also doesn't match the llvm-objdump output.
Oh, I see. I'll take a look. Thanks for the info!
I'll close this, since the behavior is intentional and therefore not the issue I thought it was. Thanks again for the help!
I might have misunderstood you here.
The RIP-relative form should be correct, but you are saying that the absolute form is missing the dereference parentheses around the address, right?
Technically, you are correct. Let's reopen this issue, and I'll try to remember why Zydis prints the absolute address without () in AT&T syntax. It's definitely confusing, I have to admit.
In Intel syntax it seems correct:
```
== [ ATT ] ============================================================================================
ABSOLUTE: mov 0x0000000000001020, %rax
RELATIVE: mov 0x1019(%rip), %rax
== [ INTEL ] ============================================================================================
ABSOLUTE: mov rax, qword ptr ds:[0x0000000000001020]
RELATIVE: mov rax, qword ptr ds:[rip+0x1019]
```
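Hi @fljmc, I checked this again and concluded that this is not a bug.
Immediate (literal) values in AT&T syntax require the $ prefix, which allows us to clearly distinguish an absolute address from a numeric literal. A bare number such as 0x202188 is already a memory operand: mov 0x202188, %rax loads the quadword stored at 0x202188, whereas mov $0x202188, %rax would load the number 0x202188 itself.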
AT&T syntax is also a bit special in that there is no single ground truth. Every assembler/disassembler seems to implement this syntax slightly differently. For example, during my investigation I've seen these forms:
mov 0x0000000000001020, %rax (what Zydis uses)
mov 0x0000000000001020(,1), %rax
mov (0x0000000000001020), %rax
cc @athre0z
The following hex instructions:
```
201168: 48 8b 05 19 10 00 00    movq 0x1019(%rip), %rax
20116f: 8b 0d 1b 10 00 00       movl 0x101b(%rip), %ecx
```
are disassembled incorrectly as:
```
mov 0x0000000000202188, %rax
mov 0x0000000000202190, %ecx
```
So, taking the first mov as an example, the operation as printed does not use the value at address 0x202188, but the address value itself.