Open HadrienG2 opened 3 years ago
Hello, thanks for the detailed issue!
The default x86 disassembler for Cutter (and Rizin) is capstone, so I believe the issue should be opened on their repository, but one would need to make sure the issue is indeed coming from capstone. There are already several issues mentioning AVX, see https://github.com/aquynh/capstone/issues?q=is%3Aissue+is%3Aopen+avx.
I'm pretty sure supporting other disassemblers like zydis has been done in the past in the form of a radare2 plugin. For now I can't really suggest any other workaround than making your own Rizin plugin to use another x86 disassembler (by the way I think it wouldn't be stupid to replace capstone with zydis for x86 disassembly in Rizin).
About your custom build of Cutter, I think you can safely download the latest AppImage generated by GitHub actions in the future (for example from the bottom of this page: https://github.com/rizinorg/cutter/actions/runs/586452938)
Thanks! capstone does indeed look like a good track to follow. And thanks also for the tip on the CI AppImages.
As for writing a zydis plugin, I don't think I will find the time, especially given that last time I had trouble just getting a Cutter master build with the usual plugins (ghidra, etc.) to work. So my preferred workaround will probably be to prioritize studying the AVX/AVX2 builds, and revert to basic disassembly for AVX-512, crossing fingers that it looks close enough.
Oh, by the way, I guess my use of Cutter for reversing what the compiler optimizer is doing as part of a micro-optimization process may be a little bit unorthodox, so let me share that it's actually a pretty great tool for that purpose!
Environment information
Describe the bug
Given a function from a program that is built to use the AVX-512 vector instruction set where applicable, objdump v2.35.1 produces the following reasonable-looking disassembly:
objdump disassembly
```asm
0000000000403b90 <_ZZ4mainENKUlvE3_clEv._omp_fn.0>:
  403b90:  55                    push   %rbp
  403b91:  48 89 e5              mov    %rsp,%rbp
  403b94:  41 56                 push   %r14
  403b96:  41 55                 push   %r13
  403b98:  41 54                 push   %r12
  403b9a:  53                    push   %rbx
  403b9b:  48 89 fb              mov    %rdi,%rbx
  403b9e:  48 83 e4 c0           and    $0xffffffffffffffc0,%rsp
  403ba2:  e8 d9 d5 ff ff        callq  401180
```
Given the same program, however, Cutter v1.12 will produce output that looks a lot less reasonable, including for example "invalid" instructions and "int1" instructions:
Cutter disassembly
```asm
;-- main::{lambda()#5}::operator()() const [clone ._omp_fn.0]:
333: method.main._lambda___5_::operator_____const__clone_._omp_fn.0 (int64_t arg1);
; var int64_t var_20h @ rbp-0x20
; arg int64_t arg1 @ rdi
0x00403b90  push rbp ; tstcpu.cc:660 ; main::{lambda()#5}::operator()() const [clone ._omp_fn.0]
0x00403b91  mov rbp, rsp
0x00403b94  push r14
0x00403b96  push r13
0x00403b98  push r12
0x00403b9a  push rbx
0x00403b9b  mov rbx, rdi ; arg1
0x00403b9e  and rsp, 0xffffffffffffffc0
0x00403ba2  call omp_get_num_threads ; sym.imp.omp_get_num_threads
0x00403ba7  movsxd r12, eax
0x00403baa  call omp_get_thread_num ; sym.imp.omp_get_thread_num
0x00403baf  movsxd rcx, eax
0x00403bb2  xor edx, edx
0x00403bb4  mov eax, 0x100 ; 256
0x00403bb9  div r12
0x00403bbc  cmp rcx, rdx
0x00403bbf  jb 0x40408f
0x00403bc5  imul rcx, rax
0x00403bc9  add rdx, rcx
0x00403bcc  add rax, rdx
0x00403bcf  cmp rdx, rax
0x00403bd2  jae 0x404082
0x00403bd8  mov rcx, qword [rbx]
0x00403bdb  lea r8, [rdx + rdx*8]
0x00403bdf  mov rsi, qword [rcx + 8] ; tstcpu.cc:681
0x00403be3  mov rcx, qword [rcx + 0x10] ; tstcpu.cc:734
0x00403be7  lea r13, [rdx + rdx]
0x00403beb  mov rbx, qword [rcx]
0x00403bee  shl rdx, 7
0x00403bf2  mov r9, qword [rsi] ; tstcpu.cc:681
0x00403bf5  vmovdqa32 zmm1, zmmword [0x00405380]
0x00403bff  vmovdqa32 zmm0, zmmword [0x004053c0]
0x00403c09  vmovaps zmm3, zmmword [0x00405400]
0x00403c13  shl r8, 3
0x00403c17  add rbx, rdx
0x00403c1a  lea r12, [rax + rax]
0x00403c1e  mov rdx, rbx ; tstcpu.cc:661
0x00403c21  mov rax, rbx
0x00403c24  mov r10, rbx
0x00403c27  xor ecx, ecx
0x00403c29  xor r11d, r11d ; tstcpu.cc:670
0x00403c2c  jmp 0x404013
0x00403c31  nop dword [rax]
0x00403c38  lea rsi, [rdi + r8 + 2] ; tstcpu.cc:739
0x00403c3d  shl rsi, 6 ; tstcpu.cc:681
0x00403c41  add rsi, r9
0x00403c44  vmovaps zmm2, zmmword [rsi] ; tstcpu.cc:685
0x00403c4a  vmovaps zmm4, zmmword [rsi + 0x40]
0x00403c51  vmovaps zmm6, zmm2 ; tstcpu.cc:686
0x00403c57  vpermt2ps zmm6, zmm1, zmm4
0x00403c5d  vmovaps zmmword [rax], zmm6 ; tstcpu.cc:751
0x00403c63  vmovaps zmm6, zmm2 ; tstcpu.cc:692
0x00403c69  invalid ; tstcpu.cc:699
0x00403c6a  int1
0x00403c6b  pop rsp
0x00403c6c  push rdi
0x00403c6e  jmp 0x403cd2
0x00403c70  bnd jge 0x403cbb
0x00403c73  jg 0x403c69
0x00403c75  vmovaps zmm4, zmm2 ; tstcpu.cc:697
0x00403c7b  vpermt2ps zmm4, zmm1, zmm5
0x00403c81  vpermt2ps zmm2, zmm0, zmm5 ; tstcpu.cc:702
0x00403c87  vmovaps zmmword [rax + 0x40], zmm6 ; tstcpu.cc:752
0x00403c8e  vmovaps zmmword [rdx], zmm4 ; tstcpu.cc:753
0x00403c94  vmovaps zmmword [rdx + 0x40], zmm2 ; tstcpu.cc:754
0x00403c9b  cmp rcx, 1 ; tstcpu.cc:756 ; 1
0x00403c9f  je 0x404060
0x00403ca5  lea rsi, [rdi + r8 + 4] ; tstcpu.cc:739
0x00403caa  shl rsi, 6 ; tstcpu.cc:681
0x00403cae  add rsi, r9
0x00403cb1  vmovaps zmm2, zmmword [rsi] ; tstcpu.cc:685
0x00403cb7  vmovaps zmm4, zmmword [rsi + 0x40]
0x00403cbe  vmovaps zmm6, zmm2 ; tstcpu.cc:686
0x00403cc4  vpermt2ps zmm6, zmm1, zmm4
0x00403cca  vmovaps zmmword [rax + 0x8000], zmm6 ; tstcpu.cc:751
0x00403cd4  vmovaps zmm6, zmm2 ; tstcpu.cc:692
0x00403cda  invalid ; tstcpu.cc:699
0x00403cdb  int1
0x00403cdc  pop rsp
0x00403cdd  push rdi
0x00403cdf  jmp 0x403d43
0x00403ce1  bnd jge 0x403d2c
0x00403ce4  jg 0x403cda
0x00403ce6  vmovaps zmm4, zmm2 ; tstcpu.cc:697
0x00403cec  vpermt2ps zmm4, zmm1, zmm5
0x00403cf2  vpermt2ps zmm2, zmm0, zmm5 ; tstcpu.cc:702
0x00403cf8  vmovaps zmmword [rax + 0x8040], zmm6 ; tstcpu.cc:752
0x00403d02  vmovaps zmmword [rdx + 0x40000], zmm4 ; tstcpu.cc:753
0x00403d0c  vmovaps zmmword [rdx + 0x40040], zmm2 ; tstcpu.cc:754
0x00403d16  cmp rcx, 2 ; tstcpu.cc:756 ; 2
0x00403d1a  je 0x404060
0x00403d20  lea rsi, [rdi + r8 + 6] ; tstcpu.cc:739
0x00403d25  shl rsi, 6 ; tstcpu.cc:681
0x00403d29  add rsi, r9
0x00403d2c  vmovaps zmm2, zmmword [rsi] ; tstcpu.cc:685
0x00403d32  vmovaps zmm4, zmmword [rsi + 0x40]
0x00403d39  vmovaps zmm6, zmm2 ; tstcpu.cc:686
0x00403d3f  vpermt2ps zmm6, zmm1, zmm4
0x00403d45  vmovaps zmmword [rax + 0x10000], zmm6 ; tstcpu.cc:751
0x00403d4f  vmovaps zmm6, zmm2 ; tstcpu.cc:692
0x00403d55  invalid ; tstcpu.cc:699
0x00403d56  int1
0x00403d57  pop rsp
0x00403d58  push rdi
0x00403d5a  jmp 0x403dbe
0x00403d5c  bnd jge 0x403da7
0x00403d5f  jg 0x403d55
0x00403d61  vmovaps zmm4, zmm2 ; tstcpu.cc:697
0x00403d67  vpermt2ps zmm4, zmm1, zmm5
0x00403d6d  vpermt2ps zmm2, zmm0, zmm5 ; tstcpu.cc:702
0x00403d73  vmovaps zmmword [rax + 0x10040], zmm6 ; tstcpu.cc:752
0x00403d7d  vmovaps zmmword [rdx + 0x80000], zmm4 ; tstcpu.cc:753
0x00403d87  vmovaps zmmword [rdx + 0x80040], zmm2 ; tstcpu.cc:754
0x00403d91  cmp rcx, 3 ; tstcpu.cc:756 ; 3
0x00403d95  je 0x404060
0x00403d9b  lea rsi, [rdi + r8 + 8] ; tstcpu.cc:739
0x00403da0  shl rsi, 6 ; tstcpu.cc:681
0x00403da4  add rsi, r9
0x00403da7  vmovaps zmm2, zmmword [rsi] ; tstcpu.cc:685
0x00403dad  vmovaps zmm4, zmmword [rsi + 0x40]
0x00403db4  vmovaps zmm6, zmm2 ; tstcpu.cc:686
0x00403dba  vpermt2ps zmm6, zmm1, zmm4
0x00403dc0  vmovaps zmmword [rax + 0x18000], zmm6 ; tstcpu.cc:751
0x00403dca  vmovaps zmm6, zmm2 ; tstcpu.cc:692
0x00403dd0  invalid ; tstcpu.cc:699
0x00403dd1  int1
0x00403dd2  pop rsp
0x00403dd3  push rdi
0x00403dd5  jmp 0x403e39
0x00403dd7  bnd jge 0x403e22
0x00403dda  jg 0x403dd0
0x00403ddc  vmovaps zmm4, zmm2 ; tstcpu.cc:697
0x00403de2  vpermt2ps zmm4, zmm1, zmm5
0x00403de8  vpermt2ps zmm2, zmm0, zmm5 ; tstcpu.cc:702
0x00403dee  vmovaps zmmword [rax + 0x18040], zmm6 ; tstcpu.cc:752
0x00403df8  vmovaps zmmword [rdx + 0xc0000], zmm4 ; tstcpu.cc:753
0x00403e02  vmovaps zmmword [rdx + 0xc0040], zmm2 ; tstcpu.cc:754
0x00403e0c  cmp rcx, 4 ; tstcpu.cc:756 ; 4
0x00403e10  je 0x404060
0x00403e16  lea rsi, [rdi + r8 + 0xa] ; tstcpu.cc:739
0x00403e1b  shl rsi, 6 ; tstcpu.cc:681
0x00403e1f  add rsi, r9
0x00403e22  vmovaps zmm2, zmmword [rsi] ; tstcpu.cc:685
0x00403e28  vmovaps zmm4, zmmword [rsi + 0x40]
0x00403e2f  vmovaps zmm6, zmm2 ; tstcpu.cc:686
0x00403e35  vpermt2ps zmm6, zmm1, zmm4
0x00403e3b  vmovaps zmmword [rax + 0x20000], zmm6 ; tstcpu.cc:751
0x00403e45  vmovaps zmm6, zmm2 ; tstcpu.cc:692
0x00403e4b  invalid ; tstcpu.cc:699
0x00403e4c  int1
0x00403e4d  pop rsp
0x00403e4e  push rdi
0x00403e50  jmp 0x403eb4
0x00403e52  bnd jge 0x403e9d
0x00403e55  jg 0x403e4b
0x00403e57  vmovaps zmm4, zmm2 ; tstcpu.cc:697
0x00403e5d  vpermt2ps zmm4, zmm1, zmm5
0x00403e63  vpermt2ps zmm2, zmm0, zmm5 ; tstcpu.cc:702
0x00403e69  vmovaps zmmword [rax + 0x20040], zmm6 ; tstcpu.cc:752
0x00403e73  vmovaps zmmword [rdx + 0x100000], zmm4 ; tstcpu.cc:753
0x00403e7d  vmovaps zmmword [rdx + 0x100040], zmm2 ; tstcpu.cc:754
0x00403e87  cmp rcx, 5 ; tstcpu.cc:756 ; 5
0x00403e8b  je 0x404060
0x00403e91  lea rsi, [rdi + r8 + 0xc] ; tstcpu.cc:739
0x00403e96  shl rsi, 6 ; tstcpu.cc:681
0x00403e9a  add rsi, r9
0x00403e9d  vmovaps zmm2, zmmword [rsi] ; tstcpu.cc:685
0x00403ea3  vmovaps zmm4, zmmword [rsi + 0x40]
0x00403eaa  vmovaps zmm6, zmm2 ; tstcpu.cc:686
0x00403eb0  vpermt2ps zmm6, zmm1, zmm4
0x00403eb6  vmovaps zmmword [rax + 0x28000], zmm6 ; tstcpu.cc:751
0x00403ec0  vmovaps zmm6, zmm2 ; tstcpu.cc:692
0x00403ec6  invalid ; tstcpu.cc:699
0x00403ec7  int1
0x00403ec8  pop rsp
0x00403ec9  push rdi
0x00403ecb  jmp 0x403f2f
0x00403ecd  bnd jge 0x403f18
0x00403ed0  jg 0x403ec6
0x00403ed2  vmovaps zmm4, zmm2 ; tstcpu.cc:697
0x00403ed8  vpermt2ps zmm4, zmm1, zmm5
0x00403ede  vpermt2ps zmm2, zmm0, zmm5 ; tstcpu.cc:702
0x00403ee4  vmovaps zmmword [rax + 0x28040], zmm6 ; tstcpu.cc:752
0x00403eee  vmovaps zmmword [rdx + 0x140000], zmm4 ; tstcpu.cc:753
0x00403ef8  vmovaps zmmword [rdx + 0x140040], zmm2 ; tstcpu.cc:754
0x00403f02  cmp rcx, 6 ; tstcpu.cc:756 ; 6
0x00403f06  je 0x404060
0x00403f0c  lea rsi, [rdi + r8 + 0xe] ; tstcpu.cc:739
0x00403f11  shl rsi, 6 ; tstcpu.cc:681
0x00403f15  add rsi, r9
0x00403f18  vmovaps zmm2, zmmword [rsi] ; tstcpu.cc:685
0x00403f1e  vmovaps zmm4, zmmword [rsi + 0x40]
0x00403f25  vmovaps zmm6, zmm2 ; tstcpu.cc:686
0x00403f2b  vpermt2ps zmm6, zmm1, zmm4
0x00403f31  vmovaps zmmword [rax + 0x30000], zmm6 ; tstcpu.cc:751
0x00403f3b  vmovaps zmm6, zmm2 ; tstcpu.cc:692
0x00403f41  invalid ; tstcpu.cc:699
0x00403f42  int1
0x00403f43  pop rsp
0x00403f44  push rdi
0x00403f46  jmp 0x403faa
0x00403f48  bnd jge 0x403f93
0x00403f4b  jg 0x403f41
0x00403f4d  vmovaps zmm4, zmm2 ; tstcpu.cc:697
0x00403f53  vpermt2ps zmm4, zmm1, zmm5
0x00403f59  vpermt2ps zmm2, zmm0, zmm5 ; tstcpu.cc:702
0x00403f5f  vmovaps zmmword [rax + 0x30040], zmm6 ; tstcpu.cc:752
0x00403f69  vmovaps zmmword [rdx + 0x180000], zmm4 ; tstcpu.cc:753
0x00403f73  vmovaps zmmword [rdx + 0x180040], zmm2 ; tstcpu.cc:754
0x00403f7d  cmp rcx, 7 ; tstcpu.cc:756 ; 7
0x00403f81  je 0x404060
0x00403f87  lea rsi, [rdi + r8 + 0x10] ; tstcpu.cc:739
0x00403f8c  shl rsi, 6 ; tstcpu.cc:681
0x00403f90  add rsi, r9
0x00403f93  vmovaps zmm2, zmmword [rsi] ; tstcpu.cc:685
0x00403f99  vmovaps zmm4, zmmword [rsi + 0x40]
0x00403fa0  vmovaps zmm6, zmm2 ; tstcpu.cc:686
0x00403fa6  vpermt2ps zmm6, zmm1, zmm4
0x00403fac  vmovaps zmmword [rax + 0x38000], zmm6 ; tstcpu.cc:751
0x00403fb6  vmovaps zmm6, zmm2 ; tstcpu.cc:692
0x00403fbc  invalid ; tstcpu.cc:699
0x00403fbd  int1
0x00403fbe  pop rsp
0x00403fbf  push rdi
0x00403fc1  jmp 0x404025 ; method.main._lambda___5_::operator_____const__clone_._omp_fn.0+0x495
0x00403fc3  bnd jge 0x40400e ; method.main._lambda___5_::operator_____const__clone_._omp_fn.0+0x47e
0x00403fc6  jg 0x403fbc
0x00403fc8  vmovaps zmm4, zmm2 ; tstcpu.cc:697
0x00403fce  vpermt2ps zmm4, zmm1, zmm5
0x00403fd4  vpermt2ps zmm2, zmm0, zmm5 ; tstcpu.cc:702
0x00403fda  vmovaps zmmword [rax + 0x38040], zmm6 ; tstcpu.cc:752
0x00403fe4  add r11, 9 ; tstcpu.cc:753
0x00403fe8  vmovaps zmmword [rdx + 0x1c0000], zmm4
0x00403ff2  vmovaps zmmword [rdx + 0x1c0040], zmm2 ; tstcpu.cc:754
0x00403ffc  inc rcx ; tstcpu.cc:756
0x00403fff  add r10, 0x48000 ; tstcpu.cc:716
0x00404006  add rax, 0x40000
0x0040400c  add rdx, 0x8000
0x00404013  lea rdi, [r11 + r11] ; tstcpu.cc:717
0x00404017  lea rsi, [rdi + r8] ; tstcpu.cc:680
0x0040401b  shl rsi, 6 ; tstcpu.cc:681
0x0040401f  add rsi, r9
0x00404022  vmovaps zmm2, zmmword [rsi] ; tstcpu.cc:685
0x00404028  vmovaps zmm4, zmmword [rsi + 0x40]
0x0040402f  vmovaps zmm5, zmm2 ; tstcpu.cc:686
0x00404035  vpermt2ps zmm5, zmm1, zmm4
0x0040403b  vpermt2ps zmm2, zmm0, zmm4 ; tstcpu.cc:692
0x00404041  lea r14, [r11 + 1] ; tstcpu.cc:709
0x00404045  vmovaps zmmword [r10], zmm5 ; tstcpu.cc:712
0x0040404b  vmovaps zmmword [r10 + 0x40], zmm2 ; tstcpu.cc:735
0x00404052  test rcx, rcx ; tstcpu.cc:738
0x00404055  jne 0x403c38
0x0040405b  mov r11, r14
0x0040405e  jmp 0x403ffc
0x00404060  lea r11, [r14 + rcx] ; tstcpu.cc:709
0x00404064  cmp rcx, 7 ; 7
0x00404068  jne 0x403ffc ; method.main._lambda___5_::operator_____const__clone_._omp_fn.0+0x46c
0x0040406a  add r13, 2 ; tstcpu.cc:760
0x0040406e  add r8, 0x48 ; 72
0x00404072  sub rbx, 0xffffffffffffff80
0x00404076  cmp r13, r12
0x00404079  jne 0x403c1e ; method.main._lambda___5_::operator_____const__clone_._omp_fn.0+0x8e
0x0040407f  vzeroupper
0x00404082  lea rsp, [var_20h]
0x00404086  pop rbx
0x00404087  pop r12
0x00404089  pop r13
0x0040408b  pop r14
0x0040408d  pop rbp
0x0040408e  ret
0x0040408f  inc rax ; tstcpu.cc:660
0x00404092  xor edx, edx
0x00404094  jmp 0x403bc5
0x00404099  nop dword [rax]
```
While the symptoms are most easily visible in the Disassembly view, they obviously affect other disassembly-based Cutter functionality including the control flow graph, which is truncated at the first "invalid" instruction.
If the program is compiled with only AVX/AVX2 instructions instead, Cutter's disassembly looks perfect, which suggests that somehow the use of AVX-512 is important. I suspect this might be related to the fact that this instruction set is rather recent and the associated radare2/Cutter code may not have seen very wide use in the field yet.
Given the nature of the symptoms, I strongly suspect that this might be a backend issue in radare2 (the latest Cutter release, Cutter 1.12, came out before the rizin fork), but I will need some tips to isolate it further there.
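One way to narrow this down without a custom build is to decode the first misread instruction by hand. The sketch below is only an illustration: the six bytes are not read from the binary but reconstructed from Cutter's misdecoded listing at 0x00403c69 (`invalid` taken as `0x62`, `int1` as `0xf1`, `pop rsp` as `0x5c`, a REX-prefixed `push rdi` assumed to be `0x48 0x57`, and the `jmp` opcode byte `0xeb`), and `decode_evex` is a hypothetical helper, not part of Cutter, Rizin, or capstone. It only handles the register-to-register form.

```python
# Assumption: bytes reconstructed from the misdecoded listing, not from a.out.
CODE = bytes([0x62, 0xF1, 0x5C, 0x48, 0x57, 0xEB])

OPCODE_MAPS = {1: "0F", 2: "0F38", 3: "0F3A"}
SIMD_PREFIXES = {0: "none", 1: "66", 2: "F3", 3: "F2"}
VECTOR_BITS = {0: 128, 1: 256, 2: 512}


def inverted_bit(byte: int, bit: int) -> int:
    """EVEX stores its register-extension bits inverted; undo that."""
    return ((byte >> bit) & 1) ^ 1


def decode_evex(code: bytes) -> dict:
    """Split a register-to-register EVEX instruction into its raw fields."""
    if len(code) < 6 or code[0] != 0x62:
        raise ValueError("not an EVEX-prefixed instruction")
    p0, p1, p2, opcode, modrm = code[1:6]
    if (p1 & 0x04) == 0:
        raise ValueError("fixed bit of the second payload byte is clear")
    if (modrm >> 6) != 3:
        raise ValueError("this sketch only handles register operands")
    return {
        "map": OPCODE_MAPS.get(p0 & 0x03, "reserved"),
        "pp": SIMD_PREFIXES[p1 & 0x03],
        "w": p1 >> 7,
        "vector_bits": VECTOR_BITS.get((p2 >> 5) & 0x03, "reserved"),
        "mask": p2 & 0x07,
        "opcode": opcode,
        # ModRM.reg extended by R (P0 bit 7) and R' (P0 bit 4).
        "dest": ((modrm >> 3) & 7)
        | (inverted_bit(p0, 7) << 3)
        | (inverted_bit(p0, 4) << 4),
        # vvvv is stored inverted; V' (P2 bit 3) supplies the fifth bit.
        "src1": ((~(p1 >> 3)) & 0x0F) | (inverted_bit(p2, 3) << 4),
        # ModRM.rm extended by B (P0 bit 5) and X (P0 bit 6).
        "src2": (modrm & 7)
        | (inverted_bit(p0, 5) << 3)
        | (inverted_bit(p0, 6) << 4),
    }


if __name__ == "__main__":
    for name, value in decode_evex(CODE).items():
        print(f"{name:12} {value}")
```

If the byte reconstruction is right, this is a 512-bit instruction in opcode map 0F with opcode 0x57 and no SIMD prefix, operating on registers 5, 4 and 3, which would correspond to `vxorps zmm5, zmm4, zmm3`. The EVEX-encoded form of that instruction belongs to the AVX-512DQ extension, so the problem may be limited to that subset rather than to AVX-512 as a whole.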
To Reproduce
Here is the offending executable, zipped so that github's file filter will accept it: a.out.zip
The symbol you're interested in is called `method.main._lambda___5_::operator_____const__clone_._omp_fn.0` in Cutter's demangled terminology; according to the objdump output above, the mangled symbol is `_ZZ4mainENKUlvE3_clEv._omp_fn.0`.
Expected behavior
Cutter's assembly output should not feature "invalid" instructions and its control graph output should not be truncated. Instead, the disassembly should roughly match the one produced by objdump (though obviously in Intel syntax).
Additional context
I encountered various issues trying to get a custom build of Cutter to work on my machine, so for now I'm resorting to official AppImage binaries only. This means that it's hard for me to check if the bug is still present on master.