pacak / cargo-show-asm

cargo subcommand showing the assembly, LLVM-IR and MIR generated for Rust code
Apache License 2.0

Some constants not included in output #315

Closed: burgerindividual closed this issue 1 month ago

burgerindividual commented 1 month ago

When writing SIMD code, I've noticed that some constants don't get shown, even with --include-constants.

Example:

#[no_mangle]
pub extern "C" fn test() {
    let simd_reg = unsafe {
        std::arch::x86_64::_mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
    };
    std::hint::black_box(simd_reg);
}

Command: cargo asm --include-constants test

Output:

.section .text.test,"ax",@progbits
        .globl  test
        .p2align        4, 0x90
        .type   test,@function
test:
        .cfi_startproc
        vmovaps xmm0, xmmword ptr [rip + .LCPI21_0]
        vmovaps xmmword ptr [rsp - 24], xmm0
        lea rax, [rsp - 24]
        #APP
        #NO_APP
        ret

In this example, .LCPI21_0 should be shown.

pacak commented 1 month ago

Yeah, looks like they get parsed as a "generic directive" and are simplified away by the pretty printer. Should be fixable.
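To illustrate the kind of fix described above, here is a minimal sketch of a line classifier that separates data directives from other directives, so that a pretty printer can drop the latter while keeping constant data. This is not cargo-show-asm's actual code: `LineKind`, `DATA_DIRECTIVES`, `classify` and `keep_in_output` are hypothetical names, and the directive list is illustrative rather than complete.

```rust
/// Coarse classification of one line of assembly text (hypothetical type).
#[derive(Debug, PartialEq)]
enum LineKind {
    Label,
    DataDirective,
    OtherDirective,
    Instruction,
}

/// Directive names treated as data. Illustrative only, not exhaustive.
const DATA_DIRECTIVES: &[&str] = &[
    "byte", "short", "long", "quad", "ascii", "asciz", "zero",
];

fn classify(line: &str) -> LineKind {
    let text = line.trim();
    // Labels such as `.LCPI21_0:` also start with a dot, so check them first.
    if text.ends_with(':') {
        return LineKind::Label;
    }
    match text.strip_prefix('.') {
        Some(rest) => {
            // The directive name is the first whitespace-separated token.
            let name = rest.split_whitespace().next().unwrap_or("");
            if DATA_DIRECTIVES.contains(&name) {
                LineKind::DataDirective
            } else {
                LineKind::OtherDirective
            }
        }
        // Anything else is treated as an instruction in this sketch.
        None => LineKind::Instruction,
    }
}

/// Keep everything except non-data directives such as `.p2align`.
fn keep_in_output(line: &str) -> bool {
    classify(line) != LineKind::OtherDirective
}
```

With this, `keep_in_output("    .byte   128")` is true while `keep_in_output("    .p2align    4, 0x0")` is false, so the bytes of a constant survive the simplification step.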

burgerindividual commented 1 month ago

I'm not sure that this is entirely fixed. I have a function that has 7 constants used, but only seems to show 3 of them. .LCPI21_1, .LCPI21_4, .LCPI21_5, and .LCPI21_6 are missing. I'll try to come up with a way to reproduce this.

.section .text.test_pack,"ax",@progbits
    .globl  test_pack
    .p2align    4, 0x90
.type   test_pack,@function
test_pack:
    .cfi_startproc
    vmovd xmm2, edi
    vpshufb xmm3, xmm2, xmmword ptr [rip + .LCPI21_0]
    vpand xmm0, xmm0, xmmword ptr [rip + .LCPI21_1]
    vmovdqa xmm4, xmmword ptr [rip + .LCPI21_2]
    vinserti128 ymm0, ymm4, xmm0, 1
    vinserti128 ymm2, ymm2, xmm3, 1
    vpshufb ymm0, ymm2, ymm0
    vmovdqa xmm2, xmmword ptr [rip + .LCPI21_3]
    vinserti128 ymm1, ymm2, xmm1, 1
    vpsllw ymm2, ymm0, 4
    vpand ymm2, ymm2, ymmword ptr [rip + .LCPI21_4]
    vpsllw ymm1, ymm1, 5
    vpblendvb ymm0, ymm0, ymm2, ymm1
    vpsllw ymm2, ymm0, 2
    vpand ymm2, ymm2, ymmword ptr [rip + .LCPI21_5]
    vpand ymm1, ymm1, ymmword ptr [rip + .LCPI21_6]
    vpaddb ymm1, ymm1, ymm1
    vpblendvb ymm0, ymm0, ymm2, ymm1
    vpaddb ymm2, ymm0, ymm0
    vpaddb ymm1, ymm1, ymm1
    vpblendvb ymm0, ymm0, ymm2, ymm1
    vpmovmskb eax, ymm0
    vzeroupper
    ret

======================= Additional context =========================

.LCPI21_0:
    .byte   0
    .byte   1
    .byte   2
    .byte   128
    .byte   0
    .byte   1
    .byte   2
    .byte   128
    .byte   0
    .byte   1
    .byte   2
    .byte   128
    .byte   0
    .byte   1
    .byte   2

.LCPI21_2:
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0

.LCPI21_3:
    .byte   128
    .byte   128
    .byte   128
    .byte   64
    .byte   64
    .byte   64
    .byte   32
    .byte   32
    .byte   32
    .byte   16
    .byte   16
    .byte   16
    .byte   8
    .byte   8
    .byte   8
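
A quick way to spot this kind of gap is to compare the `.LCPI` names referenced by the instructions with the labels that actually carry data in the "Additional context" part. The following is a rough sketch under that assumption; `referenced_constants`, `labels_with_data` and `missing_constants` are hypothetical helpers, not part of cargo-show-asm.

```rust
use std::collections::BTreeSet;

/// Collect every name starting with `.LCPI` that appears in `body`.
fn referenced_constants(body: &str) -> BTreeSet<String> {
    let mut names = BTreeSet::new();
    for (start, _) in body.match_indices(".LCPI") {
        let tail = &body[start..];
        // The name ends at the first character that is not alphanumeric or '_'.
        let end = tail
            .char_indices()
            .skip(1) // skip the leading '.'
            .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
            .map_or(tail.len(), |(i, _)| i);
        names.insert(tail[..end].to_string());
    }
    names
}

/// Labels whose next non-empty line is a directive, i.e. labels with data.
fn labels_with_data(context: &str) -> BTreeSet<String> {
    let mut found = BTreeSet::new();
    let mut pending: Option<&str> = None;
    for line in context.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if let Some(label) = line.strip_suffix(':') {
            // A new label replaces a previous one that had no data.
            pending = Some(label);
        } else if let Some(label) = pending.take() {
            if line.starts_with('.') {
                found.insert(label.to_string());
            }
        }
    }
    found
}

/// Constants referenced in `body` that have no data in `context`, sorted.
fn missing_constants(body: &str, context: &str) -> Vec<String> {
    let shown = labels_with_data(context);
    referenced_constants(body)
        .into_iter()
        .filter(|name| !shown.contains(name))
        .collect()
}
```

Given the instruction lines as `body` and the listing after "Additional context" as `context`, `missing_constants` returns the names that are used but never printed with data, which covers both an absent label and a label with an empty body.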

burgerindividual commented 1 month ago

For reference, this is what rustc generates with --emit asm (everything that got filtered out is a .zero directive):

    .section    .rodata.cst16,"aM",@progbits,16
    .p2align    4, 0x0
.LCPI21_0:
    .byte   0
    .byte   1
    .byte   2
    .byte   128
    .byte   0
    .byte   1
    .byte   2
    .byte   128
    .byte   0
    .byte   1
    .byte   2
    .byte   128
    .byte   0
    .byte   1
    .byte   2
    .byte   128
.LCPI21_1:
    .zero   16,3
.LCPI21_2:
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
.LCPI21_3:
    .byte   128
    .byte   128
    .byte   128
    .byte   64
    .byte   64
    .byte   64
    .byte   32
    .byte   32
    .byte   32
    .byte   16
    .byte   16
    .byte   16
    .byte   8
    .byte   8
    .byte   8
    .byte   4
    .section    .rodata.cst32,"aM",@progbits,32
    .p2align    5, 0x0
.LCPI21_4:
    .zero   32,240
.LCPI21_5:
    .zero   32,252
.LCPI21_6:
    .zero   32,224
    .section    .text.test_pack,"ax",@progbits
    .globl  test_pack
    .p2align    4, 0x90
    .type   test_pack,@function
test_pack:
    vmovd   xmm2, edi
    vpshufb xmm3, xmm2, xmmword ptr [rip + .LCPI21_0]
    vpand   xmm0, xmm0, xmmword ptr [rip + .LCPI21_1]
    vmovdqa xmm4, xmmword ptr [rip + .LCPI21_2]
    vinserti128 ymm0, ymm4, xmm0, 1
    vinserti128 ymm2, ymm2, xmm3, 1
    vpshufb ymm0, ymm2, ymm0
    vmovdqa xmm2, xmmword ptr [rip + .LCPI21_3]
    vinserti128 ymm1, ymm2, xmm1, 1
    vpsllw  ymm2, ymm0, 4
    vpand   ymm2, ymm2, ymmword ptr [rip + .LCPI21_4]
    vpsllw  ymm1, ymm1, 5
    vpblendvb   ymm0, ymm0, ymm2, ymm1
    vpsllw  ymm2, ymm0, 2
    vpand   ymm2, ymm2, ymmword ptr [rip + .LCPI21_5]
    vpand   ymm1, ymm1, ymmword ptr [rip + .LCPI21_6]
    vpaddb  ymm1, ymm1, ymm1
    vpblendvb   ymm0, ymm0, ymm2, ymm1
    vpaddb  ymm2, ymm0, ymm0
    vpaddb  ymm1, ymm1, ymm1
    vpblendvb   ymm0, ymm0, ymm2, ymm1
    vpmovmskb   eax, ymm0
    vzeroupper
    ret
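
For readers unfamiliar with the two-operand form seen above: `.zero 16,3` emits 16 bytes that each hold the value 3, and with a single operand the fill value is 0. A small sketch of that expansion follows; `expand_zero` is a hypothetical helper that only handles decimal operands, not something taken from cargo-show-asm.

```rust
/// Expand the operands of a `.zero` directive into raw bytes.
/// `operands` is the text after the directive name, e.g. "16,3".
/// Returns `None` if the operands cannot be parsed (decimal only).
fn expand_zero(operands: &str) -> Option<Vec<u8>> {
    let mut parts = operands.split(',').map(str::trim);
    // First operand: number of bytes to emit.
    let count: usize = parts.next()?.parse().ok()?;
    // Optional second operand: the fill value, defaulting to 0.
    let fill: u8 = match parts.next() {
        Some(value) => value.parse().ok()?,
        None => 0,
    };
    // Anything beyond two operands is rejected in this sketch.
    if parts.next().is_some() {
        return None;
    }
    Some(vec![fill; count])
}
```

For example, `expand_zero("32,240")` yields 32 bytes of 240 (0xF0), which matches the mask loaded from `.LCPI21_4` above.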

pacak commented 1 month ago

> I'm not sure that this is entirely fixed. I have a function that has 7 constants used, but only seems to show 3 of them.

Is this with the latest git version? The fix isn't published on crates.io yet; I'm still looking at fixing some Windows/macOS regressions.

burgerindividual commented 1 month ago

This is using commit 34f22d8, which currently seems to be the latest.

pacak commented 1 month ago

I see. I appreciate the second report, will try to fix that a bit better :)

burgerindividual commented 1 month ago

If it helps, this seems to be the regex Compiler Explorer uses to detect data directives.

pacak commented 1 month ago

Yup, .zero is missing. I wonder if license allows me to steal the whole regexp...

burgerindividual commented 1 month ago

> I wonder if license allows me to steal the whole regexp...

It's BSD-2 so I think you need to include the license and copyright. Not a lawyer, though, so not totally sure.

burgerindividual commented 1 month ago

I just tested the latest commit, and it seems to still have a small issue. ~The directive seems to get recognized, but the actual .zero statement doesn't seem to be included in the output.~ Actually, all of the constants seem to have one line cut off at the end of each one. Not sure if you want me to open a new issue for it, just lmk.

.section .text.test_pack,"ax",@progbits
    .globl  test_pack
    .p2align    4, 0x90
.type   test_pack,@function
test_pack:
    .cfi_startproc
    vmovd xmm2, edi
    vpshufb xmm3, xmm2, xmmword ptr [rip + .LCPI21_0]
    vpand xmm0, xmm0, xmmword ptr [rip + .LCPI21_1]
    vmovdqa xmm4, xmmword ptr [rip + .LCPI21_2]
    vinserti128 ymm0, ymm4, xmm0, 1
    vinserti128 ymm2, ymm2, xmm3, 1
    vpshufb ymm0, ymm2, ymm0
    vmovdqa xmm2, xmmword ptr [rip + .LCPI21_3]
    vinserti128 ymm1, ymm2, xmm1, 1
    vpsllw ymm2, ymm0, 4
    vpand ymm2, ymm2, ymmword ptr [rip + .LCPI21_4]
    vpsllw ymm1, ymm1, 5
    vpblendvb ymm0, ymm0, ymm2, ymm1
    vpsllw ymm2, ymm0, 2
    vpand ymm2, ymm2, ymmword ptr [rip + .LCPI21_5]
    vpand ymm1, ymm1, ymmword ptr [rip + .LCPI21_6]
    vpaddb ymm1, ymm1, ymm1
    vpblendvb ymm0, ymm0, ymm2, ymm1
    vpaddb ymm2, ymm0, ymm0
    vpaddb ymm1, ymm1, ymm1
    vpblendvb ymm0, ymm0, ymm2, ymm1
    vpmovmskb eax, ymm0
    vzeroupper
    ret

======================= Additional context =========================

.LCPI21_0:
    .byte   0
    .byte   1
    .byte   2
    .byte   128
    .byte   0
    .byte   1
    .byte   2
    .byte   128
    .byte   0
    .byte   1
    .byte   2
    .byte   128
    .byte   0
    .byte   1
    .byte   2

.LCPI21_1:

.LCPI21_2:
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0
    .byte   2
    .byte   1
    .byte   0

.LCPI21_3:
    .byte   128
    .byte   128
    .byte   128
    .byte   64
    .byte   64
    .byte   64
    .byte   32
    .byte   32
    .byte   32
    .byte   16
    .byte   16
    .byte   16
    .byte   8
    .byte   8
    .byte   8

.LCPI21_4:

.LCPI21_5:

.LCPI21_6:

pacak commented 1 month ago

> I just tested the latest commit, and it seems to still have a small issue

Hmm... Off by one somewhere it seems. Checking.
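
The symptom (every constant losing its last line) is what a mixed-up inclusive/exclusive bound would produce when slicing the lines that belong to a label. The sketch below shows one way to collect a constant's body with a half-open range; it is a guess at the general shape of the problem, not the actual cargo-show-asm code, and `constant_body` is a hypothetical helper.

```rust
/// Return the lines that belong to the constant `label` in `lines`:
/// everything after `label:` up to the next label or `.section` line.
fn constant_body<'a>(lines: &[&'a str], label: &str) -> Vec<&'a str> {
    let header = format!("{}:", label);
    let start = match lines.iter().position(|l| l.trim() == header.as_str()) {
        Some(index) => index,
        None => return Vec::new(),
    };
    let body_start = start + 1;
    // Index of the first line that is NOT part of the body.
    let body_end = lines[body_start..]
        .iter()
        .position(|l| {
            let text = l.trim();
            text.ends_with(':') || text.starts_with(".section")
        })
        .map_or(lines.len(), |offset| body_start + offset);
    // `body_end` is exclusive, so the last data line is kept.
    // Slicing `body_start..body_end - 1` here would drop one line per constant.
    lines[body_start..body_end].to_vec()
}
```

With a single-line constant such as `.zero   16,3`, an off-by-one in the end bound leaves the label with an empty body, which is exactly how `.LCPI21_1` and `.LCPI21_4` through `.LCPI21_6` appear in the output above.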