minio / c2goasm

C to Go Assembly
Apache License 2.0

How to prevent generating a huge constant table? #8

Closed lrita closed 5 years ago

lrita commented 6 years ago

I have this C code:


#include <stdint.h>
#if defined ENABLE_AVX2
#define NAME(x) x##_avx2
#elif defined ENABLE_AVX
#define NAME(x) x##_avx
#elif defined ENABLE_SSE4_2
#define NAME(x) x##_sse4_2
#endif

int64_t NAME(sample_sum)(int64_t *beg, int64_t len) {
    int64_t sum = 0;
    int64_t *end = beg + len;
    while (beg < end) {
        sum += *beg++;
    }
    return sum;
}

int64_t NAME(sample_max)(int64_t *beg, int64_t len) {
    int64_t max = 0x8000000000000000;
    int64_t *end = beg + len;
    if (len == 0) {
        return 0;
    }
    while (beg < end) {
        if (*beg > max) {
            max = *beg;
        }
        beg++;
    }
    return max;
}

I compile it to assembly with:

clang -S -DENABLE_AVX2 -target x86_64-unknown-none -masm=intel -mno-red-zone -mstackrealign -mllvm -inline-threshold=1000 -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti -O3 -fno-builtin -ffast-math -mavx2 lib/sample.c -o lib/sample_avx2.s

I found that clang/LLVM compiles the local variable int64_t max = 0x8000000000000000; into a global constant:

.LBB1_5:
    vpbroadcastq    ymm0, qword ptr [rip + .LCPI1_0]
    vmovdqa ymm3, ymm0
    vmovdqa ymm2, ymm0
    vmovdqa ymm1, ymm0

....

.LCPI1_0:
    .quad   -9223372036854775808    # 0x8000000000000000
    .section    .rodata,"a",@progbits
    .align  32
.LCPI1_1:
    .long   0                       # 0x0
    .long   2                       # 0x2
    .long   4                       # 0x4
    .long   6                       # 0x6
    .zero   4
    .zero   4
    .zero   4
    .zero   4
    .text
    .globl  sample_max_avx2

....

    .ident  "Apple LLVM version 8.0.0 (clang-800.0.42.1)"
    .section    ".note.GNU-stack","",@progbits

Thus, when I use c2goasm to generate the Go assembly, getFirstLabelConstants finds

    .quad   -9223372036854775808    # 0x8000000000000000
    .section    .rodata,"a",@progbits
    .align  32

and defineTable then generates a huge constant table.

Thanks.

lrita commented 6 years ago

I manually modified the assembly to this:

.LCPI1_0:
    .quad   -9223372036854775808    # 0x8000000000000000
    .quad   -9223372036854775808    # 0x8000000000000000
    .quad   -9223372036854775808    # 0x8000000000000000
    .quad   -9223372036854775808    # 0x8000000000000000
    .section    .rodata,"a",@progbits
.LCPI1_1:
    .long   0                       # 0x0
    .long   2                       # 0x2
    .long   4                       # 0x4
    .long   6                       # 0x6
    .zero   4
    .zero   4
    .zero   4
    .zero   4
    .text
    .globl  sample_max_avx2

and it works. Is this a correct and safe approach?

fwessels commented 6 years ago

Yes, that looks correct and should be ok.
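
One way to see why the padded table behaves the same: the load in the listing is vpbroadcastq ymm0, qword ptr [rip + .LCPI1_0], which reads a single 8-byte value at the label and copies it into all four 64-bit lanes. The extra quads only pad the table and are never read by that instruction. Below is a minimal C model of this, not the generated code itself. The names broadcast_q, table_single, table_padded and broadcast_tables_agree are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "vpbroadcastq ymm, qword ptr [mem]":
 * read one 64-bit value and copy it into all four 64-bit lanes. */
static void broadcast_q(int64_t lanes[4], const void *mem) {
    int64_t v;
    memcpy(&v, mem, sizeof v); /* only the first 8 bytes at mem are read */
    for (int i = 0; i < 4; i++) {
        lanes[i] = v;
    }
}

/* Table as emitted by clang: a single quad at .LCPI1_0. */
static const int64_t table_single[1] = { INT64_MIN };

/* Hand-edited table: the same quad repeated to fill 32 bytes. */
static const int64_t table_padded[4] = {
    INT64_MIN, INT64_MIN, INT64_MIN, INT64_MIN
};

/* Returns 1 when both tables produce the same four lanes. */
int broadcast_tables_agree(void) {
    int64_t a[4], b[4];
    broadcast_q(a, table_single);
    broadcast_q(b, table_padded);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling broadcast_tables_agree() returns 1 in this model. This only covers the load at .LCPI1_0; any other instruction that reads the table would need to be checked separately.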

lrita commented 6 years ago

@fwessels thanks for your reply. Does clang have a flag to enable or disable moving local variables into constants? I could not find a relevant flag, and I do not know clang well.

kannappanr commented 6 years ago

@lrita Sorry for the late response. It looks like your question was answered at https://stackoverflow.com/questions/50126786/how-to-prevent-clang-llvm-compile-local-variables-to-global . Did you get a chance to test the suggestion? Also, clang has many flags, and the output can differ depending on how they are combined, so please try the options listed at https://clang.llvm.org/docs/ClangCommandLineReference.html

lrita commented 6 years ago

Hi @kannappanr, with -Os instead of -O3, clang generates assembly without AVX2 instructions. That is not what we want; we want assembly that is optimized for performance.

-O3 enables many optimizations, and finding out which one moves the local variable into a constant would require testing.

I have not found a quick and easy way to run that test. Must I generate the assembly the hard way (llvm->opt->llc)?

https://laure.gonnord.org/pro/teaching/CAP1718_ENSL/llvm_fernando.pdf

fwessels commented 6 years ago

Hello @lrita, I am afraid you will need to do some testing on your own. Clang has a ton of options, and we only know the (bare) basics of it.

Going the opt/llc route is certainly another option, but these tools have their own sets of flags, so you will need to study them.
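
Apart from compiler flags, a source-level change may also be worth testing: start the running maximum from the first element rather than from the 0x8000000000000000 literal, so the source gives the compiler no such constant to materialize. The sketch below is a hypothetical variant of sample_max, here called sample_max_seeded. Whether clang then emits a smaller constant table, or none, has not been checked and would need to be confirmed in the generated assembly.

```c
#include <stdint.h>

/* Hypothetical variant of sample_max: the running maximum is seeded from
 * the first element, so the source contains no 0x8000000000000000 literal.
 * Assumption: len is never negative. A negative len is treated as empty
 * here, which differs from the original function. */
int64_t sample_max_seeded(const int64_t *beg, int64_t len) {
    if (len <= 0) {
        return 0; /* same result as the original for len == 0 */
    }
    const int64_t *end = beg + len;
    int64_t max = *beg++; /* seed from the first element */
    while (beg < end) {
        if (*beg > max) {
            max = *beg;
        }
        beg++;
    }
    return max;
}
```

For non-empty input this returns the same value as the original loop, because the first element is always greater than or equal to the smallest int64_t value.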