Clang ignores the `min_vector_width` attribute

richardebeling commented 1 year ago

Consider this code snippet (godbolt):

#include <cstddef>  // size_t
#include <cstdint>  // uint64_t
#include <memory>   // std::assume_aligned (C++20)

void test(uint64_t* input_, uint64_t* output_) __attribute__((min_vector_width(512))) {
    uint64_t* __restrict input = std::assume_aligned<64>(input_);
    uint64_t* __restrict output = std::assume_aligned<64>(output_);

    for (size_t i = 0; i < 64; ++i) {
        output[i] = input[i] * 16;
    }
}

When compiling on x86 with -std=c++20 -O3 -march=icelake-server, the resulting assembly uses the 256-bit AVX2 registers (ymm) rather than the 512-bit registers (zmm).

I understand that clang defaults to -mprefer-vector-width=256, but according to the documentation, the min_vector_width attribute should overrule this default.

With -mprefer-vector-width=512, clang uses the zmm registers.

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-codegen

phoebewang commented 1 year ago

> I understand that clang defaults to -mprefer-vector-width=256,

Clang doesn't default to -mprefer-vector-width=256, but almost all 512-bit targets do. That preference is enabled when you specify -march=icelake-server.

> but according to the documentation, the min_vector_width attribute should overrule this default.

This is not true. The doc says min_vector_width is a hint to inform the backend. "Hint" means it is not a mandatory rule that takes precedence over prefer-vector-width. It does appear to take effect when there are already 512-bit instructions in the IR and min_vector_width > prefer-vector-width, but not in the opposite case. "Inform the backend" means min_vector_width only affects the backend. In this case, LoopVectorize in the middle end was affected by prefer-vector-width and generated 256-bit instructions in the IR, so the backend no longer has a chance to generate 512-bit instructions. https://godbolt.org/z/hdf8adKfK
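
To illustrate the case where 512-bit operations are already present in the IR, here is a minimal sketch that writes them explicitly with the GCC/Clang vector extension instead of relying on LoopVectorize. The names test_explicit, v8u64 and MIN_VECTOR_WIDTH_512 are hypothetical and not from the original report. Whether Clang actually emits zmm instructions for it with -march=icelake-server is an assumption based on the explanation above, not something verified here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// GCC/Clang vector extension: eight 64-bit lanes, 512 bits in total.
typedef std::uint64_t v8u64 __attribute__((vector_size(64)));

// min_vector_width is Clang-specific; expand to nothing on other compilers.
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(min_vector_width)
#    define MIN_VECTOR_WIDTH_512 __attribute__((min_vector_width(512)))
#  endif
#endif
#ifndef MIN_VECTOR_WIDTH_512
#  define MIN_VECTOR_WIDTH_512
#endif

// Hypothetical variant of test(): the 512-bit operations are spelled out in
// the source, so their width does not depend on the loop vectorizer's choice.
MIN_VECTOR_WIDTH_512
void test_explicit(const std::uint64_t* input, std::uint64_t* output) {
    for (std::size_t i = 0; i < 64; i += 8) {
        v8u64 v;
        std::memcpy(&v, input + i, sizeof v);   // load 8 lanes, no alignment assumed
        v = v * std::uint64_t{16};              // lane-wise multiply by 16
        std::memcpy(output + i, &v, sizeof v);  // store 8 lanes
    }
}
```

The function computes the same result as the scalar loop on any target. The attribute only matters for how the backend legalizes the 512-bit vector type, so the generated assembly still needs to be checked on the compiler version in use.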

llvm / llvm-project

Clang ignores the `min_vector_width` attribute #60946