llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[FMV] Provide some way to explicitly refer to a specific version of a function #84094

Open jroelofs opened 8 months ago

jroelofs commented 8 months ago

Contrived example: https://clang.godbolt.org/z/WqMYbhrfv

#include <stddef.h>

__attribute__((target_version("simd")))
void saxpy(float *&Y, float A, const float *&X, size_t &elts);

__attribute__((target_version("default")))
void saxpy(float *&Y, float A, const float *&X, size_t &elts);

void saxpy_scalar(float *&Y, float A, const float *&X, size_t &elts) asm("_Z5saxpyRPffRPKfRm.default");

__attribute__((target_version("simd")))
void saxpy(float *&Y, float A, const float *&X, size_t &elts) {
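    // Process four elements per iteration while at least four remain.
    // (Decrementing before the check would skip the last full group and
    // wrap the unsigned count when fewer than four elements are passed.)
    while (elts >= 4) {
        *Y++ += A * *X++;
        *Y++ += A * *X++;
        *Y++ += A * *X++;
        *Y++ += A * *X++;
        elts -= 4;
    }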

    // It would be nice to have syntax here to refer to
    // a specific version, without using `asm()` and thus
    // having to know the mangled name.
    saxpy_scalar(Y, A, X, elts);
}

__attribute__((target_version("default")))
void saxpy(float *&Y, float A, const float *&X, size_t &elts) {
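    // elts is passed by reference, so count down to exactly zero rather
    // than letting a post-decrement wrap it around on loop exit.
    for (; elts != 0; --elts) {
        *Y++ += A * *X++;
    }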
}

void caller(float *Y, float A, const float *X, size_t elts) { saxpy(Y, A, X, elts); }

labrinea commented 1 month ago

Can you explain the motivation behind this idea? I had the impression that FMV by design is supposed to abstract the version selection away from the user and decide which version to call based on runtime detection.

jroelofs commented 1 month ago

The general idea is that sometimes it's beneficial to fall back on an implementation with a lower FMV priority when the problem size is small enough, or to handle some infrequent edge case. Don't take this too literally, but consider SME engines, for example: they're usually designed for medium to large problems, whereas NEON units are typically better at small problem sizes, so you might want to call the NEON implementation from the SME implementation.
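
The fallback pattern described above can be sketched in portable C++ without any FMV attributes. This is only an illustration of the call structure, not a proposal for the syntax: `saxpy_small`, `saxpy_large`, and the cutoff `kSmallProblem` are hypothetical names standing in for a lower-priority version, a higher-priority version, and a tuned threshold. With real `target_version` functions, the call to `saxpy_small` is the one that currently requires `asm()` and the mangled name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the lower-priority ("default" or NEON) version.
void saxpy_small(float *Y, float A, const float *X, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        Y[i] += A * X[i];
}

// Hypothetical cutoff; a real value would come from measurement.
constexpr std::size_t kSmallProblem = 16;

// Hypothetical stand-in for the higher-priority (e.g. "simd" or SME) version.
void saxpy_large(float *Y, float A, const float *X, std::size_t n) {
    assert(n == 0 || (Y != nullptr && X != nullptr));

    // Small problems: defer to the simpler version outright.
    if (n < kSmallProblem) {
        saxpy_small(Y, A, X, n);
        return;
    }

    // Main loop: four elements per iteration.
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        Y[i + 0] += A * X[i + 0];
        Y[i + 1] += A * X[i + 1];
        Y[i + 2] += A * X[i + 2];
        Y[i + 3] += A * X[i + 3];
    }

    // Infrequent edge case: hand the remaining 0..3 elements to the
    // simpler version instead of duplicating its loop here.
    saxpy_small(Y + i, A, X + i, n - i);
}
```

Both uses of `saxpy_small` inside `saxpy_large` (the size cutoff and the tail) are the kind of call that would benefit from a way to name a specific version directly.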