llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.95k stars 11.53k forks source link

Remove __attribute__((target_clones)) requirement in function prototype #61219

Open HFTrader opened 1 year ago

HFTrader commented 1 year ago

Let's say I have a library that I want to share with multiple targets

#include <array>
#include <cstdio>

using Vector = std::array<float, 2>;
using Matrix = std::array<float, 4>;

__attribute__((target_clones("default","arch=core2","arch=znver2")))
Vector multiply(const Matrix& m, const Vector& v) {
    Vector r;
    r[0] = v[0] * m[0] + v[1] * m[2];
    r[1] = v[0] * m[1] + v[1] * m[3];
    return r;
}

and I want to use that as below

#include <array>
#include <cstdio>

using Vector = std::array<float, 2>;
using Matrix = std::array<float, 4>;

Vector multiply(const Matrix& m, const Vector& v);

int main() {
    Matrix m{1,2,3,4};
    Vector v{1,2};
    Vector r = multiply(m,v);
    printf( "%f %f\n", r[0], r[1] );
}

Godbolt project: https://godbolt.org/z/3hd4MrzsG

GCC will be happy to compile and link the above two files together but Clang++ will complain about undefined references.

ld: CMakeFiles/example.dir/example.cpp.o: in function `main':
example.cpp:(.text+0x2a): undefined reference to `multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&)'

To make this work, I have to add all the targets in the prototype as well as in

#include <array>
#include <cstdio>

using Vector = std::array<float, 2>;
using Matrix = std::array<float, 4>;

__attribute__((target_clones("default","arch=core2","arch=znver2")))
Vector multiply(const Matrix& m, const Vector& v);

int main() {
    Matrix m{1,2,3,4};
    Vector v{1,2};
    Vector r = multiply(m,v);
    printf( "%f %f\n", r[0], r[1] );
}

Looking at the object files generated, it seeems that the main difference is that GCC generates an indirect link for the naked prototype while Clang generate an indirect link to .ifunc and that requires knowledge of the target attributes.

$ diff nm.gcc nm.clang
3,4d2
<                 U _GLOBAL_OFFSET_TABLE_
<                 U __stack_chk_fail
6,13c4,11
< i multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&)
< t multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&) [clone .arch_cascadelake.3]
< t _Z8multiplyRKSt5arrayIfLm4EERKS_IfLm2EE.arch_core2.0
< t multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&) [clone .arch_haswell.2]
< t multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&) [clone .arch_sandybridge.1]
< t _Z8multiplyRKSt5arrayIfLm4EERKS_IfLm2EE.arch_znver1.4
< t _Z8multiplyRKSt5arrayIfLm4EERKS_IfLm2EE.arch_znver2.5
< t multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&) [clone .default.6]
---
> T multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&) [clone .arch_cascadelake.3]
> T _Z8multiplyRKSt5arrayIfLm4EERKS_IfLm2EE.arch_core2.0
> T multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&) [clone .arch_haswell.2]
> T multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&) [clone .arch_sandybridge.1]
> T _Z8multiplyRKSt5arrayIfLm4EERKS_IfLm2EE.arch_znver1.4
> T _Z8multiplyRKSt5arrayIfLm4EERKS_IfLm2EE.arch_znver2.5
> T multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&) [clone .default.6]
> i multiply(std::array<float, 4ul> const&, std::array<float, 2ul> const&) [clone .ifunc]
20d17
< r std::piecewise_construct

The request is that Clang emits targets in a way that the target attributes won't be necessary in the function declaration/prototype.

llvmbot commented 1 year ago

@llvm/issue-subscribers-c-1

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-codegen

MaskRay commented 1 year ago

Clang's current behavior is a result of the symbol difference: https://maskray.me/blog/2023-02-05-function-multi-versioning

I wonder whether GCC intentionally supports the use case. My may need a clarification from them.

HFTrader commented 1 year ago

GCC answered

Yes this is the correct behavior for this attribute.

Looks like it was not implemented the same way in clang, report it to them.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109047

https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Common-Function-Attributes.html#index-target_005fclones-function-attribute

It also creates a resolver function (see the ifunc attribute above) that dynamically selects a clone suitable for current architecture. The resolver is created only if there is a usage of a function with target_clones attribute.

lawben commented 3 months ago

I just stumbled across this in a project that I'm working on (with Clang/LLVM 18). This is really not my area of expertise, but it feels like not exporting the "naked" symbol somewhat defeats the "I will resolve this for you" purpose of this feature. The symbol is created, but it is not exported (at least according to nm it is a local symbol). Currently, I can only use target_clones within the same compilation unit but not from a library, which I would like to do.

Is there any technical reason to not export the symbol? If GCC does this, and there is no technical reason to avoid doing it, not exporting the symbol feels more like a bug than a "missing feature/enhancement".

As nothing has happened here for over a year: is anybody working on this or is there a decision on how to move forward? I'm happy to help, as I have a specific use-case that needs exactly this feature.

jroelofs commented 2 months ago

cc @labrinea: I vaguely remember we talked about the corresponding problem for target_version, but I don't remember the resolution.

jroelofs commented 2 months ago
jroelofs commented 2 months ago

I think this has been fixed. @HFTrader's original example works now: https://godbolt.org/z/3hd4MrzsG

lawben commented 2 months ago

Sorry, guys. I accidentally hit a key combination in the wrong window that apparently closed the issue with a comment. That was not intended, please ignore 😅

lawben commented 2 months ago

I'm trying to run this on ARM, which seems to work if I inspect the output of nm in godbolt when changing the compiler. However, on my M1 Macbook, this still does not work. I guess this is related to .ifunc features. If I use target_clones in the prototype on M1, it works. But if I only put it in the .cpp file, it does not. Should this work or does this not work in non-linux?

tahonermann commented 2 months ago

I think this might have been fixed in December, 2023 by this commit.

jroelofs commented 2 months ago

However, on my M1 Macbook, this still does not work.

What version of the toolchain? FMV and ifunc support would only be in the developer beta we shipped last week, or in an open source toolchain you've built yourself.

jroelofs commented 2 months ago

I've updated the godbolt example to ARM and it does not generate any assembly. I'm not sure if I'm messing something up or not.

Looks like a godbolt bug to me.

jroelofs commented 2 months ago

reported here: https://github.com/compiler-explorer/compiler-explorer/discussions/2763#discussioncomment-9830228

lawben commented 2 months ago

What version of the toolchain? FMV and ifunc support would only be in the developer beta we shipped last week, or in an open source toolchain you've built yourself.

I'm running LLVM 18.1.4 installed via brew (Homebrew clang version 18.1.4 | Target: arm64-apple-darwin21.6.0). From the godbolt examples and the issue dates, it looks like this is available in Clang/LLVM 18. I'm not sure which developer beta you are referring to in this case. Were there recent changes that affect this? If so, I'll try to build the current trunk.

I'll try to create a MWE for my use case and see what the exact issue is, as it involves templates and explicit instantiation etc. There is a non-zero chance that I'm simply messing something up in the process.

jroelofs commented 2 months ago

I'm not sure which developer beta you are referring to in this case

The toolchain in the Xcode 16 beta has support for FMV and ifuncs, Xcode 15 does not.

https://download.developer.apple.com/Developer_Tools/Xcode_16_beta/Xcode_16_beta.xip

lawben commented 2 months ago

I've figured out that I can get the exported symbol on my Macbook if I manually set the target to aarch64-unknown-linux-gnu (I guess the gnu suffix is important here for the ifunc logic). But then everything else breaks. I guess I'll just wait for Xcode 16. Thanks for the info!

jroelofs commented 2 months ago

I guess the gnu suffix is important here for the ifunc logic

Before https://github.com/llvm/llvm-project/pull/73686, only ELF targets supported it.