simd-everywhere / simde

Implementations of SIMD instruction sets for systems which don't natively support them.
https://simd-everywhere.github.io/blog/
MIT License
2.28k stars 237 forks source link

Don't use _Float16 on architectures that do not support it #1182

Closed mcatanzaro closed 2 months ago

mcatanzaro commented 2 months ago

_Float16 is only supported on a few architectures. Let's assume it's not supported unless we know otherwise.

References:

mcatanzaro commented 2 months ago

So most of the CI results are in and looking good (except for the "default" check, which isn't actionable because there are no build logs available). But I'm marking this as draft because I've realized the current check isn't ideal. With this change, _Float16 will no longer ever be enabled on MIPS or Power. That's good, because it's not supported on those architectures, but now I should simplify the conditionals further. It might also be possible to simplify the RISC-V check.

Basically my general idea here is: only enable _Float16 where we know it actually works. If that's a bad idea, please complain!

mcatanzaro commented 2 months ago

OK, I'm going to push a new revision that additionally removes the checks for MIPS and POWER, since they should no longer be needed as a result of my changes here. I expect this should be safe.

There are two remaining problems with the checks here, which I don't know how to fix and will not touch:

(1) The checks for RISC-V are strange. The first check semantically means "don't use _Float16 on RISC-V if compiling with Clang" (because it's broken, see 3f4ea16eb62a5ead4a8349cad770f52af0bd46d3). But the second check enables it for the ZVFH subarchitecture regardless of compiler. This seems odd and should be reconciled somehow.

(2) _Float16 is never used if compiling C++ with GCC on aarch64. But it is used if compiling C, or if compiling with Clang. This seems strange. This exclusion was added in 89100575366ad8a82a152cbb7ba3dc9f46689884.

mr-c commented 2 months ago

Thank you for looking into this, @mcatanzaro !

With this change, _Float16 will no longer ever be enabled on MIPS or Power. That's good, because it's not supported on those architectures,

What is the evidence that GCC doesn't support float16 on MIPS nor Power?

https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html says

In cases where hardware support is not specified, GCC implements conversions between __fp16 and other types as library calls.

--

(1) The checks for RISC-V are strange. The first check semantically means "don't use _Float16 on RISC-V if compiling with Clang" (because it's broken, see 3f4ea16). But the second check enables it for the ZVFH subarchitecture regardless of compiler. This seems odd and should be reconciled somehow.

@eric900115 , do you have evidence that _Float16 works on clang for RISC-V if __riscv_zvfh is defined?

(2) _Float16 is never used if compiling C++ with GCC on aarch64. But it is used if compiling C, or if compiling with Clang. This seems strange. This exclusion was added in 8910057.

Perhaps there is a minimum version of g++ that works with _Float16 on aarch64; can you do some testing and find out further?

In general, I prefer to exclude known-broken compiler version + architecture combinations, as documentation of minimum versions and then rely on bug reports where new discoveries are made; rather than only enable native fp16 where we have testing which would mean that users would miss out on the fp16 support in situations we didn't test for.

mcatanzaro commented 2 months ago

What is the evidence that GCC doesn't support float16 on MIPS nor Power?

It's on the previous page:

"The _Float16 type is supported on AArch64 systems by default, on ARM systems when the IEEE format for 16-bit floating-point types is selected with -mfp16-format=ieee and, for both C and C++, on x86 systems with SSE2 enabled. "

So it's not available by default, and the allowlisted architectures are just the ones listed. If you prefer I can go back to the denylist approach and just exclude s390x, but you'll probably keep receiving patches to exclude other architectures.

Perhaps there is a minimum version of g++ that works with _Float16 on aarch64; can you do some testing and find out further?

Hm, that sounds like a lot of work, sorry. I'm sure I could emulate aarch64 with qemu, but I've never done that before.

mr-c commented 2 months ago

What is the evidence that GCC doesn't support float16 on MIPS nor Power?

It's on the previous page:

"The _Float16 type is supported on AArch64 systems by default, on ARM systems when the IEEE format for 16-bit floating-point types is selected with -mfp16-format=ieee and, for both C and C++, on x86 systems with SSE2 enabled. "

I think that page is out of date. For example, RISC-V has supported _Float16 since GCC 13 https://github.com/gcc-mirror/gcc/commit/27d68a60783b52504a08503d3fe12054de104241 (a.k.a https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=27d68a60783b52504a08503d3fe12054de104241 )

So it's not available by default, and the allowlisted architectures are just the ones listed. If you prefer I can go back to the denylist approach and just exclude s390x, but you'll probably keep receiving patches to exclude other architectures.

I prefer the denylist, yes. Happy to get more patches. Users can workaround this without a patch by defining SIMDE_FLOAT16_API with the appropriate value before including any SIMDe header.

Perhaps there is a minimum version of g++ that works with _Float16 on aarch64; can you do some testing and find out further?

Hm, that sounds like a lot of work, sorry. I'm sure I could emulate aarch64 with qemu, but I've never done that before.

https://godbolt.org/ has a variety of arm64 gcc versions

mcatanzaro commented 2 months ago

OK, here's a new version that just excludes s390x and i386. (The disadvantage is this makes the condition even more complex.)

https://godbolt.org/ has a variety of arm64 gcc versions

Huh, godbolt is really good.

Seems the check was actually correct. _Float16 is supported in C since GCC 7 and in C++ only since GCC 13. That's odd, but OK. I'll add another commit to make it available if using GCC 13 or newer.