llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

__FLOAT128__ macro incorrectly defined for CUDA target #46903

Open llvmbot opened 4 years ago

llvmbot commented 4 years ago
Bugzilla Link 47559
Version 10.0
OS Linux
Reporter LLVM Bugzilla Contributor
CC @Artem-B

Extended Description

The CUDA target does not support __float128, but Clang defines __FLOAT128__ regardless. This breaks feature detection in libstdc++, causing this trivial invocation to fail:

/usr/bin/clang++ -x cuda --cuda-gpu-arch=sm_75 -c -std=gnu++17 /dev/null

Output:

In file included from <built-in>:1:
In file included from /usr/lib/clang/10.0.1/include/__clang_cuda_runtime_wrapper.h:36:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/cmath:47:
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/std_abs.h:103:7: error: __float128 is not supported on this target
  abs(__float128 x)

It appears that a check is missing in clang/lib/Basic/Targets/OSTargets.h, which should not set HasFloat128 when compiling for CUDA.

Artem-B commented 3 years ago

A storage-only type might be a reasonable compromise. We will not be able to generate working code for everything (as in your example with __int128), but it would at least allow us to avoid divergence from the host during parsing.

I'll look into it when I have time, but it probably will not happen for a while.

llvmbot commented 3 years ago

Depending on the implementation complexity, a way around this might be to support the __float128 representation in device code, but not the actual operations. __int128 sets an (unintentional?) precedent for this: loading, storing, and adding __int128s works, but attempting a division fails in the backend:

fatal error: error in backend: Undefined external symbol "__divti3"
clang-11: error: clang frontend command failed with exit code 70 (use -v to see invocation)

Now the error message is of course less descriptive than "__int128 is not supported", but I think this partial support is actually useful. Imagine a kernel sorting an array of structs containing __float128 members, where the order does not depend on those floats. The swap operations would still be well-defined.

llvmbot commented 3 years ago

This might be a separate bug, but the -mno-float128 flag is ignored entirely (C, C++ and CUDA):

$ clang -mno-float128 -x c -dM -E /dev/null | grep FLOAT128__ | sort | uniq
$ clang -mno-float128 -x c++ -dM -E /dev/null | grep FLOAT128__ | sort | uniq
$ clang -mno-float128 -x cuda -dM -E /dev/null | grep FLOAT128__ | sort | uniq

all print

clang-11: warning: argument unused during compilation: '-mno-float128' [-Wunused-command-line-argument]
#define __FLOAT128__ 1
#define __SIZEOF_FLOAT128__ 16

and the example from above still fails. The only reliable workaround I've found is

-U__FLOAT128__ -U__SIZEOF_FLOAT128__

But that, of course, has the ODR problems you mentioned.

Artem-B commented 3 years ago

TL;DR: Everyone compiling projects with CUDA may need to use -mno-float128.

The question is: how should it all work in the end?

Given:

IMO, the only ways to fix the problem are either to add __float128 support on the GPU or to disable __float128 for all compilations at the build level. The former is unlikely to happen any time soon. The latter has to be done by the individual end user in their build.

If we only disable __float128 for CUDA compilations, that would help maintain host/GPU consistency, but it would leave us open to issues in mixed C++/CUDA apps, unless the build is changed to make sure __float128 is disabled everywhere.

Another downside to disabling __float128 in clang for CUDA only is that it would make it very hard to spot and deal with issues caused by a C++/CUDA mismatch in supported features. I'd rather fail in a painfully obvious way than make compilation work at the price of hard-to-debug issues in other places. Chasing ODR violations is not fun at all.

llvmbot commented 3 years ago

The error message for the command line in the OP might depend on the standard library version installed. The essence of the problem can be reproduced with this code:

#ifdef __FLOAT128__
extern __float128 f;
#endif

Compiling this with clang -x cuda produces

/tmp/float128.cu:2:8: error: __float128 is not supported on this target
extern __float128 f;
    ^
1 error generated when compiling for sm_61.

The declaration should have been removed by the preprocessor, but __FLOAT128__ and __SIZEOF_FLOAT128__ are incorrectly defined:

$ clang -x cuda -dM -E /dev/null --cuda-gpu-arch=sm_75 | grep FLOAT128 | sort | uniq
#define _GLIBCXX_USE_FLOAT128 1
#define __FLOAT128__ 1
#define __HAVE_DISTINCT_FLOAT128 0
#define __HAVE_DISTINCT_FLOAT128X __HAVE_FLOAT128X
#define __HAVE_FLOAT128 0
#define __HAVE_FLOAT128X 0
#define __HAVE_FLOAT128_UNLIKE_LDBL (__HAVE_DISTINCT_FLOAT128 && __LDBL_MANT_DIG__ != 113)
#define __LDOUBLE_REDIRECTS_TO_FLOAT128_ABI 0
#define __SIZEOF_FLOAT128__ 16

I'm not sure if this affects both host and device compilation.

Clang:

clang version 11.0.1
Target: x86_64-pc-linux-gnu
Thread model: posix

CUDA: Both 10.1 (latest officially supported by Clang) as well as 11.2.

Artem-B commented 4 years ago

Which CUDA version are you using? Which compilation do you have in mind when you say 'compiling for CUDA'? Host compilation? Device? Both?

I cannot reproduce the issue with a recent clang:

% bin/clang++ -x cuda --cuda-path=$HOME/local/cuda-10.1 --cuda-gpu-arch=sm_75 -c -std=gnu++17 /dev/null
% bin/clang --version
clang version 12.0.0
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/build/release/bin

llvmbot commented 4 years ago

assigned to @Artem-B

Artem-B commented 9 months ago

https://github.com/llvm/llvm-project/commit/318bff6811e7a7e0d3295ed85aa3ad01d475cc8c should work around the issue by issuing the diagnostic only if __float128 is actually used on the GPU side.