lfmeadow opened this issue 1 year ago
Are there errors when std:: is replaced with sycl::?
It links with sycl::. I'll see if I can patch LAMMPS, but this is going to be a real pain the way the Kokkos integration is done. I can hack it, but I don't know how to do it properly. Isn't this just a compiler bug?
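For reference, a minimal sketch of what the sycl:: spelling looks like (illustrative kernel, not LAMMPS code):

```cpp
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;
  double *out = sycl::malloc_shared<double>(1, q);
  q.single_task([=] {
    // std::sin here trips the NVPTX backend under -ffast-math;
    // sycl::sin resolves to the SYCL built-in and links cleanly.
    out[0] = sycl::sin(1.0);
  }).wait();
  sycl::free(out, q);
  return 0;
}
```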
I took a page from Ruyman's branch (https://github.com/Ruyk/lammps.git, commit d74d7cfd5f1aedf9dfad57b8b3412802fbb3263f) and just brute-forced sin, cos, pow, exp, and sqrt to use the Kokkos::Experimental namespace. So I guess this issue has always been there. It seems very onerous for the user.
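The change is mechanical but invasive; the pattern looks roughly like this (an illustrative function, not the actual LAMMPS diff):

```cpp
#include <Kokkos_Core.hpp>

// Sketch: replace std::exp with the Kokkos math wrapper, which
// dispatches to the backend's native device implementation.
KOKKOS_INLINE_FUNCTION double boltzmann_factor(double beta, double e) {
  // was: return std::exp(-beta * e);
  return Kokkos::Experimental::exp(-beta * e);
}
```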
The same issue was reported in https://github.com/intel/llvm/issues/7344
Reduced. I'll take a look at this:

```llvm
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define dso_local void @foo(double* %f) {
entry:
  %0 = call double @llvm.sin.f64(double 0x7FF8000000000000)
  store double %0, double* %f, align 8
  ret void
}

declare double @llvm.sin.f64(double) #0

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
```
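(For anyone trying to reproduce: feeding this file to llc, e.g. `llc -mtriple=nvptx64-nvidia-cuda reduced.ll`, should be enough to trigger the failure in the NVPTX backend.)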
SYCL with HIP support sees similar issues. Thanks.
This is pretty interesting: CUDA doesn't generate code containing @llvm.sin.f64 by the time it gets to the backend, since the CUDA math headers route such calls through libdevice bitcode (e.g. __nv_sin) that is linked in well before codegen.
However, in normal C++ (and I'm including SYCL in that definition) most of the mathematical functions will end up being lowered to an LLVM intrinsic (for `sin`, it's `@llvm.sin.f(32|64)`). The NVPTX backend has a couple of instruction definitions for these ISD nodes, but it seems that there are significant gaps in the ISelLowering implementation, and lots of missing patterns.
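As a sketch of that lowering, a plain C++ file like the one below, compiled with -ffast-math (which implies -fno-math-errno), makes clang emit the intrinsic rather than a libm call:

```cpp
#include <cmath>

// With -ffast-math, the call below is emitted as
//   %0 = call fast double @llvm.sin.f64(double %x)
// instead of a call to the external libm symbol `sin`.
double f(double x) { return std::sin(x); }
```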
There's even this strange loop:

```cpp
for (const auto &Op :
     {ISD::FDIV, ISD::FREM, ISD::FSQRT, ISD::FSIN, ISD::FCOS, ISD::FABS}) {
  ...
  setOperationAction(Op, MVT::f64, Legal);
  ...
}
```
This loop tells the ISel lowering that f64 is supported in hardware for the `sin`, `cos`, and `abs` ISD opcodes (among others). However, there is no PTX instruction that matches that behaviour, because the PTX ISA only supports `sin` for `.f32` (`sin.approx.f32`). Of course, there's no TableGen pattern or custom lowering defined either, because the PTX assembler would rightly flip.
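A plausible shape for the fix (a sketch, not the actual patch) is to stop claiming f64 support so that legalization lowers these nodes to libcalls instead:

```cpp
// Sketch for NVPTXISelLowering.cpp: with no matching PTX
// instruction, the f64 forms must not be marked Legal. Expand
// makes legalization turn them into libcalls.
for (const auto &Op : {ISD::FSIN, ISD::FCOS}) {
  setOperationAction(Op, MVT::f64, Expand);
}
```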
I've started collating a list of the maths functions that should generate direct PTX ISA instructions, and of those that require lowering to a libcall, both depending on whether we accept the lower precision implied by -ffast-math. I'll post a patch to upstream LLVM soon.
Thanks for the great reproducer, @lfmeadow
**Describe the bug**
The -ffast-math switch results in backend failures and/or LLVM link failures when using some double-precision std:: math intrinsics. This was discovered compiling LAMMPS with Kokkos using SYCL for CUDA.
**To Reproduce**
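A kernel along these lines triggers it (a sketch; the build flags assume the DPC++ CUDA target):

```cpp
// Build (illustrative):
//   clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -ffast-math repro.cpp
#include <sycl/sycl.hpp>
#include <cmath>

int main() {
  sycl::queue q;
  double *out = sycl::malloc_shared<double>(1, q);
  // std::sin on a double leaves an @llvm.sin.f64 call that the
  // NVPTX backend cannot select under -ffast-math.
  q.single_task([=] { out[0] = std::sin(1.0); }).wait();
  sycl::free(out, q);
  return 0;
}
```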
If std::exp is called instead, a different error message appears.
**Environment:**
Linux, NVIDIA A100, CUDATOOLKIT_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7, on Perlmutter.