icl-utk-edu / slate

SLATE is a distributed, GPU-accelerated, dense linear algebra library targetting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy Exascale Computing Project (ECP).
https://icl.utk.edu/slate/
BSD 3-Clause "New" or "Revised" License
91 stars 21 forks source link

Broken hipBlas complex support #196

Closed devreal closed 3 months ago

devreal commented 3 months ago

Description

This code seems to be hipified CUDA code:

#ifdef __NVCC__

// CUDA doesn't provide real, imag, conj.
__host__ __device__ inline double real( hipDoubleComplex x ) { return x.x; }
__host__ __device__ inline float  real( hipFloatComplex  x ) { return x.x; }

__host__ __device__ inline double imag( hipDoubleComplex x ) { return x.y; }
__host__ __device__ inline float  imag( hipFloatComplex  x ) { return x.y; }

__host__ __device__ inline hipDoubleComplex conj( hipDoubleComplex x ) { return hipConj(x); }
__host__ __device__ inline hipFloatComplex  conj( hipFloatComplex  x ) { return hipConjf(x); }

#else

__host__ __device__ inline double real( rocblas_double_complex x ) { return x.real(); }
__host__ __device__ inline float  real( rocblas_float_complex  x ) { return x.real(); }

__host__ __device__ inline double imag( rocblas_double_complex x ) { return x.imag(); }
__host__ __device__ inline float  imag( rocblas_float_complex  x ) { return x.imag(); }

__host__ __device__ inline rocblas_double_complex conj( rocblas_double_complex x ) { return { x.real(), -x.imag() }; }
__host__ __device__ inline rocblas_float_complex  conj( rocblas_float_complex  x ) { return { x.real(), -x.imag() }; }

#endif

I don't understand why hipDoubleComplex is excluded here but it leads to a compiler error on Frontier:

/ccs/home/jschuchart/src/slate/slate/src/hip/device_util.hip.hh:198:16: error: no matching function for call to 'imag'
    double b = imag( x );
               ^~~~
/ccs/home/jschuchart/src/slate/slate/src/hip/device_util.hip.hh:111:35: note: candidate function not viable: no known conversion from 'hipDoubleComplex' (aka 'HIP_vector_type<double, 2>') to 'rocblas_double_complex' (aka 'rocblas_complex_num<double>') for 1st argument
__host__ __device__ inline double imag( rocblas_double_complex x ) { return x.imag(); }
                                  ^
/ccs/home/jschuchart/src/slate/slate/src/hip/device_util.hip.hh:112:35: note: candidate function not viable: no known conversion from 'hipDoubleComplex' (aka 'HIP_vector_type<double, 2>') to 'rocblas_float_complex' (aka 'rocblas_complex_num<float>') for 1st argument
__host__ __device__ inline float  imag( rocblas_float_complex  x ) { return x.imag(); }
                                  ^
/ccs/home/jschuchart/src/slate/slate/src/hip/device_util.hip.hh:129:36: note: candidate function not viable: no known conversion from 'hipDoubleComplex' (aka 'HIP_vector_type<double, 2>') to 'double' for 1st argument
__host__ __device__  inline double imag( double x ) { return 0; }
                                   ^
/ccs/home/jschuchart/src/slate/slate/src/hip/device_util.hip.hh:130:36: note: candidate function not viable: no known conversion from 'hipDoubleComplex' (aka 'HIP_vector_type<double, 2>') to 'float' for 1st argument
__host__ __device__  inline float  imag( float  x ) { return 0; }
                                   ^

Steps To Reproduce

Build on Frontier with the environment below.

Environment

On Frontier:

$ module list

Currently Loaded Modules:
  1) craype-x86-trento    4) perftools-base/22.12.0                  7) cray-dsmml/0.2.2   10) cmake/3.23.2             13) craype/2.7.23          16) hsi/default        19) cray-libsci/23.12.5
  2) libfabric/1.15.2.0   5) xpmem/2.6.2-2.5_2.22__gd067c3f.shasta   8) PrgEnv-cray/8.3.3  11) rocm/5.7.1               14) cce/17.0.0             17) lfs-wrapper/0.0.1  20) cray-mpich/8.1.28
  3) craype-network-ofi   6) cray-pmi/6.1.8                          9) boost/1.79.0       12) craype-accel-amd-gfx90a  15) darshan-runtime/3.4.0  18) DefApps/default
lzjia-jia commented 3 months ago

Have your issues been resolved?

devreal commented 3 months ago

Looks like this issue is fixed, thank you.

mgates3 commented 3 months ago

The code makes more sense in the original CUDA file, src/cuda/device_util.cuh:

#ifdef __NVCC__

// CUDA doesn't provide real, imag, conj.
__host__ __device__ inline double real( cuDoubleComplex x ) { return x.x; }
__host__ __device__ inline float  real( cuFloatComplex  x ) { return x.x; }

__host__ __device__ inline double imag( cuDoubleComplex x ) { return x.y; }
__host__ __device__ inline float  imag( cuFloatComplex  x ) { return x.y; }

__host__ __device__ inline cuDoubleComplex conj( cuDoubleComplex x ) { return cuConj(x); }
__host__ __device__ inline cuFloatComplex  conj( cuFloatComplex  x ) { return cuConjf(x); }

#else

__host__ __device__ inline double real( rocblas_double_complex x ) { return x.real(); }
__host__ __device__ inline float  real( rocblas_float_complex  x ) { return x.real(); }

__host__ __device__ inline double imag( rocblas_double_complex x ) { return x.imag(); }
__host__ __device__ inline float  imag( rocblas_float_complex  x ) { return x.imag(); }

__host__ __device__ inline rocblas_double_complex conj( rocblas_double_complex x ) { return { x.real(), -x.imag() }; }
__host__ __device__ inline rocblas_float_complex  conj( rocblas_float_complex  x ) { return { x.real(), -x.imag() }; }

#endif

The first set of routines is what we use for CUDA. It gets translated by hipify to hipDoubleComplex, but we don't use that code. The second set of routines is what we use for HIP / ROCm.

Did you update your repo? There haven't been any recent changes in SLATE that would affect this, though it could depend on the ROCm version. Knowing the git commit ID would help to reproduce issues.