Open balay opened 1 year ago
I tried the following change and still get errors:
diff --git a/var/spack/repos/builtin/packages/xsdk/package.py b/var/spack/repos/builtin/packages/xsdk/package.py
index b52d692b78..629f240a8f 100644
--- a/var/spack/repos/builtin/packages/xsdk/package.py
+++ b/var/spack/repos/builtin/packages/xsdk/package.py
@@ -109,8 +109,8 @@ class Xsdk(BundlePackage, CudaPackage, ROCmPackage):
variant("hiop", default=True, description="Enable hiop build")
variant("raja", default=(sys.platform != "darwin"), description="Enable raja for hiop, exago")
- xsdk_depends_on("hypre@develop+superlu-dist+shared", when="@develop", cuda_var="cuda")
- xsdk_depends_on("hypre@2.30.0+superlu-dist+shared", when="@1.0.0", cuda_var="cuda")
+ xsdk_depends_on("hypre@develop+superlu-dist+shared", when="@develop", cuda_var="cuda", rocm_var="rocm")
+ xsdk_depends_on("hypre@2.30.0+superlu-dist+shared", when="@1.0.0", cuda_var="cuda", rocm_var="rocm")
xsdk_depends_on("hypre@2.26.0+superlu-dist+shared", when="@0.8.0", cuda_var="cuda")
xsdk_depends_on("hypre@2.23.0+superlu-dist+shared", when="@0.7.0", cuda_var="cuda")
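The diff above adds rocm_var="rocm" next to cuda_var="cuda". As a rough illustration of what such keywords might do, here is a minimal sketch of a wrapper that expands them into conditional dependencies. It is not the helper from the xsdk package: the names expand_accel_dep and expand_dep are hypothetical, and the single architecture value per accelerator is a placeholder.

```python
def expand_accel_dep(spec, when, accel_name, accel_var, arch_key, arch_values):
    """Return (dependency_spec, when_condition) pairs for one accelerator."""
    pairs = [
        # Parent built without the accelerator: disable it on the dependency too.
        (f"{spec} ~{accel_var}", f"{when} ~{accel_name}"),
    ]
    for arch in arch_values:
        # Parent built with the accelerator: enable it and match the architecture.
        pairs.append(
            (
                f"{spec} +{accel_var} {arch_key}={arch}",
                f"{when} +{accel_name} {arch_key}={arch}",
            )
        )
    return pairs


def expand_dep(spec, when, cuda_var="", rocm_var=""):
    """Expand one dependency line into its conditional variants (hypothetical)."""
    pairs = []
    if cuda_var:
        # "80" is a placeholder architecture chosen for this sketch.
        pairs += expand_accel_dep(spec, when, "cuda", cuda_var, "cuda_arch", ["80"])
    if rocm_var:
        # "gfx90a" matches the amdgpu_target used in the install command in this thread.
        pairs += expand_accel_dep(spec, when, "rocm", rocm_var, "amdgpu_target", ["gfx90a"])
    if not (cuda_var or rocm_var):
        # No accelerator handling requested: a plain conditional dependency.
        pairs.append((spec, when))
    return pairs


if __name__ == "__main__":
    hypre = "hypre@2.30.0+superlu-dist+shared"
    for dep, cond in expand_dep(hypre, "@1.0.0", cuda_var="cuda", rocm_var="rocm"):
        print(f"depends_on({dep!r}, when={cond!r})")
```

In this sketch, passing only cuda_var produces no dependency line that mentions rocm, so the rocm variant of hypre is left unconstrained by the parent. Adding rocm_var ties hypre +rocm to the parent's +rocm.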
We need to merge PR https://github.com/hypre-space/hypre/pull/869 into hypre's master to make hypre +rocm +superlu-dist work
Related: https://github.com/xsdk-project/xsdk-issues/issues/225
Oops, now I notice that the issue here was hypre ~rocm +superlu-dist
I'm wondering why SuperLU_dist is defining GPU stuff while this build isn't supposed to use GPUs:
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/openmpi-4.1.6-726wgocxnrdcpvhnyctkrwk5brwxpble/bin/mpicc -O2 -fPIC -DHAVE_CONFIG_H -I.. -I../distributed_ls/Euclid -I. -I./.. -I./../blas -I./../lapack -I./../multivector -I./../utilities -I./../krylov -I./../seq_mv -I./../parcsr_mv -I./../distributed_matrix -I./../matrix_matrix -I./../IJ_mv -I./../parcsr_block_mv -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/openmpi-4.1.6-726wgocxnrdcpvhnyctkrwk5brwxpble/include -c par_nodal_systems.c
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_api_utils.h:26,
from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_defs.h:104,
from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_ddefs.h:37,
from par_amg.c:18:
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:151:20: error: unknown type name 'hipEvent_t'
151 | #define gpuEvent_t hipEvent_t
| ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/util_dist.h:128:5: note: in expansion of macro 'gpuEvent_t'
128 | gpuEvent_t *GemmStart, *GemmEnd, *ScatterEnd; /*GPU events to store gemm and scatter's begin and end*/
| ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:151:20: error: unknown type name 'hipEvent_t'
151 | #define gpuEvent_t hipEvent_t
| ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/util_dist.h:129:5: note: in expansion of macro 'gpuEvent_t'
129 | gpuEvent_t *ePCIeH2D;
| ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:151:20: error: unknown type name 'hipEvent_t'
151 | #define gpuEvent_t hipEvent_t
| ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/util_dist.h:130:5: note: in expansion of macro 'gpuEvent_t'
130 | gpuEvent_t *ePCIeD2H_Start;
| ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:151:20: error: unknown type name 'hipEvent_t'
151 | #define gpuEvent_t hipEvent_t
| ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/util_dist.h:131:5: note: in expansion of macro 'gpuEvent_t'
131 | gpuEvent_t *ePCIeD2H_End;
| ^~~~~~~~~~
> Related: https://github.com/xsdk-project/xsdk-issues/issues/225
Ah, sorry for creating a duplicate issue. We can close this one [if needed]
> I'm wondering why SuperLU_dist is defining GPU stuff while this build isn't supposed to use GPUs:
Yeah - ideally we should have both superlu-dist+rocm and hypre+rocm.
The current mode of superlu-dist+rocm hypre~rocm is a carry-over from the prior xsdk release.
> Ideally we should have both superlu-dist+rocm and hypre+rocm
Ah ok, I see! We will have that with the hypre PR I mentioned
Ok - adding 'hypre+rocm' to 'xsdk@1.0.0' now [so this is the mode that will get tested].
> We need to merge PR https://github.com/hypre-space/hypre/pull/869 into hypre's master to make hypre +rocm +superlu-dist work
@victorapm, I tried building with the above change [i.e. use the dsuperlu branch instead of the master branch] - I still see build failures:
In file included from dsuperlu.c:12:
In file included from ./dsuperlu.h:11:
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_ddefs.h:37:
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_defs.h:104:
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_api_utils.h:26:
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:108:10: fatal error: 'hipblas.h' file not found
#include "hipblas.h"
^~~~~~~~~~~
Hm - maybe it's an issue with the hypre [or superlu-dist?] spec in spack wrt rocm dependencies..
cc: @xiaoyeli @liuyangzhuan
Thanks for the feedback!
hypre links to rocblas with the rocm build. It seems superlu_dist needs hipblas? We could add this as an additional LDFLAGS maybe?
This gets the hypre+rocm build going for me:
diff --git a/var/spack/repos/builtin/packages/hypre/package.py b/var/spack/repos/builtin/packages/hypre/package.py
index ede99fafcc..5364a3bb73 100644
--- a/var/spack/repos/builtin/packages/hypre/package.py
+++ b/var/spack/repos/builtin/packages/hypre/package.py
@@ -24,7 +24,7 @@ class Hypre(AutotoolsPackage, CudaPackage, ROCmPackage):
test_requires_compiler = True
version("develop", branch="master")
- version("2.30.0", branch="master")
+ version("2.30.0", branch="dsuperlu")
version("2.29.0", sha256="98b72115407a0e24dbaac70eccae0da3465f8f999318b2c9241631133f42d511")
version("2.28.0", sha256="2eea68740cdbc0b49a5e428f06ad7af861d1e169ce6a12d2cf0aa2fc28c4a2ae")
version("2.27.0", sha256="507a3d036bb1ac21a55685ae417d769dd02009bde7e09785d0ae7446b4ae1f98")
@@ -108,6 +108,7 @@ def patch(self): # fix sequential compilation in 'src/seq_mv'
depends_on("rocthrust", when="+rocm")
depends_on("rocrand", when="+rocm")
depends_on("rocprim", when="+rocm")
+ depends_on("hipblas", when="+rocm")
depends_on("umpire", when="+umpire")
depends_on("caliper", when="+caliper")
@@ -258,7 +259,7 @@ def configure_args(self):
configure_args.append("--disable-cub")
if "+rocm" in spec:
- rocm_pkgs = ["rocsparse", "rocthrust", "rocprim", "rocrand"]
+ rocm_pkgs = ["rocsparse", "rocthrust", "rocprim", "rocrand", "hipblas"]
rocm_inc = ""
for pkg in rocm_pkgs:
if "^" + pkg in spec:
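The last hunk above is cut off by the diff context, so the body of the loop is not shown. As a hedged illustration only, the sketch below shows how a loop of this shape could collect include flags for whichever ROCm packages are present. The function collect_include_flags, the dictionary of install prefixes, and the paths are all hypothetical and do not use Spack's API.

```python
def collect_include_flags(rocm_pkgs, installed_prefixes):
    """Build a string of -I flags for each listed package that is present.

    installed_prefixes maps package name -> install prefix; it stands in
    for a lookup into the concretized spec (an assumption of this sketch).
    """
    flags = []
    for pkg in rocm_pkgs:
        prefix = installed_prefixes.get(pkg)
        if prefix is not None:
            # Only packages that are actually dependencies contribute a flag.
            flags.append(f"-I{prefix}/include")
    return " ".join(flags)


if __name__ == "__main__":
    # Hypothetical prefixes, for illustration only.
    prefixes = {
        "rocsparse": "/opt/example/rocsparse",
        "hipblas": "/opt/example/hipblas",
    }
    pkgs = ["rocsparse", "rocthrust", "rocprim", "rocrand", "hipblas"]
    print(collect_include_flags(pkgs, prefixes))
```

Under these assumptions, adding "hipblas" to the package list is what makes its include directory visible when hypre compiles against headers that pull in hipblas.h.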
Looks like this issue will be with all pkgs that use superlu-dist.
@xiaoyeli can this dependency [on hipblas.h] be avoided from public include files? [assuming it's primarily required in superlu-dist sources]
Maybe we need to incorporate this into hypre's configure/CMakeLists for folks not building it via spack. The spack fix wouldn't be necessary then (although much appreciated!)
> can this dependency [on hipblas.h] be avoided from public include files
This sounds great if possible :)
@xiaoyeli I get the following with petsc [this warning breaks the build]
stderr:
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:108,
from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_api_utils.h:26,
from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_defs.h:104,
from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_ddefs.h:37,
from /tmp/petsc-xuiyjmih/config.headers/conftest.cc:3:
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hipblas-5.5.1-inaizutkrwbopn5i5pvghmqlmuhj2ahw/include/hipblas.h:16:2: warning: #warning "This file is deprecated. Use the header file from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hipblas-5.5.1-inaizutkr\
wbopn5i5pvghmqlmuhj2ahw/include/hipblas/hipblas.h by using #include <hipblas/hipblas.h>" [-Wcpp]
16 | #warning "This file is deprecated. Use the header file from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hipblas-5.5.1-inaizutkrwbopn5i5pvghmqlmuhj2ahw/include/hipblas/hipblas.h by using #include <hipblas/hipblas.h>"
| ^~~~~~~
Source:
#include "confdefs.h"
#include "conffix.h"
#include <superlu_ddefs.h>
I guess this should go into a "new" issue..
Fixed in https://github.com/xsdk-project/xsdk-issues/issues/236#event-10648868613
> This gets the hypre+rocm build going for me
The updated change (for hypre+rocm with superlu-dist+rocm) is now at https://github.com/spack/spack/pull/40980
I thought this build was successful last week [but don't know for sure]
ref:
./bin/spack install -j64 xsdk+rocm amdgpu_target=gfx90a
spack-build-out.txt