xsdk-project / xsdk-issues

A repository under which GitHub issues not related to a specific xSDK repo can be filed.

ExaGO build failure on Frontier with PrgEnv-gnu #226

Closed: balos1 closed this issue 11 months ago

balos1 commented 12 months ago

ExaGO fails to build when building xsdk with rocm and PrgEnv-gnu on Frontier:

[ 97%] Linking CXX executable opflow
cd /tmp/balos1/spack-stage/spack-stage-exago-1.6.0-lv7yzk2izzpxqifpvjibua2mxrhaptuj/spack-build-lv7yzk2/applications && /autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/cmake-3.27.6-z7eq7nsiawlbjtf5feyxitpwragn33wf/bin/cmake -E cmake_link_script CMakeFiles/app_opflow.dir/link.txt --verbose=1
/opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/bin/mpicxx -O3 -DNDEBUG -fopenmp CMakeFiles/app_opflow.dir/opflow_main.cpp.o -o opflow  -Wl,-rpath,/autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/umpire-6.0.0-vzfkkasbfj4rbwgl4ofspykgh35cdzsw/lib:/autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/raja-0.14.0-goi63ekobgg7dahilgvn2tft6d3xlp6s/lib:/autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/petsc-3.20.0-6ursdqegoopzhpqzgwro2o4rlioz5kfi/lib:/autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/openblas-0.3.24-76epib4val2vdqe5f2ftpaw6qxxs4xyb/lib::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: ../src/opflow/libexago_opflow.a /autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/hiop-1.0.0-j3244ymuunmgwnwn53i7ghwaz5a7pfhe/lib/libhiop.a /autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/umpire-6.0.0-vzfkkasbfj4rbwgl4ofspykgh35cdzsw/lib/libumpire.so /autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/raja-0.14.0-goi63ekobgg7dahilgvn2tft6d3xlp6s/lib/libRAJA.so -ldl /opt/cray/pe/gcc/11.2.0/snos/lib64/libgomp.so /usr/lib64/libpthread.so ../src/pflow/libexago_pflow.a ../src/ps/libexago_ps.a ../src/utils/libexago_utils.a /autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/petsc-3.20.0-6ursdqegoopzhpqzgwro2o4rlioz5kfi/lib/libpetsc.so /autofs/nccs-svm1_proj/csc326/balos1/frontier/spack-xsdk/opt/spack/linux-sles15-zen3/gcc-11.2.0/openblas-0.3.24-76epib4val2vdqe5f2ftpaw6qxxs4xyb/lib/libopenblas.so 
/usr/bin/ld: ../src/opflow/libexago_opflow.a(opflowregi.cpp.o): in function `OPFLOWModelRegisterAll(_p_OPFLOW*)':
opflowregi.cpp:(.text+0x2c8): undefined reference to `OPFLOWModelCreate_PBPOLRAJAHIOPSPARSE(_p_OPFLOW*)'
collect2: error: ld returned 1 exit status
make[2]: *** [applications/CMakeFiles/app_opflow.dir/build.make:111: applications/opflow] Error 1
make[2]: Leaving directory '/tmp/balos1/spack-stage/spack-stage-exago-1.6.0-lv7yzk2izzpxqifpvjibua2mxrhaptuj/spack-build-lv7yzk2'
make[1]: *** [CMakeFiles/Makefile2:555: applications/CMakeFiles/app_opflow.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Full log: spack-build-out.txt

cameronrutherford commented 11 months ago

@abhyshr and @wperkins, this error occurs with a build of exago+mpi+python+ipopt~rocm~cuda. The key here is that with ~ipopt, https://github.com/pnnl/ExaGO/blob/828db06af5e0345b641b6d03ef7e2456d13469ea/src/opflow/interface/opflowregi.cpp#L79 and https://github.com/pnnl/ExaGO/blob/828db06af5e0345b641b6d03ef7e2456d13469ea/src/opflow/model/power_bal_hiop/pbpolrajahiopsparse.cpp#L3 end up guarded by different preprocessor macros.
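The failure mode described in this comment can be reproduced in isolation. If the file that registers a model and the file that defines it test different macros, some configuration will compile the call but not the function, and the linker then reports an undefined reference like the one in the log above. The sketch below assumes hypothetical macro names (CFG_HAVE_HIOP, CFG_HAVE_RAJA, CFG_HAVE_HIOP_SPARSE), hypothetical model names, and a hypothetical registry, none of which are ExaGO's actual code. It shows one common remedy: derive a single availability macro once and test that same macro on both the definition side and the registration side.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical configuration macros modelling the reported build
// (exago+hiop+raja with ^hiop~sparse). These are NOT ExaGO's real macro
// names; they only stand in for whatever the build system defines.
#define CFG_HAVE_HIOP 1
#define CFG_HAVE_RAJA 1
// CFG_HAVE_HIOP_SPARSE is intentionally left undefined (hiop~sparse).

// Derive each model's availability exactly once. Definition and
// registration below both test these derived macros, so they cannot
// drift apart the way two independently written guards can.
#if defined(CFG_HAVE_HIOP) && defined(CFG_HAVE_RAJA)
#define CFG_MODEL_PBPOLRAJAHIOP 1
#endif
#if defined(CFG_HAVE_HIOP) && defined(CFG_HAVE_RAJA) && defined(CFG_HAVE_HIOP_SPARSE)
#define CFG_MODEL_PBPOLRAJAHIOPSPARSE 1
#endif

// Hypothetical registry mapping model names to creation functions.
using ModelCreateFn = std::function<int()>;

std::map<std::string, ModelCreateFn>& ModelRegistry() {
  static std::map<std::string, ModelCreateFn> registry;
  return registry;
}

void RegisterModel(const std::string& name, ModelCreateFn create) {
  assert(create && "a registered model needs a creation function");
  ModelRegistry()[name] = std::move(create);
}

// Definition side (in a real project this would be a separate .cpp file).
#if defined(CFG_MODEL_PBPOLRAJAHIOP)
int ModelCreate_PBPOLRAJAHIOP() { return 0; }
#endif
#if defined(CFG_MODEL_PBPOLRAJAHIOPSPARSE)
int ModelCreate_PBPOLRAJAHIOPSPARSE() { return 0; }
#endif

// Registration side: guarded by the same derived macros as the definitions,
// so a registration is compiled only when its function is also compiled.
void ModelRegisterAll() {
#if defined(CFG_MODEL_PBPOLRAJAHIOP)
  RegisterModel("PBPOLRAJAHIOP", ModelCreate_PBPOLRAJAHIOP);
#endif
#if defined(CFG_MODEL_PBPOLRAJAHIOPSPARSE)
  RegisterModel("PBPOLRAJAHIOPSPARSE", ModelCreate_PBPOLRAJAHIOPSPARSE);
#endif
}
```

With CFG_HAVE_HIOP_SPARSE undefined, ModelRegisterAll() registers only the non-sparse model and everything links. Defining it switches on the sparse definition and its registration together. This only illustrates the guard pattern; the actual fix is whatever the ExaGO pull request referenced later in this thread implements.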

@balos1 I am curious why you are building on Frontier without HIP, but thank you for catching this edge case.

cameronrutherford commented 11 months ago

We also don't print this particular piece of information at configure time (we should), but I assume hiop~sparse+raja was the HiOp configuration that was built, so the minimal Spack spec to reproduce this on any platform is exago+hiop+raja~ipopt ^hiop+raja~sparse.

balos1 commented 11 months ago

@cameronrutherford I was not attempting to build without rocm. It looks like building xsdk+rocm did not propagate the rocm variant to exago. I'll fix that.

cameronrutherford commented 11 months ago

https://github.com/pnnl/ExaGO/pull/41 should close the original issue. IMO, feel free to close this one and open a new one for the +rocm build.