pnnl / ExaGO

High-performance power grid optimization for stochastic, security-constrained, and multi-period ACOPF problems.

Buildsystem update for Frontier using the worldshared directory and rocm/5.6 #126

Closed. nkoukpaizan closed this 6 months ago.

nkoukpaizan commented 6 months ago


Summary

This MR updates the Spack configuration and the corresponding modules on Frontier to build with rocm/5.6. The modules are built in the project's world-shared directory. This replaces #89. Test failures remain and should be investigated.

pelesh commented 6 months ago

Are we building ExaGO with gcc or clang?

pelesh commented 6 months ago

When building with clang, I get the following link error:

[ 56%] Linking CXX executable opflow
ld.lld: error: undefined symbol: mc19ad_
>>> referenced by IpEquilibrationScaling.cpp
>>>               IpEquilibrationScaling.o:(Ipopt::EquilibrationScaling::DetermineScalingParametersImpl(Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::MatrixSpace const>, Ipopt::SmartPtr<Ipopt::MatrixSpace const>, Ipopt::SmartPtr<Ipopt::SymMatrixSpace const>, Ipopt::Matrix const&, Ipopt::Vector const&, Ipopt::Matrix const&, Ipopt::Vector const&, double&, Ipopt::SmartPtr<Ipopt::Vector>&, Ipopt::SmartPtr<Ipopt::Vector>&, Ipopt::SmartPtr<Ipopt::Vector>&)) in archive /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/ipopt-3.12.10-7fp33q627rou44fzquk57llhwoqqeuho/lib/libipopt.a

It seems HSL is not found. This is for a clean checkout of develop on Frontier. I can see the HSL module loaded:

$ ml

Currently Loaded Modules:
...
 34) coinhsl/2019.05.21-gcc-12.2.0-mixed-and6kty
 35) hipblas/5.6.0-clang-16.0.0-rocm5.6.0-mixed-pgkobjo

I used the following commands to build:

$ CC=clang CXX=clang++ FC=flang cmake ../exago
$ make

I'll investigate more.

pelesh commented 6 months ago

The issue I'm seeing looks more like a bug in ExaGO's CMake config. HSL does not seem to be on the linker line:

[ 61%] Linking CXX executable tcopflow
cd /ccs/home/peles/src/exago/build-crusher/applications && /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/cmake-3.20.6-cdzi5pgrngs2wwvhesbwkkjvsftoyqia/bin/cmake -E cmake_link_script CMakeFiles/app_tcopflow.dir/link.txt --verbose=1
/opt/rocm-5.6.0/llvm/bin/clang++ CMakeFiles/app_tcopflow.dir/tcopflow_main.cpp.o -o tcopflow  -Wl,-rpath,/lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/petsc-3.20.4-zgcdpbefalitop4iud527awwmbarfrsc/lib:/opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/lib:/lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/gcc-12.2.0-mixed/openblas-0.3.20-cjm2rkdlesgzck7muxx34kwpr6d5rm7d/lib::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: ../src/tcopflow/libexago_tcopflow.a ../src/opflow/libexago_opflow.a ../src/pflow/libexago_pflow.a ../src/ps/libexago_ps.a ../src/utils/libexago_utils.a /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/petsc-3.20.4-zgcdpbefalitop4iud527awwmbarfrsc/lib/libpetsc.so /opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/lib/libmpi_gnu_91.so /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/gcc-12.2.0-mixed/openblas-0.3.20-cjm2rkdlesgzck7muxx34kwpr6d5rm7d/lib/libopenblas.so -lpthread -lm -ldl /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/ipopt-3.12.10-7fp33q627rou44fzquk57llhwoqqeuho/lib/libipopt.a 
ld.lld: error: undefined symbol: mc19ad_
>>> referenced by IpEquilibrationScaling.cpp
>>>               IpEquilibrationScaling.o:(Ipopt::EquilibrationScaling::DetermineScalingParametersImpl(Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::MatrixSpace const>, Ipopt::SmartPtr<Ipopt::MatrixSpace const>, Ipopt::SmartPtr<Ipopt::SymMatrixSpace const>, Ipopt::Matrix const&, Ipopt::Vector const&, Ipopt::Matrix const&, Ipopt::Vector const&, double&, Ipopt::SmartPtr<Ipopt::Vector>&, Ipopt::SmartPtr<Ipopt::Vector>&, Ipopt::SmartPtr<Ipopt::Vector>&)) in archive /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/ipopt-3.12.10-7fp33q627rou44fzquk57llhwoqqeuho/lib/libipopt.a

@cameronrutherford, please let me know if you can reproduce this issue.
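The missing symbol mc19ad_ is the Fortran entry point of the HSL scaling routine MC19, which the static libipopt.a references from IpEquilibrationScaling, so the link can only succeed if an HSL library is among the linker inputs. A quick way to check a link command for that is sketched below. This is a hypothetical helper, not part of ExaGO; it assumes HSL would appear either as a libcoinhsl/libhsl file or as -lcoinhsl/-lhsl, and those name patterns are assumptions to adjust to the actual installation.

```python
import re
import shlex

# Hypothetical patterns for an HSL library on a link line (assumption):
# either a -l flag or a path to a libcoinhsl*/libhsl* static or shared library.
HSL_FLAGS = {"-lcoinhsl", "-lhsl"}
HSL_FILE = re.compile(r"(?:^|/)lib(?:coin)?hsl[^/]*\.(?:a|so)(?:\.\d+)*$")


def hsl_on_link_line(link_cmd: str) -> bool:
    """Return True if any token of a linker command looks like an HSL library.

    Only actual linker inputs count: an -rpath entry that merely points at an
    HSL directory links nothing, so the file pattern deliberately ignores it.
    """
    for token in shlex.split(link_cmd):
        if token in HSL_FLAGS or HSL_FILE.search(token):
            return True
    return False
```

In this build tree the tcopflow link command is stored in applications/CMakeFiles/app_tcopflow.dir/link.txt (the file passed to cmake -E cmake_link_script in the log above), so its contents could be passed to hsl_on_link_line directly; for the command shown above it would return False.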

nkoukpaizan commented 6 months ago

@pelesh I can reproduce with the build command you are using.

Running cmake -C ../buildsystem/clang-hip/cache.cmake ../exago; make seems to work, so the problem is related to the CMake configuration and options. Some combinations (e.g., the default options) seemingly don't work as expected.

pelesh commented 6 months ago

I reproduced the same error with the build system from develop, so it looks like an ExaGO bug unrelated to the modules.

cameronrutherford commented 6 months ago

Merged. I didn't debug your failing build, but I assume some CMake options are missing and don't get configured in that minimal build. It might also be a plain CMake bug in our ExaGO code, but I would have to debug more to know for sure.
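One way to pin down which options differ between the working cache-file build and the failing minimal build is to compare the CMakeCache.txt files of the two build directories. The sketch below is hypothetical tooling, not part of ExaGO; it assumes the standard NAME:TYPE=VALUE cache-entry format and skips // and # comment lines.

```python
from typing import Dict, Optional, Tuple


def parse_cmake_cache(text: str) -> Dict[str, str]:
    """Map each CMakeCache.txt entry NAME to its VALUE.

    Cache entries have the form NAME:TYPE=VALUE; blank lines and lines
    starting with // (help text) or # (section comments) are skipped.
    """
    entries: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("//", "#")):
            continue
        name_and_type, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME:TYPE=VALUE entry
        name = name_and_type.split(":", 1)[0]
        entries[name] = value
    return entries


def diff_caches(
    cache_a: str, cache_b: str, prefix: str = ""
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {NAME: (value_a, value_b)} for entries whose values differ.

    An entry missing from one cache is reported as None on that side.
    Only names starting with `prefix` are compared.
    """
    a, b = parse_cmake_cache(cache_a), parse_cmake_cache(cache_b)
    diffs: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in sorted(a.keys() | b.keys()):
        if not name.startswith(prefix):
            continue
        if a.get(name) != b.get(name):
            diffs[name] = (a.get(name), b.get(name))
    return diffs
```

Entries such as CMAKE_CACHEFILE_DIR always differ between two build directories, so passing a project-specific prefix keeps the output focused on the options that actually control the build.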