idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

HPC MOOSE build: referencing modular dependencies and error compiling EigenProblem.C #26113

Open phi-l-l-ip-thomas opened 8 months ago

phi-l-l-ip-thomas commented 8 months ago

Hi MOOSE development team,

I would like to build MOOSE from scratch to target a large HPC cluster. I am running into an issue with linking to dependencies when I attempt to install libmesh. The dependencies are all present on my system, but they live in modules rather than in "default" locations. I have a workaround for the libmesh issue described below, but I am curious whether there is a more elegant way to achieve this (and I am not sure whether my workaround introduces a later problem -- see below -- when building MOOSE itself). Here are my build steps:

module load cmake
module load python
module load cpu
module load cray-hdf5-parallel
module load cray-pmi

After these steps the following modules are loaded (note the versions of items 4 and 18 in the list below):

Currently Loaded Modules:
  1) craype-x86-milan
  2) libfabric/1.15.2.0
  3) craype-network-ofi
  4) xpmem/2.6.2-2.5_2.27__gd067c3f.shasta
  5) PrgEnv-gnu/8.3.3
  6) cray-dsmml/0.2.2
  7) cray-libsci/23.02.1.1
  8) cray-mpich/8.1.25
  9) craype/2.7.20
 10) gcc/11.2.0
 11) perftools-base/23.03.0
 12) cpe/23.03
 13) xalt/2.10.2
 14) cpu/1.0
 15) cmake/3.22.0                          (buildtools)
 16) cray-hdf5-parallel/1.12.2.3           (io)
 17) e4s/22.11                             (buildtools)
 18) cray-pmi/6.1.10
 19) evp-patch
 20) python/3.9-anaconda-2021.11           (dev)

Next I clone the repository and set the environment variables:

# Clone the repository
git clone https://github.com/idaholab/moose.git
cd moose
git checkout master

# Set environment variables
export CC=mpicc CXX=mpicxx FC=mpif90 F90=mpif90 F77=mpif77
export MOOSE_JOBS=6 METHODS=opt

PETSc is already installed on our system via a Spack module, so I load it via:

module load e4s
spack env activate -V gcc
spack load petsc
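
In case it helps later steps find this PETSc: I believe the MOOSE/libmesh build scripts honor the standard PETSC_DIR variable, so a sketch of what I export after the Spack load (the exact command may vary with the Spack version) is:

# Sketch: point later build steps at the Spack-provided PETSc install prefix
export PETSC_DIR=$(spack location -i petsc)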

Now when I attempt to build libmesh via:

./update_and_rebuild_libmesh.sh

I receive the following errors:

libtool: warning: '/opt/cray/pe/gcc/11.2.0/snos/lib/gcc/x86_64-suse-linux/11.2.0/../../../../lib64/libstdc++.la' seems to be moved
/usr/bin/ld: cannot find -lxpmem: No such file or directory
/usr/bin/ld: cannot find -lpmi: No such file or directory
/usr/bin/ld: cannot find -lpmi2: No such file or directory

I noticed in ~/moose/libmesh/build/Makefile that the configure script found the following locations for the xpmem and pmi libraries:

/opt/cray/xpmem/2.5.2-2.4_3.20__gd0f7936.shasta/lib64
/opt/cray/pe/pmi/6.1.7/lib
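
(The locations actually provided by the currently loaded modules can be cross-checked with something like the following; module show output formats vary by site:)

# Sketch: inspect what the loaded modules actually provide
module show xpmem 2>&1 | grep -i lib
module show cray-pmi 2>&1 | grep -i lib
ls -d /opt/cray/xpmem/*/lib64 /opt/cray/pe/pmi/*/lib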

My workaround is to hack the Makefile to point at the newer versions of these libraries on my system:

sed -i -e 's/2.5.2-2.4_3.20__gd0f7936/2.6.2-2.5_2.27__gd067c3f/g' Makefile
sed -i -e 's/6.1.7/6.1.10/g' Makefile

After the libmesh Makefile hack, I can successfully build libmesh:

cd ~/moose/scripts
./update_and_rebuild_libmesh.sh --fast

Is there a way to direct the configure script to find the newer xpmem and pmi dependency libraries and avoid the need for the hack above?
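
For concreteness, this is the kind of thing I was hoping would work instead of editing the Makefile -- an untested sketch that assumes the libmesh configure script picks up LDFLAGS from the environment:

# Untested sketch: hand the module-provided library directories to the linker
export LDFLAGS="-L/opt/cray/xpmem/2.6.2-2.5_2.27__gd067c3f.shasta/lib64 -L/opt/cray/pe/pmi/6.1.10/lib"
cd ~/moose/scripts
./update_and_rebuild_libmesh.sh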

Once libmesh is built, I continue by building WASP (successfully), followed by MOOSE itself:

./update_and_rebuild_wasp.sh
cd ../test
make -j 6

My attempt to build MOOSE itself fails when trying to compile EigenProblem.C, with the following error message:

In file included from /pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/build/unity_src/problems_Unity.C:4:
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C: In constructor 'EigenProblem::EigenProblem(const InputParameters&)':
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:53:5: error: class 'EigenProblem' does not have any field named '_n_eigen_pairs_required'
   53 |     _n_eigen_pairs_required(1),
      |     ^~~~~~~~~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:54:5: error: class 'EigenProblem' does not have any field named '_generalized_eigenvalue_problem'
   54 |     _generalized_eigenvalue_problem(false),
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:55:5: error: class 'EigenProblem' does not have any field named '_negative_sign_eigen_kernel'
   55 |     _negative_sign_eigen_kernel(getParam<bool>("negative_sign_eigen_kernel")),
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:56:5: error: class 'EigenProblem' does not have any field named '_active_eigen_index'
   56 |     _active_eigen_index(getParam<unsigned int>("active_eigen_index")),
      |     ^~~~~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:57:5: error: class 'EigenProblem' does not have any field named '_do_free_power_iteration'
   57 |     _do_free_power_iteration(false),
      |     ^~~~~~~~~~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:58:5: error: class 'EigenProblem' does not have any field named '_output_inverse_eigenvalue'
   58 |     _output_inverse_eigenvalue(false),
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:59:5: error: class 'EigenProblem' does not have any field named '_on_linear_solver'
   59 |     _on_linear_solver(false),
      |     ^~~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:60:5: error: class 'EigenProblem' does not have any field named '_matrices_formed'
   60 |     _matrices_formed(false),
      |     ^~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:61:5: error: class 'EigenProblem' does not have any field named '_constant_matrices'
   61 |     _constant_matrices(false),
      |     ^~~~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:62:5: error: class 'EigenProblem' does not have any field named '_has_normalization'
   62 |     _has_normalization(false),
      |     ^~~~~~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:63:5: error: class 'EigenProblem' does not have any field named '_normal_factor'
   63 |     _normal_factor(1.0),
      |     ^~~~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:64:5: error: class 'EigenProblem' does not have any field named '_first_solve'
   64 |     _first_solve(declareRestartableData<bool>("first_solve", true)),
      |     ^~~~~~~~~~~~
/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/src/problems/EigenProblem.C:65:5: error: class 'EigenProblem' does not have any field named '_bx_norm_name'
   65 |     _bx_norm_name(isParamValid("bx_norm")
      |     ^~~~~~~~~~~~~
Compiling C++ (in opt mode) /pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/build/unity_src/outputs_formatters_Unity.C...
Compiling C++ (in opt mode) /pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/build/unity_src/nodalkernels_Unity.C...
Compiling C++ (in opt mode) /pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/build/unity_src/meshdivisions_Unity.C...
Compiling C++ (in opt mode) /pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/build/unity_src/linesearches_Unity.C...
make: *** [/pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/build.mk:150: /pscratch/sd/p/pthomas/tickets/INC0200703/moose/framework/build/unity_src/problems_Unity.x86_64-pc-linux-gnu.opt.lo] Error 1
make: *** Waiting for unfinished jobs....

If you have any advice on how to solve this, then I would be very grateful! Many thanks!

YaqiWang commented 7 months ago

Looks like your PETSc does not have SLEPc. But this error is due to some missing #ifdef LIBMESH_HAVE_SLEPC guards in MOOSE's EigenProblem. Tagging @lindsayad

phi-l-l-ip-thomas commented 7 months ago

Hi @YaqiWang, thank you for the response! I just restarted the build from the beginning, but this time I ran the script ./update_and_rebuild_petsc.sh instead of loading the e4s version via Spack. This succeeded in finding the correct xpmem and pmi libraries on my system. After building PETSc, libmesh, and WASP, I was also able to build MOOSE without running into the issue compiling EigenProblem.C that I experienced earlier.
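
For reference, the overall sequence that worked is roughly the following (module loads and environment variables as in my first post; this is a summary rather than a verbatim log):

cd ~/moose/scripts
./update_and_rebuild_petsc.sh
./update_and_rebuild_libmesh.sh
./update_and_rebuild_wasp.sh
cd ../test
make -j 6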

However, now that MOOSE is built, when I run the tests, I immediately get a segmentation fault with the following error message:

moose/test> ./run_tests -j 6
<frozen importlib._bootstrap>:241: RuntimeWarning: compile time version 3.9 of module 'hit' does not match runtime version 3.11
Segmentation fault

Do you have an idea how I can diagnose+fix this error? Many thanks again!

lindsayad commented 7 months ago

It looks like you compiled HIT with Python 3.9, but when running the tests you were using Python 3.11. Such an environment mismatch could cause a segmentation fault. One thing you could test is running a test directly with the generated test executable to see whether the problem is isolated to Python.
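
For example, something along these lines (the executable name assumes an opt build, and the input file is just an illustration -- any small test input should do):

cd ~/moose/test
./moose_test-opt -i tests/kernels/simple_diffusion/simple_diffusion.i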

phi-l-l-ip-thomas commented 7 months ago

Hi @lindsayad, thank you for the hint! I was initially perplexed by this message until I realized that the default Python had been updated by our system admin between the time I built most of the dependencies and the time I ran MOOSE itself. I can now build the code and dependencies without error. Cheers!

One more question: when running the tests, many fail due to mpiexec not being directly available to users on our system -- instead we use Slurm's srun with the Cray MPICH wrappers. Is there a convenient way in MOOSE to globally set the MPI command to call our wrapper of choice?

lindsayad commented 7 months ago

It looks like you can set a MOOSE_MPI_COMMAND environment variable. Does srun take a -n argument?
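
If it does, then something like the following sketch is what I have in mind (my understanding is that the test harness uses this command in place of mpiexec and supplies the task count itself):

export MOOSE_MPI_COMMAND=srun
cd ~/moose/test
./run_tests -j 6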

phi-l-l-ip-thomas commented 7 months ago

Hi @lindsayad, yes, 'srun' can be called with a number of arguments (which can also be set as Slurm environment variables to avoid having to include them explicitly in the invocation), but the basic usage can be tailored down to an mpirun/mpiexec-like format:

srun -n <number-of-MPI-tasks> <executable>

lindsayad commented 7 months ago

In that case I think setting MOOSE_MPI_COMMAND should be the solution. Let us know if it works.