aprilnovak opened 4 months ago
I was able to ssh into Bitterroot, but upon opening my terminal, the ~/.bashrc I have for Sawtooth complained:
Lmod has detected the following error: The following module(s) are unknown: "openmpi/4.1.6-gcc-12.3.0-panw"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore_cache load "openmpi/4.1.6-gcc-12.3.0-panw"
Also make sure that all modulefiles written in TCL start with the string #%Module
Lmod has detected the following error: The following module(s) are unknown: "cmake/3.27.7-gcc-12.3.0-5cfk"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore_cache load "cmake/3.27.7-gcc-12.3.0-5cfk"
Also make sure that all modulefiles written in TCL start with the string #%Module
Lmod has detected the following error: The following module(s) are unknown: "gcc/12.3.0-gcc-10.5.0-vx2f"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore_cache load "gcc/12.3.0-gcc-10.5.0-vx2f"
Also make sure that all modulefiles written in TCL start with the string #%Module
Those modules seem to exist when I run module avail, but perhaps the syntax is causing an issue. Maybe I should remove this from my ~/.bashrc:
###################### CARDINAL ENVIRONMENT ######################
module purge
module load use.moose
module load moose-tools
module load openmpi/4.1.6-gcc-12.3.0-panw
module load cmake/3.27.7-gcc-12.3.0-5cfk
module load gcc/12.3.0-gcc-10.5.0-vx2f # needed for NekRS
Might it just be better to load them in the terminal when building Cardinal? I also get complaints that these modules don't exist when I scp or log onto inlhpclogin, but I just ignore the messages.
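Something like this in the build shell might be enough, following the Lmod hint in the error (the version strings below are the Sawtooth ones from my ~/.bashrc; whatever Bitterroot calls them may differ):
module spider openmpi    # find what this system actually calls it; repeat for cmake and gcc
module --ignore_cache load openmpi/4.1.6-gcc-12.3.0-panw cmake/3.27.7-gcc-12.3.0-5cfk gcc/12.3.0-gcc-10.5.0-vx2f   # retry with the cache bypassed if 'module avail' lists them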
You can have if statements in your ~/.bashrc that detect which system you're on. It's a similar setup at OLCF. Here's what we do for Frontier vs. Summit; maybe a similar syntax will work for Bitterroot/Sawtooth (see the sketch after the block for an INL variant).
if [ "$LMOD_SYSTEM_NAME" = frontier ]; then
module purge
module load PrgEnv-gnu craype-accel-amd-gfx90a cray-mpich rocm cray-python/3.9.13.1 cmake/3.21.3
module unload cray-libsci
# Revise for your Cardinal repository location
DIRECTORY_WHERE_YOU_HAVE_CARDINAL=$HOME/frontier
cd $DIRECTORY_WHERE_YOU_HAVE_CARDINAL
HOME_DIRECTORY_SYM_LINK=$(realpath -P $DIRECTORY_WHERE_YOU_HAVE_CARDINAL)
export NEKRS_HOME=$HOME_DIRECTORY_SYM_LINK/cardinal/install
export OPENMC_CROSS_SECTIONS=/lustre/orion/fus166/proj-shared/novak/cross_sections/endfb-vii.1-hdf5/cross_sections.xml
fi
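For Bitterroot vs. Sawtooth, a hostname check may be easier if LMOD_SYSTEM_NAME isn't set on those machines. This is only a sketch: the Sawtooth loads are copied from the bashrc above, while the hostname patterns and the Bitterroot module list are guesses to fill in once the right names are known.
case "$(hostname -f)" in
  *bitterroot*)
    module purge
    module load use.moose moose-tools
    # add Bitterroot's openmpi/cmake/gcc modules here once the right names are known
    ;;
  *sawtooth*)
    module purge
    module load use.moose moose-tools
    module load openmpi/4.1.6-gcc-12.3.0-panw
    module load cmake/3.27.7-gcc-12.3.0-5cfk
    module load gcc/12.3.0-gcc-10.5.0-vx2f # needed for NekRS
    ;;
esac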
Thanks to @loganharbour, this submit script using module load cardinal-mpich worked for me on Bitterroot. It's running a pretty hefty job quickly. Maybe his Apptainer knowledge could be useful for more detailed build-from-source info.
#!/bin/sh
#This file is called submit-script.sh
#SBATCH --partition=general # default general (option short or hbm)
#SBATCH --time=7-00:00:00 # run time in days-hh:mm:ss (6 hours is the max for short)
#SBATCH --nodes=32 # number of job nodes (max is 168 nodes on general, 336 nodes on short)
#SBATCH --ntasks-per-node=1 # mpi ranks per node
#SBATCH --cpus-per-task=112 # threads per mpi rank
#SBATCH --wckey=moose # project code
#SBATCH --error=small_inf_assembly.err.%J
#SBATCH --output=small_inf_assembly.txt.%J
module purge
module load use.moose moose-containers cardinal-mpich
JOB_DIR=/home/groslewi/gcmr/mwes/25kp_dt1e-2_small_inf_assembly
export MV2_USE_ALIGNED_ALLOC=1
export MV2_THREADS_PER_PROCESS=${SLURM_CPUS_PER_TASK}
mpiexec cardinal-opt -i ${JOB_DIR}/openmc.i --n-threads=${SLURM_CPUS_PER_TASK}
Does cardinal-mpich include NekRS in it?
It does. It's the base for what's being used for the Docker image: OpenMC, DAGMC, and NekRS.
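If you want to see exactly what a given cardinal-mpich version pulls in, the standard Lmod commands will print the modulefile and its help text (if the modulefile provides one):
module show cardinal-mpich
module help cardinal-mpich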
@AyaHegazy22 and I chatted a bit and she was unable to recreate the success. This makes sense though, as we discovered that I was also only able to run the job on some select nodes. Thanks to Logan, it should work on every node now.
I just launched a job that is running. Aya, if you get a chance, try again. Here's my working submit script (it has a few better defaults for #SBATCH).
#!/bin/sh
#This file is called submit-script.sh
#SBATCH --partition=general # default general (option short or hbm)
#SBATCH --time=0-06:00:00 # run time in days-hh:mm:ss (6 hours is the max for short)
#SBATCH --nodes=24 # number of job nodes (max is 168 nodes on general, 336 nodes on short)
#SBATCH --ntasks-per-node=1 # mpi ranks per node
#SBATCH --cpus-per-task=112 # threads per mpi rank
#SBATCH --wckey=moose # project code
#SBATCH --error=small_inf_assembly.err.%J
#SBATCH --output=small_inf_assembly.txt.%J
module purge
module load use.moose moose-containers cardinal-mpich/2024.07.12-b44370a
JOB_DIR=/home/groslewi/gcmr/mwes/small_inf_assembly
export MV2_USE_ALIGNED_ALLOC=1
export MV2_THREADS_PER_PROCESS=${SLURM_CPUS_PER_TASK}
mpiexec cardinal-opt -i ${JOB_DIR}/openmc.i --n-threads=${SLURM_CPUS_PER_TASK}
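Assuming it's saved under the name in the header comment, submitting and watching it is the usual Slurm workflow:
sbatch submit-script.sh
squeue -u $USER                          # check that the job is queued/running
tail -f small_inf_assembly.txt.<jobid>   # follow the output file set by #SBATCH --output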
@lewisgross1296 is there any update on this? With the above, it looks like you were still using the pre-built Cardinal only, right?
I have not tried to build from source on Bitterroot, since it seems that the suggested way is to use the provided Apptainer. The container has worked pretty well so far, though.
I have yet to try a Nek case, so I can't confirm behavior there. If @loganharbour is able to share the Apptainer build script, that might be useful for others trying to build from source.
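In the meantime, if the cardinal-mpich module is just wrapping an Apptainer image, the standard Apptainer commands should at least let you poke at it (the .sif path below is a placeholder, not the real location on Bitterroot):
apptainer inspect /path/to/cardinal.sif                     # show the image's labels/metadata
apptainer exec /path/to/cardinal.sif cardinal-opt --help    # run the packaged executable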
Reason
A new machine is coming to INL; let's make sure we know how to build Cardinal on it.
Design
Add Bitterroot as a system to Cardinal's HPC documents.
Impact
Better user experience.