Closed DeniseWorthen closed 3 years ago
So, when I submit a build job on Gaea I'm getting this error:
+ set +x
Lmod has detected the following error: The following module(s) are unknown:
"eproxy/2.0.24-7.0.2.1_2.20__g8e04b33.ari"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore-cache load "eproxy/2.0.24-7.0.2.1_2.20__g8e04b33.ari"
Also make sure that all modulefiles written in TCL start with the string
#%Module
which appears to be from this line in compile.sh: https://github.com/ufs-community/ufs-weather-model/blob/develop/tests/compile.sh#L66:
source /lustre/f2/pdata/esrl/gsd/contrib/lua-5.1.4.9/init/init_lmod.sh
Anyone else ever gotten this error? I can't get to the error @DeniseWorthen mentioned because of this right now. I've tried from a fresh clone to make sure I didn't do anything and I don't have much in my .cshrc file.
The last person reporting this error was using tcsh, I recommended switching to bash and never heard back. Either it worked or that person gave up.
@climbfuji I am using tcsh, so that's at least consistent.
Let me see if I can get this to work (remind me tomorrow, please) ... I use bash and the (t)csh version is not as well tested, obviously.
@climbfuji I made a separate issue https://github.com/ufs-community/ufs-weather-model/issues/536 so this issue can get back to being about the S2SW not building.
@DeniseWorthen I'll try from the command line again, but I might need you to help test until I can get the other sorted out.
Let me see if I can get this to work (remind me tomorrow, please) ... I use bash and the (t)csh version is not as well tested, obviously.
The trouble is that I cannot reproduce the problem, because the following works:
Dom.Heinzeller@gaea14:~> export | grep SHELL
declare -x SHELL="/bin/bash"
Dom.Heinzeller@gaea14:~> tcsh
Directory: /ncrc/home2/Dom.Heinzeller
home2/Dom.Heinzeller> env | grep SHELL
SHELL=/bin/bash
home2/Dom.Heinzeller> source /lustre/f2/pdata/esrl/gsd/contrib/lua-5.1.4.9/init/init_lmod.sh
Illegal variable name.
home2/Dom.Heinzeller> source /lustre/f2/pdata/esrl/gsd/contrib/lua-5.1.4.9/init/init_lmod.csh
Activating lua module environment
Reloading modules ... (sit back and relax)
home2/Dom.Heinzeller>
Note that the environment variable SHELL still says bash, even though I am in a tcsh shell. Somehow it remembers aspects of my original bash login shell.
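One way to see the difference is to compare `$SHELL` with the name of the process that is actually executing the commands. This is a minimal sketch, assuming a Linux system with `/proc` or a `ps` that accepts `-p` and `-o comm=`; the function name `running_shell` is made up for illustration and is not part of ufs-weather-model.

```shell
#!/bin/bash
# Hypothetical helper: print the name of the shell process running this script.
running_shell() {
    # $$ is the PID of the current shell and keeps that value inside $(...).
    if [ -r "/proc/$$/comm" ]; then
        cat "/proc/$$/comm"
    else
        # comm= prints only the command name, without a header line.
        ps -p "$$" -o comm=
    fi
}

# SHELL is normally set once at login and is not updated when another shell is started.
echo "SHELL variable : ${SHELL:-unset}"
echo "running shell  : $(running_shell)"
```

If the two lines disagree, scripts that pick `init_lmod.sh` or `init_lmod.csh` based on `$SHELL` may choose the wrong one.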
@climbfuji I also can load on the login node:
> source /lustre/f2/pdata/esrl/gsd/contrib/lua-5.1.4.9/init/init_lmod.csh
Activating lua module environment
Reloading modules ... (sit back and relax)
but when you submit the job, the modules do not load. So it's hard. I'd be happy to do what I can to help test/reproduce the issues. I made another issue for this (#536), should I close it?
I was able to build @JessicaMeixner-NOAA gaea_ww3 branch using:
source /lustre/f2/pdata/esrl/gsd/contrib/lua-5.1.4.9/init/init_lmod.sh
module use modulefiles/
module load ufs_gaea.intel
CMAKE_FLAGS="-DAPP=S2SW" CCPP_SUITES="FV3_GFS_2017_coupled,FV3_GFS_2017_satmedmf_coupled,FV3_GFS_v15p2_coupled" BUILD_VERBOSE=1 BUILD_JOBS=1 ./build.sh > output 2>&1 &
@JessicaMeixner-NOAA I told init_lmod.sh (and init_lmod.csh) to ignore errors while loading modules. With that I could switch to tcsh and submit a job card from the ufs-weather-model, which uses something like this:
#!/bin/bash -l
#SBATCH -e err.bash
#SBATCH -o out.bash
#SBATCH --job-name="init_lmod_bash_test"
#SBATCH --account=esrl_bmcs
#SBATCH --qos=normal
#SBATCH --clusters=c4
#SBATCH --ntasks=1
#SBATCH --time=5
set -eux
source ./module-setup.sh
source /lustre/f2/pdata/esrl/gsd/contrib/lua-5.1.4.9/init/init_lmod.sh
module use $( pwd -P )
module load modules.fv3
module list
echo "Model started: " `date`
sync && sleep 1
# here would be the call to srun ... fv3.exe
echo "Model ended: " `date`
Can you check if this works for you? It's not an ideal solution, because if something changes with the module environment that breaks the init_lmod scripts we'll find out only when we compile the model / run the tests, but it's better than nothing (if it works). Thanks!
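The thread does not show the actual change made to the init_lmod scripts. Purely as an illustration of the general idea, the sketch below sources a file so that a failing command inside it does not abort a job script running under `set -e`. The helper name `source_tolerant` and the temporary file standing in for an init script are assumptions, not code from the repository.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: source a file but keep going if a command in it fails.
# Assumes the caller runs with `set -e`, which is restored afterwards.
source_tolerant() {
    local file="$1" status=0
    set +e              # temporarily stop aborting on errors
    source "$file"
    status=$?
    set -e              # restore strict error handling
    if [ "$status" -ne 0 ]; then
        echo "warning: $file finished with status $status, continuing" >&2
    fi
    return 0
}

# Stand-in for an init script whose last step fails (for example, an unknown module).
demo_init="$(mktemp)"
printf 'DEMO_INIT_RAN=yes\nfalse\n' > "$demo_init"

source_tolerant "$demo_init"
echo "still running, DEMO_INIT_RAN=$DEMO_INIT_RAN"
rm -f "$demo_init"
```

The trade-off is the one described above: a genuinely broken module environment only shows up as a warning here and as a real failure later, at compile or run time.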
@climbfuji Where can I find the module-setup.sh file? I copied the
Can you copy it from here for now? It's some file under NEMS with a different name.
/lustre/f2/scratch/Dom.Heinzeller/FV3_RT/init_lmod_test
I think it worked. My directory is here: /lustre/f2/scratch/ncep/Jessica.Meixner/init_lmod_test I ran tryfix.sub
Yes, looks good. Can you try building the APP S2SW?
I submitted rt.sh -e and that still failed... I guess I'll have to switch from tcsh to bash?
I think Denise can now build with the fix I suggested, so we can hopefully at least have that fixed, even if I can't do it myself.
I guess I'll have to switch from tcsh to bash?
Good idea. Regardless of this issue.
Build on gaea was added in PR #533
Description
The app=s2sw fails to build on Gaea.
To Reproduce:
Check out the develop branch of ufs-weather-model. Edit rt.conf to contain a single S2SW compile and save it as rt.test:
Be sure to remove the gaea.intel from the next-to-last column, otherwise rt.sh will just exit.
Run the test:
./rt.sh -l rt.test >output 2>&1 &
Look in the RT directory that the job is compiling in: compile_001/build_fv3_001/ww3_make.log: