Closed: GeorgeGayno-NOAA closed this issue 2 years ago.
@GeorgeGayno-NOAA I encountered a problem when running chgres_cube on WCOSS2/Acorn. I tried different settings, but NPETS was always 1. Could you give me some suggestions on how to run chgres_cube on WCOSS2? One log file is located at /gpfs/dell2/ptmp/Xianwu.Xue/o/jgefs_atmos_prep_00.o39015 on Venus. However, Acorn is not available (it suddenly became unreachable this afternoon), so I cannot tell you where the log file is on WCOSS2.
I have not run anything on WCOSS2 yet. My guess is you do not have the mpiexec command correct. Are you starting 36 instances of chgres_cube with one task each?
Yes, we use mpirun -n 36 on wcoss_dell_p3.
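On WCOSS2 the scheduler is PBS Pro and the launcher is mpiexec rather than mpirun, so the task count has to be requested in the job card as well as on the launch line. A minimal sketch, assuming one node with 36 single-threaded MPI ranks (the queue, account, and executable path are placeholders):

#PBS -l select=1:ncpus=36:mpiprocs=36
#PBS -l walltime=00:15:00
#PBS -q dev
#PBS -A ACCOUNT

cd $PBS_O_WORKDIR
# launch 36 MPI tasks of chgres_cube, all on one node
mpiexec -n 36 -ppn 36 ./chgres_cube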
@GeorgeGayno-NOAA I've been trying to get a modified version of the release/ops-hrefv3.1 chgres_cube code compiled on Acorn, but struggling a bit. Hoping this href version can piggyback off of your centralized efforts at porting to WCOSS2. Thanks!
How to access the hpc-stack on Cactus from @KateFriedman-NOAA:
module load envvar/1.0
module load PrgEnv-intel/8.1.0
module load craype/2.7.8
module load intel/19.1.3.304
module load cray-mpich/8.1.7
@GeorgeGayno-NOAA So my understanding thus far of the stack on WCOSS2 is that those modules access the production installation of the stack, but the NCEPLIBS group will also be installing hpc-stack as a dev version (what we know as hpc-stack). FYI, the production installation has some modules that are named slightly differently compared to what we use in hpc-stack now; I'm mainly referring to the hdf5 and netcdf modules accessed after loading the cray-mpich/8.1.7 module (e.g. hdf5-parallel/1.10.6 and netcdf-hdf5parallel/4.7.4). I don't know if that naming difference will persist; I'm using the module names as they are currently set in global-workflow for now. I'll be following convos in the new #wcoss2-transition channel in Slack to see what happens with the stack installs moving forward.
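For completeness, the two production-stack modules mentioned above are loaded after cray-mpich/8.1.7, e.g.:

module load cray-mpich/8.1.7
module load hdf5-parallel/1.10.6
module load netcdf-hdf5parallel/4.7.4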
@KateFriedman-NOAA I am able to compile on WCOSS2. And the gdas_init scripts are working. You are welcome to try it (d0c7784)
@MatthewPyle-NOAA I will port your tag next.
Awesome, I'll give it a try and report back, thanks @GeorgeGayno-NOAA! (Refs: https://github.com/NOAA-EMC/global-workflow/issues/399)
@GeorgeGayno-NOAA Should I now be using the build_all.sh script to build UFS_UTILS from global-workflow? This script: https://github.com/GeorgeGayno-NOAA/UFS_UTILS/blob/feature/wcoss2/build_all.sh
I was previously using the sorc/build_ufs_utils.sh script. Thanks!
Yes. Use build_all.sh.
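For anyone following along, a minimal sketch of the build from scratch, using the fork and branch linked above (build_all.sh sits at the repository root):

git clone --branch feature/wcoss2 https://github.com/GeorgeGayno-NOAA/UFS_UTILS.git
cd UFS_UTILS
./build_all.sh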
@GeorgeGayno-NOAA Thanks for confirming! I have built a copy of your feature/wcoss2 (d0c7784) on Cactus successfully. Please see the following build log and let me know if you see any issues, thanks!
/lfs/h2/emc/eib/noscrub/Kate.Friedman/git/feature-ops-wcoss2/sorc/logs/build_ufs_utils.log
Looks OK.
Built the ops-hrefv3.1 tag at a6f2fd9 on WCOSS-Dell and ran the regression tests. Then built the hrefv3.1-wcoss2 branch at 77d4b13 on WCOSS2 (Cactus) and ran the regression tests. The output files were compared using the nccmp utility. All differences were small (i.e., 'floating point' differences).
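For reference, the per-variable statistics tables quoted later in this thread are the kind of output an nccmp invocation along these lines produces (file names are placeholders):

# -d compares data values, -S prints per-variable difference statistics,
# -f keeps going after the first difference is found
nccmp -d -S -f dell_baseline/out.sfc.nc wcoss2/out.sfc.nc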
Almost ready to merge.
Checked out the branch at b75f502 on Cactus. All programs were successfully built using the build_all.sh script, and the 'fixed' files were successfully linked using the ./fix/link_fixdirs.sh script.
Next steps: run all consistency tests and utility scripts.
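For anyone repeating this, link_fixdirs.sh takes a run environment and a machine name as arguments; the values below are assumptions, so check the script's usage message:

cd fix
# RUN_ENVIR is 'emc' or 'nco'; the second argument is the target machine
./link_fixdirs.sh emc wcoss2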
Next, all the utility scripts were run:
./util/vcoord_gen/run.wcoss2.sh was successfully run to create a global_hyblev.txt file.
./util/sfc_climo_gen/run.wcoss2.sh was successfully run to create C384 surface climatological field files.
./util/gdas_init/driver.wcoss2.sh was successfully run to create C192/C96 GDAS initial conditions using GFS v16 production data from 2022/06/26 06Z.
Next, the consistency tests were run. Here, the baseline data was copied over from WCOSS-Dell.
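The consistency tests live under ./reg_tests, one subdirectory per program. By analogy with the utility scripts above, each has a machine-specific driver; the script name below is an assumption, so check the directory for the actual one:

# run the chgres_cube consistency tests against the copied baseline
cd reg_tests/chgres_cube
./driver.wcoss2.sh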
The snow2mdl and ice_blend tests both passed.
All global_cycle tests passed.
All chgres_cube tests failed. However, differences with the baseline (from Dell) were small. The largest differences I could find were from the surface portion of the 13km_na_gfs_ncei_grib2 test:
Variable Group Count Sum AbsSum Min Max Range Mean StdDev
slmsk / 24 0 48 -2 2 4 0 2.04302
tsea / 152618 0.005 5.381 -0.2465 0.2465 0.493 3.27615e-08 0.0028425
sheleg / 10632 5.82216e-13 8.62312e-10 -4.79794e-12 5.82645e-12 1.06244e-11 5.47607e-17 2.18734e-13
tg3 / 144618 -0.1 6.86 -0.29 0.29 0.58 -6.91477e-07 0.0036912
zorl / 114359 3.38271e-16 23.76 -0.99 0.99 1.98 2.95798e-21 0.0143419
alvsf / 59228 -3.7817e-16 4.63265e-13 -1.11022e-16 1.11022e-16 2.22045e-16 -6.38498e-21 1.06725e-17
alvwf / 59458 2.75821e-15 4.45182e-13 -1.11022e-16 1.11022e-16 2.22045e-16 4.63892e-20 1.01679e-17
alnsf / 59864 1.22402e-14 2.32706e-12 -1.11022e-16 1.11022e-16 2.22045e-16 2.04467e-19 4.12436e-17
alnwf / 59530 -1.47278e-15 2.20798e-12 -5.55112e-17 5.55112e-17 1.11022e-16 -2.47401e-20 3.94231e-17
vfrac / 58094 -9.95037e-15 5.37997e-12 -1.11022e-16 1.11022e-16 2.22045e-16 -1.71281e-19 9.76145e-17
t2m / 375999 4.62705e-11 2.27931e-08 -2.27374e-13 2.84217e-13 5.11591e-13 1.2306e-16 6.23245e-14
q2m / 584114 1.1215e-15 2.05325e-12 -1.05818e-16 1.35308e-16 2.41127e-16 1.92e-21 5.20012e-18
hice / 24 0 36 -1.5 1.5 3 0 1.53226
fice / 29226 -9.78662e-14 3.6 -0.15 0.15 0.3 -3.3486e-18 0.00429853
tisfc / 144609 0.6 3 -0.3 0.9 1.2 4.14912e-06 0.00297515
snwdph / 11587 -1.52036e-10 7.4197e-09 -4.7983e-11 2.94449e-11 7.74278e-11 -1.31212e-14 1.56991e-12
stc / 579081 -0.4 623.6 -6.5 6.5 13 -6.9075e-07 0.0836379
slc / 48 3.56382e-14 5.9619e-14 -1.4988e-15 2.17049e-14 2.32037e-14 7.42462e-16 3.25851e-15
But these differences are not large enough to be a concern.
Finally, the grid_gen tests were run. The "C96 uniform" test passed, but the "C96 VIIRS", "GFDL REGIONAL", "ESG REGIONAL" and "REGIONAL GSL GWD" tests failed. However, differences were not concerning. Here are some differences from the "REGIONAL GSL GWD" test -
The C772_grid.tile7.halo5.nc file -
Variable Group Count Sum AbsSum Min Max Range Mean StdDev
x / 8849 -2.33058e-12 3.62093e-10 -5.68434e-14 5.68434e-14 1.13687e-13 -2.63372e-16 4.32842e-14
y / 89658 1.46194e-11 5.56664e-10 -2.13163e-14 2.13163e-14 4.26326e-14 1.63058e-16 6.62425e-15
area / 134969 -0.000457861 0.00165983 -5.21541e-08 4.47035e-08 9.68575e-08 -3.39234e-09 1.32818e-08
dx / 94459 -6.30007e-08 1.65441e-07 -6.36646e-12 5.45697e-12 1.18234e-11 -6.66963e-13 1.8036e-12
dy / 114121 -7.77163e-09 2.06951e-07 -7.27596e-12 7.27596e-12 1.45519e-11 -6.80999e-14 1.99692e-12
angle_dx / 184762 -7.38107e-13 3.21085e-10 -1.77636e-14 1.59872e-14 3.37508e-14 -3.99491e-18 2.26953e-15
angle_dy / 194789 6.41316e-13 3.69076e-10 -2.30926e-14 2.30926e-14 4.61853e-14 3.29236e-18 2.48833e-15
The C772_oro_data.tile7.halo5.nc file -
Variable Group Count Sum AbsSum Min Max Range Mean StdDev
land_frac / 1 6.17504e-05 6.17504e-05 6.17504e-05 6.17504e-05 0 6.17504e-05 0
orog_raw / 2 0.0717773 0.0923462 -0.0102844 0.0820618 0.0923462 0.0358887 0.0652986
orog_filt / 57 0.0717158 0.089478 -0.000732422 0.0223694 0.0231018 0.00125817 0.00405655
stddev / 2 0.0173798 0.0630608 -0.0228405 0.0402203 0.0630608 0.00868988 0.0445907
convexity / 2 -0.00313234 0.0213141 -0.0122232 0.0090909 0.0213141 -0.00156617 0.0150714
theta / 2 -0.394409 0.575539 -0.484974 0.0905647 0.575539 -0.197205 0.406967
gamma / 2 -0.00956333 0.00956333 -0.00613481 -0.00342852 0.00270629 -0.00478166 0.00191364
sigma / 2 5.99525e-05 5.99525e-05 2.18488e-05 3.81037e-05 1.62548e-05 2.99762e-05 1.14939e-05
elvmax / 2 -0.0717545 0.0923233 -0.0820389 0.0102844 0.0923233 -0.0358772 0.0652824
The C772.snowfree_albedo.tile7.halo5.nc file -
Variable Group Count Sum AbsSum Min Max Range Mean StdDev
visible_black_sky_albedo / 2 -1.11759e-08 1.11759e-08 -7.45058e-09 -3.72529e-09 3.72529e-09 -5.58794e-09 2.63418e-09
visible_white_sky_albedo / 1 -3.72529e-09 3.72529e-09 -3.72529e-09 -3.72529e-09 0 -3.72529e-09 0
near_IR_black_sky_albedo / 1 -1.49012e-08 1.49012e-08 -1.49012e-08 -1.49012e-08 0 -1.49012e-08 0
near_IR_white_sky_albedo / 2 -1.49012e-08 4.47035e-08 -2.98023e-08 1.49012e-08 4.47035e-08 -7.45058e-09 3.16101e-08
Note: the cpld_gridgen program does not work on WCOSS2. Due to time constraints (WCOSS2 is now operational), this will be fixed later under #663.
Port to WCOSS2. This issue is for 'develop'.
Related issues: #552, #580, #582.
Some guidance from NCO: https://docs.google.com/presentation/d/15v-7rEM2CkJlEzwX4sE_qJd8DFJCgI726pzghIM_AwE/edit?usp=sharing