ufs-community / UFS_UTILS

Utilities for the NCEP models.

Port UFS_UTILS to WCOSS2 #559

Closed GeorgeGayno-NOAA closed 2 years ago

GeorgeGayno-NOAA commented 3 years ago

Port to WCOSS2. This issue is for 'develop'.

Related issues:

Some guidance from NCO: https://docs.google.com/presentation/d/15v-7rEM2CkJlEzwX4sE_qJd8DFJCgI726pzghIM_AwE/edit?usp=sharing

XianwuXue-NOAA commented 3 years ago

@GeorgeGayno-NOAA I encountered a problem when running "chgres_cube" on WCOSS2/Acorn. I tried different settings, but NPETS was always 1. Could you give me some suggestions on how to run "chgres_cube" on WCOSS2? One log file is located at "/gpfs/dell2/ptmp/Xianwu.Xue/o/jgefs_atmos_prep_00.o39015" on Venus. However, Acorn is not available (it suddenly disconnected this afternoon), so I cannot tell you where the log file is on WCOSS2.

GeorgeGayno-NOAA commented 3 years ago

> @GeorgeGayno-NOAA I encountered a problem when running "chgres_cube" on WCOSS2/Acorn. I tried different settings, but NPETS was always 1. Could you give me some suggestions on how to run "chgres_cube" on WCOSS2? One log file is located at "/gpfs/dell2/ptmp/Xianwu.Xue/o/jgefs_atmos_prep_00.o39015" on Venus. However, Acorn is not available (it suddenly disconnected this afternoon), so I cannot tell you where the log file is on WCOSS2.

I have not run anything on WCOSS2 yet. My guess is you do not have the mpiexec command correct. Are you starting 36 instances of chgres_cube with one task each?
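
For reference, WCOSS2 uses PBS with `mpiexec` rather than `mpirun`; a minimal job-card sketch for starting one chgres_cube with 36 tasks (the queue name and resource line are assumptions, not tested settings):

```shell
#!/bin/bash
#PBS -q dev                              # hypothetical queue
#PBS -l select=1:ncpus=36:mpiprocs=36    # hypothetical resource request

cd $PBS_O_WORKDIR

# Explicitly request 36 ranks; launching chgres_cube without "-n 36"
# could leave it running on a single task (i.e., NPETS=1).
mpiexec -n 36 -ppn 36 ./chgres_cube
```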

XianwuXue-NOAA commented 3 years ago

Yes, we use `mpirun -n 36` on wcoss_dell_p35.

MatthewPyle-NOAA commented 3 years ago

@GeorgeGayno-NOAA I've been trying to get a modified version of the release/ops-hrefv3.1 chgres_cube code compiled on Acorn, but struggling a bit. Hoping this href version can piggyback off of your centralized efforts at porting to WCOSS2. Thanks!

GeorgeGayno-NOAA commented 3 years ago

How to access the hpc-stack on Cactus from @KateFriedman-NOAA:

```
module load envvar/1.0
module load PrgEnv-intel/8.1.0
module load craype/2.7.8
module load intel/19.1.3.304
module load cray-mpich/8.1.7
```

KateFriedman-NOAA commented 3 years ago

@GeorgeGayno-NOAA My understanding of the stack on WCOSS2 so far is that those modules access the production installation, but the NCEPLIBS group will also be installing a dev version (what we know as hpc-stack). FYI, the production installation has some modules that are named slightly differently from what we use in hpc-stack now; I'm mainly referring to the hdf5 and netcdf modules accessed after loading the cray-mpich/8.1.7 module (e.g. hdf5-parallel/1.10.6 and netcdf-hdf5parallel/4.7.4). I don't know if that naming difference will persist, so for now I'm using the module names as they are currently set in global-workflow. I'll be following conversations in the new #wcoss2-transition channel in Slack to see what happens with the stack installs moving forward.
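
Putting that naming note together with the module list above, loading the production netCDF/HDF5 modules would look roughly like this (a sketch; the names follow the versions quoted in this thread and may change):

```shell
module load envvar/1.0
module load PrgEnv-intel/8.1.0
module load craype/2.7.8
module load intel/19.1.3.304
module load cray-mpich/8.1.7
# Production-stack names differ from the dev hpc-stack equivalents:
module load hdf5-parallel/1.10.6
module load netcdf-hdf5parallel/4.7.4
```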

GeorgeGayno-NOAA commented 2 years ago

@KateFriedman-NOAA I am able to compile on WCOSS2. And the gdas_init scripts are working. You are welcome to try it (d0c7784)

GeorgeGayno-NOAA commented 2 years ago

> @GeorgeGayno-NOAA I've been trying to get a modified version of the release/ops-hrefv3.1 chgres_cube code compiled on Acorn, but struggling a bit. Hoping this href version can piggyback off of your centralized efforts at porting to WCOSS2. Thanks!

@MatthewPyle-NOAA I will port your tag next.

KateFriedman-NOAA commented 2 years ago

> @KateFriedman-NOAA I am able to compile on WCOSS2. And the gdas_init scripts are working. You are welcome to try it (d0c7784)

Awesome, I'll give it a try and report back, thanks @GeorgeGayno-NOAA! (Refs: https://github.com/NOAA-EMC/global-workflow/issues/399)

KateFriedman-NOAA commented 2 years ago

@GeorgeGayno-NOAA Should I now be using the build_all.sh script to build UFS_UTILS from global-workflow? This script:

https://github.com/GeorgeGayno-NOAA/UFS_UTILS/blob/feature/wcoss2/build_all.sh

I was previously using the sorc/build_ufs_utils.sh script. Thanks!

GeorgeGayno-NOAA commented 2 years ago

> @GeorgeGayno-NOAA Should I now be using the build_all.sh script to build UFS_UTILS from global-workflow? This script:
>
> https://github.com/GeorgeGayno-NOAA/UFS_UTILS/blob/feature/wcoss2/build_all.sh
>
> I was previously using the sorc/build_ufs_utils.sh script. Thanks!

Yes. Use build_all.sh.

KateFriedman-NOAA commented 2 years ago

@GeorgeGayno-NOAA Thanks for confirming! I have built a copy of your feature/wcoss2 (d0c7784) on Cactus successfully. Please see the following build log and let me know if you see any issues, thanks!

/lfs/h2/emc/eib/noscrub/Kate.Friedman/git/feature-ops-wcoss2/sorc/logs/build_ufs_utils.log

GeorgeGayno-NOAA commented 2 years ago

> @GeorgeGayno-NOAA Thanks for confirming! I have built a copy of your feature/wcoss2 (d0c7784) on Cactus successfully. Please see the following build log and let me know if you see any issues, thanks!
>
> /lfs/h2/emc/eib/noscrub/Kate.Friedman/git/feature-ops-wcoss2/sorc/logs/build_ufs_utils.log

Looks OK.

GeorgeGayno-NOAA commented 2 years ago

Built the ops-hrefv3.1 tag at a6f2fd9 on WCOSS-Dell and ran the regression tests. Then built the hrefv3.1-wcoss2 branch at 77d4b13 on WCOSS2 (Cactus) and ran the regression tests. The output files were compared using the nccmp utility. All differences were small (i.e., 'floating point' differences).
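
For anyone reproducing the comparison, nccmp can report per-variable statistics like the tables in the comments below; a sketch of the invocation (the file names are placeholders):

```shell
# Compare data values between the Dell baseline and the WCOSS2 output,
# continuing past the first difference and printing per-variable statistics.
nccmp --data --force --statistics baseline/sfc_data.nc wcoss2/sfc_data.nc
```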

GeorgeGayno-NOAA commented 2 years ago

Almost ready to merge.

Checked out the branch at b75f502 on Cactus. All programs were successfully built using the build_all.sh script. And the 'fixed' files were successfully linked using the ./fix/link_fixdirs.sh script.

Next steps - run all consistency tests and utility scripts.
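
The checkout-build-link sequence described above, sketched as shell (the link_fixdirs.sh arguments are assumptions; check the script's usage message):

```shell
git clone --recursive https://github.com/ufs-community/UFS_UTILS.git
cd UFS_UTILS
git checkout b75f502

./build_all.sh                  # build all programs

cd fix
./link_fixdirs.sh emc wcoss2    # assumed args: run environment and machine
```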

GeorgeGayno-NOAA commented 2 years ago

Next, all the utility scripts were run:

GeorgeGayno-NOAA commented 2 years ago

Next, the consistency tests were run. Here, the baseline data was copied over from WCOSS-Dell.

The snow2mdl and ice_blend tests both passed.

GeorgeGayno-NOAA commented 2 years ago

All global_cycle tests passed.

GeorgeGayno-NOAA commented 2 years ago

All chgres_cube tests failed. However, differences with the baseline (from Dell) were small. The largest differences I could find were from the surface portion of the 13km_na_gfs_ncei_grib2 tests:

```
Variable Group  Count          Sum      AbsSum          Min         Max       Range         Mean      StdDev
slmsk    /         24            0          48           -2           2           4            0     2.04302
tsea     /     152618        0.005       5.381      -0.2465      0.2465       0.493  3.27615e-08   0.0028425
sheleg   /      10632  5.82216e-13 8.62312e-10 -4.79794e-12 5.82645e-12 1.06244e-11  5.47607e-17 2.18734e-13
tg3      /     144618         -0.1        6.86        -0.29        0.29        0.58 -6.91477e-07   0.0036912
zorl     /     114359  3.38271e-16       23.76        -0.99        0.99        1.98  2.95798e-21   0.0143419
alvsf    /      59228  -3.7817e-16 4.63265e-13 -1.11022e-16 1.11022e-16 2.22045e-16 -6.38498e-21 1.06725e-17
alvwf    /      59458  2.75821e-15 4.45182e-13 -1.11022e-16 1.11022e-16 2.22045e-16  4.63892e-20 1.01679e-17
alnsf    /      59864  1.22402e-14 2.32706e-12 -1.11022e-16 1.11022e-16 2.22045e-16  2.04467e-19 4.12436e-17
alnwf    /      59530 -1.47278e-15 2.20798e-12 -5.55112e-17 5.55112e-17 1.11022e-16 -2.47401e-20 3.94231e-17
vfrac    /      58094 -9.95037e-15 5.37997e-12 -1.11022e-16 1.11022e-16 2.22045e-16 -1.71281e-19 9.76145e-17
t2m      /     375999  4.62705e-11 2.27931e-08 -2.27374e-13 2.84217e-13 5.11591e-13   1.2306e-16 6.23245e-14
q2m      /     584114   1.1215e-15 2.05325e-12 -1.05818e-16 1.35308e-16 2.41127e-16     1.92e-21 5.20012e-18
hice     /         24            0          36         -1.5         1.5           3            0     1.53226
fice     /      29226 -9.78662e-14         3.6        -0.15        0.15         0.3  -3.3486e-18  0.00429853
tisfc    /     144609          0.6           3         -0.3         0.9         1.2  4.14912e-06  0.00297515
snwdph   /      11587 -1.52036e-10  7.4197e-09  -4.7983e-11 2.94449e-11 7.74278e-11 -1.31212e-14 1.56991e-12
stc      /     579081         -0.4       623.6         -6.5         6.5          13  -6.9075e-07   0.0836379
slc      /         48  3.56382e-14  5.9619e-14  -1.4988e-15 2.17049e-14 2.32037e-14  7.42462e-16 3.25851e-15
```

But these differences are not large enough to be a concern.
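
One way to sanity-check a statistics dump like the one above automatically is to scan the Min/Max columns for the largest absolute difference; a minimal sketch, assuming the whitespace-delimited column layout shown (the abbreviated table is illustrative):

```python
def max_abs_diff(stats_text):
    """Return the largest absolute Min/Max value in an
    nccmp-style statistics table (the header row is skipped)."""
    worst = 0.0
    for line in stats_text.strip().splitlines()[1:]:
        fields = line.split()
        # Columns: Variable Group Count Sum AbsSum Min Max Range Mean StdDev
        vmin, vmax = float(fields[5]), float(fields[6])
        worst = max(worst, abs(vmin), abs(vmax))
    return worst

stats = """Variable Group  Count   Sum AbsSum  Min  Max Range        Mean    StdDev
slmsk    /         24     0     48   -2    2     4           0   2.04302
stc      /     579081  -0.4  623.6 -6.5  6.5    13 -6.9075e-07 0.0836379"""

print(max_abs_diff(stats))  # 6.5 -- the size of difference this thread treats as benign
```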

GeorgeGayno-NOAA commented 2 years ago

Finally, the grid_gen tests were run. The "C96 uniform" test passed. But the "C96 VIIRS", "GFDL REGIONAL", "ESG REGIONAL" and "REGIONAL GSL GWD" tests failed. However, differences were not concerning. Here are some differences from the "REGIONAL GSL GWD" test -

The C772_grid.tile7.halo5.nc file -

```
Variable Group  Count          Sum      AbsSum          Min         Max       Range         Mean      StdDev
x        /       8849 -2.33058e-12 3.62093e-10 -5.68434e-14 5.68434e-14 1.13687e-13 -2.63372e-16 4.32842e-14
y        /      89658  1.46194e-11 5.56664e-10 -2.13163e-14 2.13163e-14 4.26326e-14  1.63058e-16 6.62425e-15
area     /     134969 -0.000457861  0.00165983 -5.21541e-08 4.47035e-08 9.68575e-08 -3.39234e-09 1.32818e-08
dx       /      94459 -6.30007e-08 1.65441e-07 -6.36646e-12 5.45697e-12 1.18234e-11 -6.66963e-13  1.8036e-12
dy       /     114121 -7.77163e-09 2.06951e-07 -7.27596e-12 7.27596e-12 1.45519e-11 -6.80999e-14 1.99692e-12
angle_dx /     184762 -7.38107e-13 3.21085e-10 -1.77636e-14 1.59872e-14 3.37508e-14 -3.99491e-18 2.26953e-15
angle_dy /     194789  6.41316e-13 3.69076e-10 -2.30926e-14 2.30926e-14 4.61853e-14  3.29236e-18 2.48833e-15
```

The C772_oro_data.tile7.halo5.nc file -

```
Variable  Group Count         Sum      AbsSum          Min         Max       Range        Mean      StdDev
land_frac /         1 6.17504e-05 6.17504e-05  6.17504e-05 6.17504e-05           0 6.17504e-05           0
orog_raw  /         2   0.0717773   0.0923462   -0.0102844   0.0820618   0.0923462   0.0358887   0.0652986
orog_filt /        57   0.0717158    0.089478 -0.000732422   0.0223694   0.0231018  0.00125817  0.00405655
stddev    /         2   0.0173798   0.0630608   -0.0228405   0.0402203   0.0630608  0.00868988   0.0445907
convexity /         2 -0.00313234   0.0213141   -0.0122232   0.0090909   0.0213141 -0.00156617   0.0150714
theta     /         2   -0.394409    0.575539    -0.484974   0.0905647    0.575539   -0.197205    0.406967
gamma     /         2 -0.00956333  0.00956333  -0.00613481 -0.00342852  0.00270629 -0.00478166  0.00191364
sigma     /         2 5.99525e-05 5.99525e-05  2.18488e-05 3.81037e-05 1.62548e-05 2.99762e-05 1.14939e-05
elvmax    /         2  -0.0717545   0.0923233   -0.0820389   0.0102844   0.0923233  -0.0358772   0.0652824
```

The C772.snowfree_albedo.tile7.halo5.nc file -

```
Variable                 Group Count          Sum      AbsSum          Min          Max       Range         Mean      StdDev
visible_black_sky_albedo /         2 -1.11759e-08 1.11759e-08 -7.45058e-09 -3.72529e-09 3.72529e-09 -5.58794e-09 2.63418e-09
visible_white_sky_albedo /         1 -3.72529e-09 3.72529e-09 -3.72529e-09 -3.72529e-09           0 -3.72529e-09           0
near_IR_black_sky_albedo /         1 -1.49012e-08 1.49012e-08 -1.49012e-08 -1.49012e-08           0 -1.49012e-08           0
near_IR_white_sky_albedo /         2 -1.49012e-08 4.47035e-08 -2.98023e-08  1.49012e-08 4.47035e-08 -7.45058e-09 3.16101e-08
```

GeorgeGayno-NOAA commented 2 years ago

Note: the cpld_gridgen program does not work on WCOSS2. Due to time constraints (WCOSS2 is now operational), this will be fixed later under #663.