ufs-community / ufs-weather-model

UFS Weather Model

UFS-WM cpld_debug_p8 and cpld_control_p8 gnu test case hangs on hera #2263

Open jkbk2004 opened 6 months ago

jkbk2004 commented 6 months ago

Description

To Reproduce:

Additional context

Failure message from error log for cpld_debug_p8 and cpld_control_p8 gnu.


The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release. Workarounds are to run on a single node, or to use a system with an RDMA capable network such as Infiniband.

Output

jkbk2004 commented 6 months ago

@uturuncoglu @RatkoVasic-NOAA This could be a problem with openmpi on hera (especially with the old gnu version). It is worth noting that the issue became visible at the call ESMF_InfoBroadcast(info, rootPet=fcstPetList(1), rc=rc).

junwang-noaa commented 6 months ago

A ticket about this issue was created with ESMF support.

natalie-perlin commented 5 months ago

An update for Hera GNU:

Spack-stack 1.5.1 and 1.6.0, with packages for ufs-weather-model and ufs-srweather-app, have been built on Hera with the GNU/13.3.0 compiler. Spack-stack v1.6.0 was built with ESMF/8.6.1 and MAPL/2.46.0.

A first check of running the RTs: some pass, some fail.

My WM tests with spack-stack-1.6.0 are in /scratch1/NCEPDEV/nems/Natalie.Perlin/ufs-weather-model

and with spack-stack-1.5.1 (run with -w option) are in /scratch1/NCEPDEV/nems/Natalie.Perlin/ufs-weather-model2/

A modulefile for using spack-stack-1.6.0: /scratch1/NCEPDEV/nems/Natalie.Perlin/_ufs-weather-model/modulefiles/ufshera.gnu.lua

help([[
loads UFS Model prerequisites for Hera/GNU
]])

prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/modulefiles")
prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/modulefiles")
prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13.3/envs/ufs-wm-srw-rocky8/install/modulefiles/Core")

stack_gnu_ver=os.getenv("stack_gnu_ver") or "13.3.0"
load(pathJoin("stack-gcc", stack_gnu_ver))

stack_openmpi_ver=os.getenv("stack_openmpi_ver") or "4.1.6"
load(pathJoin("stack-openmpi", stack_openmpi_ver))

cmake_ver=os.getenv("cmake_ver") or "3.23.1"
load(pathJoin("cmake", cmake_ver))

load("ufs_common")

nccmp_ver=os.getenv("nccmp_ver") or "1.9.0.1"
load(pathJoin("nccmp", nccmp_ver))

prepend_path("CPPFLAGS", " -I/apps/slurm_hera/23.11.3/include/slurm"," ")
prepend_path("LD_LIBRARY_PATH", "/apps/slurm_hera/23.11.3/lib")

setenv("CC", "mpicc")
setenv("CXX", "mpic++")
setenv("FC", "mpif90")
setenv("CMAKE_Platform", "hera.gnu")

whatis("Description: UFS build environment") 

The ufs_common.lua for use with spack-stack-1.6.0:

whatis("Description: UFS build environment common libraries")

help([[Load UFS Model common libraries]])

local ufs_modules = {
  {["jasper"]          = "2.0.32"},
  {["zlib"]            = "1.2.13"},
  {["libpng"]          = "1.6.37"},
  {["hdf5"]            = "1.14.0"},
  {["netcdf-c"]        = "4.9.2"},
  {["netcdf-fortran"]  = "4.6.1"},
  {["parallelio"]      = "2.5.10"},
  {["esmf"]            = "8.6.1"},
  {["fms"]             = "2023.04"},
  {["bacio"]           = "2.4.1"},
  {["crtm"]            = "2.4.0.1"},
  {["g2"]              = "3.4.5"},
  {["g2tmpl"]          = "1.10.2"},
  {["ip"]              = "4.3.0"},
  {["sp"]              = "2.5.0"},
  {["w3emc"]           = "2.10.0"},
  {["gftl-shared"]     = "1.6.1"},
  {["mapl"]            = "2.46.0-esmf-8.6.1"},
  {["scotch"]          = "7.0.4"},
}

for i = 1, #ufs_modules do
  for name, default_version in pairs(ufs_modules[i]) do
    local env_version_name = string.gsub(name, "-", "_") .. "_ver"
    load(pathJoin(name, os.getenv(env_version_name) or default_version))
  end
end
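The loop above derives a per-package override variable from each module name (hyphens become underscores, then `_ver` is appended) and falls back to the default version from the table when that variable is unset. A minimal standalone sketch of that resolution follows, runnable in plain Lua outside Lmod. The `pathJoin` stub, the `resolve` helper, the fake environment table, and the `8.5.0` override value are all assumptions made for illustration; they are not part of the modulefile.

```lua
-- Stand-in for Lmod's pathJoin (assumption: joins two parts with "/").
local function pathJoin(a, b)
  return a .. "/" .. b
end

-- Hypothetical helper mirroring the body of the ufs_common loop: returns
-- the override variable name and the module spec that would be loaded.
function resolve(name, default_version, getenv)
  -- Parentheses keep only the first return value of gsub (the new string).
  local env_version_name = (string.gsub(name, "-", "_")) .. "_ver"
  local spec = pathJoin(name, getenv(env_version_name) or default_version)
  return env_version_name, spec
end

-- Fake environment for illustration; a real modulefile calls os.getenv.
-- The esmf override value is hypothetical.
local fake_env = { esmf_ver = "8.5.0" }
local function getenv(key)
  return fake_env[key]
end

print(resolve("netcdf-c", "4.9.2", getenv))  -- netcdf_c_ver   netcdf-c/4.9.2
print(resolve("esmf", "8.6.1", getenv))      -- esmf_ver       esmf/8.5.0
```

In practice this means a version can be changed without editing the modulefile, by exporting a variable such as `netcdf_c_ver` before the module is loaded.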

A modulefile for using spack-stack-1.5.1: /scratch1/NCEPDEV/nems/Natalie.Perlin/_ufs-weather-model2/modulefiles/ufshera.gnu.lua

help([[
loads UFS Model prerequisites for Hera/GNU
]])

prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/modulefiles")
prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/modulefiles")
prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/envs/ufs-wm-srw-rocky8/install/modulefiles/Core")

stack_gnu_ver=os.getenv("stack_gnu_ver") or "13.3.0"
load(pathJoin("stack-gcc", stack_gnu_ver))

stack_openmpi_ver=os.getenv("stack_openmpi_ver") or "4.1.6"
load(pathJoin("stack-openmpi", stack_openmpi_ver))

cmake_ver=os.getenv("cmake_ver") or "3.23.1"
load(pathJoin("cmake", cmake_ver))

load("ufs_common")

nccmp_ver=os.getenv("nccmp_ver") or "1.9.0.1"
load(pathJoin("nccmp", nccmp_ver))

prepend_path("CPPFLAGS", " -I/apps/slurm_hera/23.11.3/include/slurm"," ")
prepend_path("LD_LIBRARY_PATH", "/apps/slurm_hera/23.11.3/lib")
setenv("CC", "mpicc")
setenv("CXX", "mpic++")
setenv("FC", "mpif90")
setenv("CMAKE_Platform", "hera.gnu")

whatis("Description: UFS build environment")

RatkoVasic-NOAA commented 5 months ago

I tested @natalie-perlin's installation, and the tests that were failing on Hera with the GNU compiler now work. Many other tests remain to be run. @jkbk2004 I suggest that the weather-model group run them, because some tests fail only because the results are not bit-identical (which is expected).

natalie-perlin commented 5 months ago

All the regression tests with the gnu/13.3.0 compiler and spack-stack/1.6.0 have passed for the weather model. Please see the full comment: https://github.com/ufs-community/ufs-weather-model/pull/2093#issuecomment-2143694396