ufs-community / ufs-weather-model

UFS Weather Model

Module files and/or libraries missing on Dogwood #2325

Open DavidHuber-NOAA opened 3 weeks ago

DavidHuber-NOAA commented 3 weeks ago

Description

The directory /apps/test/hpc-stack/i-19.1.3.304__m-8.1.12__h-1.14.0__n-4.9.2__p-2.5.10__e-8.6.0_pnetcdf/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12 is missing on Dogwood, so the UFS modules fail to load.

To Reproduce:

On Dogwood, clone the UFS Weather Model and attempt to load modulefiles/ufs_wcoss2.intel.lua.

Output

david.huber@dlogin02:/lfs/h2/emc/nems/noscrub/david.huber/ufs-weather-model> module use modulefiles/
david.huber@dlogin02:/lfs/h2/emc/nems/noscrub/david.huber/ufs-weather-model> module load ufs_wcoss2.intel 
Lmod has detected the following error:  The following module(s) are unknown: "fms/2023.04" "esmf/8.6.0" "hdf5/1.14.0" "netcdf/4.9.2"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore-cache load "fms/2023.04" "esmf/8.6.0" "hdf5/1.14.0" "netcdf/4.9.2"

Also make sure that all modulefiles written in TCL start with the string #%Module

Executing this command requires loading "hdf5/1.14.0" which failed while processing the following module(s):

    Module fullname   Module Filename
    ---------------   ---------------
    ufs_wcoss2.intel  modulefiles/ufs_wcoss2.intel.lua

Executing this command requires loading "netcdf/4.9.2" which failed while processing the following module(s):

    Module fullname   Module Filename
    ---------------   ---------------
    ufs_wcoss2.intel  modulefiles/ufs_wcoss2.intel.lua

Executing this command requires loading "esmf/8.6.0" which failed while processing the following module(s):

    Module fullname   Module Filename
    ---------------   ---------------
    ufs_wcoss2.intel  modulefiles/ufs_wcoss2.intel.lua

Executing this command requires loading "fms/2023.04" which failed while processing the following module(s):

    Module fullname   Module Filename
    ---------------   ---------------
    ufs_wcoss2.intel  modulefiles/ufs_wcoss2.intel.lua
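
Before digging into the Lmod errors, it can help to confirm whether the MODULEPATH entry that ufs_wcoss2.intel.lua appends actually exists on the machine. A minimal sketch in POSIX shell follows; the helper name check_modulepath_dir is hypothetical, and only the directory path comes from this issue.

```shell
# Hypothetical helper: report whether a directory exists on this machine.
check_modulepath_dir() {
  if [ -d "$1" ]; then
    echo "present"
  else
    echo "missing"
  fi
}

# The hpc-stack directory that ufs_wcoss2.intel.lua appends to MODULEPATH (path from this issue).
stack_dir="/apps/test/hpc-stack/i-19.1.3.304__m-8.1.12__h-1.14.0__n-4.9.2__p-2.5.10__e-8.6.0_pnetcdf/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12"
echo "$(check_modulepath_dir "$stack_dir"): $stack_dir"
```

If the directory is reported missing, the "unknown module" errors above are expected, because Lmod has nowhere to find hdf5/1.14.0, netcdf/4.9.2, esmf/8.6.0 or fms/2023.04.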
DavidHuber-NOAA commented 3 weeks ago

I have opened WCOSS2 issue number 2024061310000053 with NCO to look into this as well.

RussTreadon-NOAA commented 2 weeks ago

Thank you @DavidHuber-NOAA for opening this issue and contacting NCO. The inability to build the UFS Weather Model on Dogwood is a serious problem: we can't run global-workflow (g-w) CI.

DavidHuber-NOAA commented 2 weeks ago

Bongi is currently syncing the installation from Cactus to Dogwood. It should be ready by noon (ET) today.

junwang-noaa commented 2 weeks ago

@DavidHuber-NOAA Brian and Hang are testing the module files now. @Hang-Lei-Noaa @BrianCurtis-NOAA, would you please confirm that you are using Bongi's new installation?

Hang-Lei-NOAA commented 2 weeks ago

@DavidHuber-NOAA You can use the lib-c series on the system:

    module load PrgEnv-intel
    module load craype
    module load intel
    module load cray-mpich
    module avail

Then you will see the lib-c series on prod.

DavidHuber-NOAA commented 2 weeks ago

Very good, thanks @Hang-Lei-NOAA. Should the UFS module files be updated to use these instead of the modules in

https://github.com/ufs-community/ufs-weather-model/blob/bba5449d27837a270386937fbae6d540abd50581/modulefiles/ufs_wcoss2.intel.lua#L21-L42

I'm happy to put in the PR for this, but just need to know which -C libraries need to be loaded. I'm guessing fms-C/2023.04, hdf5-C/1.14.0, mapl-C/2.40.3, netcdf-C/4.9.2, pio-C, and pnetcdf-C? And the default PrgEnv-intel, craype, intel, and cray-mpich?

BrianCurtis-NOAA commented 2 weeks ago

I'm testing the C-libs on WCOSS2 right now to double-check that everything passes the regression tests. I'm almost finished. Once this is complete, I can pass along the modulefile to use, and I'll make sure it gets updated for the PR they are working on today.

DavidHuber-NOAA commented 2 weeks ago

Great, thanks @BrianCurtis-NOAA!

Hang-Lei-NOAA commented 2 weeks ago

@DavidHuber-NOAA Brian will test it and update. All of the ones you mentioned, plus esmf-c, should be loaded.
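
Putting the last few comments together, the load sequence Hang confirms (David's list plus esmf-c) can be sketched as a shell function to source in a WCOSS2 shell where Lmod's module command is available. This is only a sketch: the function name load_ufs_clibs is hypothetical, the "-C" casing follows David's guess, and pnetcdf-C, pio-C and esmf-C are loaded without a version because none was stated in the thread. Brian's tested modulefile remains the reference.

```shell
# Hypothetical helper: load the compiler/MPI environment, then the lib-C series.
# Module names and the "-C" casing are assumptions taken from this thread;
# pnetcdf-C, pio-C and esmf-C are left unpinned because no versions were given.
load_ufs_clibs() {
  # Compiler and MPI first: per Hang, the lib-c modules only appear after these are loaded.
  module load PrgEnv-intel craype intel cray-mpich
  # Libraries, roughly in dependency order (HDF5, then NetCDF/PnetCDF, then PIO/ESMF, then FMS/MAPL).
  module load hdf5-C/1.14.0 netcdf-C/4.9.2 pnetcdf-C pio-C esmf-C fms-C/2023.04 mapl-C/2.40.3
}
```

Usage: after sourcing the file, run load_ufs_clibs and then module list to check what was picked up. In ufs_wcoss2.intel.lua itself, the equivalent would be Lmod load() calls rather than shell commands.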

BrianCurtis-NOAA commented 2 weeks ago

@DavidHuber-NOAA Here is the modulefile to use: /lfs/h2/emc/nems/noscrub/brian.curtis/git/ufs-community/ufs-weather-model/modulefiles/ufs_wcoss2.intel.lua

DavidHuber-NOAA commented 2 weeks ago

Thanks Brian. Is line 21 still needed?

append_path("MODULEPATH", "/apps/test/hpc-stack/i-19.1.3.304__m-8.1.12__h-1.14.0__n-4.9.2__p-2.5.10__e-8.6.0_pnetcdf/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12")

BrianCurtis-NOAA commented 2 weeks ago

That is a good question. I think the only extra libraries in that directory are ones we are overriding with the -C versions, so we are getting lucky that the compiler isn't confused. I think you are right, though: it's not needed.

DavidHuber-NOAA commented 2 weeks ago

Just wanted to note that Bongi was able to rsync the hpc-stack builds and module files in /apps/test/hpc-stack/i-19.1.3.304__m-8.1.12__h-1.14.0__n-4.9.2__p-2.5.10__e-8.6.0_pnetcdf from Cactus to Dogwood. Global-workflow CI tests completed successfully on Dogwood yesterday.