ufs-community / ufs-weather-model

UFS Weather Model

Spack-stack 1.5.1, ESMF 8.5.0, FMS 2023.02.01 + Remove Gaea C4 + Fix build system to allow CMAKE_<COMPILER>_FLAGS to be specified for submodules #2052 #2013

Closed FernandoAndrade-NOAA closed 5 months ago

FernandoAndrade-NOAA commented 6 months ago

PR Author Checklist:

Description

This PR updates modulefiles to spack-stack 1.5.1. Updates have been made to the esmf, fms, mapl, and gftl-shared versions within ufs_common.lua. This also includes a fix for Gaea's timeout with cpld_bmark_p8 noted in #1978.

Commit Message

Linked Issues and Pull Requests

Associated UFSWM Issue to close

Subcomponent Pull Requests

Blocking Dependencies

Subcomponents involved:

Anticipated Changes

Input data

Regression Tests:

Intel: regional_noquilt control_csawmg control_csawmgt control_ras control_CubedSphereGrid_debug control_wrtGauss_netcdf_parallel_debug control_stochy_debug control_lndp_debug control_csawmg_debug control_csawmgt_debug control_ras_debug control_diag_debug rap_control_debug hrrr_control_debug hrrr_gf_debug hrrr_c3_debug rap_unified_drag_suite_debug rap_diag_debug rap_cires_ugwp_debug rap_unified_ugwp_debug rap_lndp_debug rap_progcld_thompson_debug rap_noah_debug rap_sfcdiff_debug rap_noah_sfcdiff_cires_ugwp_debug rrfs_v1beta_debug rap_clm_lake_debug rap_flake_debug gnv1_c96_no_nest_debug control_wam_debug rap_control_dyn64_phy32

Hercules/gnu: control_c48_gnu control_stochy_gnu control_ras_gnu control_flake_gnu rap_control_gnu rap_decomp_gnu rap_2threads_gnu rap_sfcdiff_gnu rap_sfcdiff_decomp_gnu hrrr_control_gnu hrrr_control_noqr_gnu hrrr_control_2threads_gnu hrrr_control_decomp_gnu rrfs_v1beta_gnu rap_control_dyn32_phy32_gnu hrrr_control_dyn32_phy32_gnu rap_2threads_dyn32_phy32_gnu hrrr_control_2threads_dyn32_phy32_gnu hrrr_control_decomp_dyn32_phy32_gnu conus13km_control_gnu rap_control_dyn64_phy32_gnu

cpld_control_p8_gnu cpld_control_nowave_noaero_p8_gnu cpld_debug_p8_gnu cpld_control_pdlib_p8_gnu cpld_debug_pdlib_p8_gnu datm_cdeps_control_cfsr_gnu

Libraries

Code Managers Log

- [x] This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
- [ ] Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.
  - [ ] N/A

### Testing Log:

- RDHPCS
  - [x] Hera
  - [x] Orion
  - [x] Hercules
  - [x] Jet
  - [x] Gaea
  - [x] Cheyenne
- WCOSS2
  - [x] Dogwood/Cactus
  - [ ] Acorn
- CI
  - [x] Completed
- opnReqTest
  - [ ] N/A
  - [x] Log attached to comment

FernandoAndrade-NOAA commented 6 months ago

Preliminary test with cpld_control_p8 intel/gnu passed with no changes. Running full RTs on Hera. Jet will be added once maintenance finishes to confirm 1.5.1 path.

FernandoAndrade-NOAA commented 6 months ago

It seems Gaea and hercules/gnu tests failed due to esmf 8.5.0 being unavailable. @ulmononian @climbfuji @natalie-perlin FYI. There were cmake and nccmp version conflicts as well.

junwang-noaa commented 6 months ago

@FernandoAndrade-NOAA May I ask the EPIC team to install fms/2023.02.01 in this spack-stack 1.5.1 package, instead of fms/2023.03? The fms/2023.03 does not have the diag_table bug fix that is in fms/2023.02.01. The GFSv17 requires that bug fix for their application with IAU. We may have to turn off the failed gnu tests on Derecho as specified in https://github.com/JCSDA/spack-stack/issues/860. @jkbk2004 @climbfuji @laurenchilutti FYI.

jkbk2004 commented 6 months ago

> @FernandoAndrade-NOAA May I ask the EPIC team to install fms/2023.02.01 in this spack-stack 1.5.1 package, instead of fms/2023.03? The fms/2023.03 does not have the diag_table bug fix that is in fms/2023.02.01. The GFSv17 requires that bug fix for their application with IAU. We may have to turn off the failed gnu tests on Derecho as specified in JCSDA/spack-stack#860. @jkbk2004 @climbfuji @laurenchilutti FYI.

@RatkoVasic-NOAA @ulmononian FYI: we need to move to fms/2023.02.01.

RatkoVasic-NOAA commented 6 months ago

Yes, these are available under 1.5.1: fms/2023.01 fms/2023.02.01 fms/2023.03

jkbk2004 commented 6 months ago

@AlexanderRichert-NOAA @Hang-Lei-NOAA can you check status of these library updates on wcoss2/acorn: fms/esmf/mapl/gftl-shared ?

Hang-Lei-NOAA commented 6 months ago

@jkbk2004 These libraries have been delivered to GDIT for installation. Some are already available on wcoss2.

On Wed, Dec 6, 2023 at 10:14 AM JONG KIM wrote:

> @AlexanderRichert-NOAA @Hang-Lei-NOAA can you check status of these library updates on wcoss2/acorn: fms/esmf/mapl/gftl-shared ?

jkbk2004 commented 6 months ago

> These libs had been delivered to GDIT for installation. Some have been available on wcoss2.

@Hang-Lei-NOAA If some of them are available, we can update the module path in this PR. Can you check and point us to the installation path? @BrianCurtis-NOAA FYI

Hang-Lei-NOAA commented 6 months ago

Following my conversation with @BrianCurtis-NOAA yesterday, I checked with GDIT. fms/2023.02.01 has been on wcoss2 for weeks. For ESMF-B/8.5.0 and the associated mapl etc., GDIT's response is "The RFC is scheduled for 12pmET on Wednesday for Cactus and Thursday for Dogwood."

On Wed, Dec 6, 2023 at 10:23 AM JONG KIM wrote:

> If some of them available, we may update the module path in this pr. Can you check and point to installation path? @BrianCurtis-NOAA FYI

FernandoAndrade-NOAA commented 6 months ago

Just leaving a note, preliminary testing with control_p8 intel / gnu across Hera, Gaea C5, Jet, Orion, and Hercules succeeded.

Hang-Lei-NOAA commented 6 months ago

@BrianCurtis-NOAA The libraries are available on wcoss2:

---- WCOSS2 Intel Compiled MPI Libraries and Tools

cdo/1.9.8 (D) esmf/8.1.0 fms/2022.04 (D) hdf5/1.12.2 ncio/1.0.0 netcdf-A/4.9.2 pio-B/2.5.10 scotch/7.0.4 wgrib2/2.0.8_mpi esmf-A/8.4.2 esmf/8.1.1 (D) fms/2023.02.01 mapl-A/2.35.2-esmf-8.4.2 ncio/1.1.2 (D) netcdf-B/4.9.2 pio/2.5.3 (D) upp/8.2.0 wrf_io/1.1.1 esmf-B/8.5.0 esmf/8.4.1 hdf5-A/1.14.0 mapl-B/2.40.3 nemsio/2.5.2 netcdf/4.7.4 (D) pio/2.5.10 upp/8.3.0 wrf_io/1.2.0 (D) esmf/7.1.0r fms-A/2023.01 hdf5-B/1.14.0 ncdiag/1.0.0 nemsio/2.5.4 (D) netcdf/4.9.0 pnetcdf/1.12.2 upp/10.0.8 (D) esmf/8.0.1 fms/2022.03 hdf5/1.10.6 (D) ncdiag/1.1.1 (D) nemsiogfs/2.5.3 pio-A/2.5.10 schism/5.11.0 w3emc/2.7.3

gftl-shared/1.6.1

On Wed, Dec 6, 2023 at 11:17 AM Fernando Andrade - NOAA wrote:

> Just leaving a note, preliminary with control_p8 intel / gnu across Hera, Gaea C5, Jet, Orion, and Hercules succeeded.

BrianCurtis-NOAA commented 6 months ago

OK Thanks. I have no idea why they are adding a letter to the library names

junwang-noaa commented 6 months ago

@FernandoAndrade-NOAA @BrianCurtis-NOAA May I ask if there is any issue with this PR? Is it ready for commit? This feature is requested for several projects. Thanks

BrianCurtis-NOAA commented 6 months ago

> @FernandoAndrade-NOAA @BrianCurtis-NOAA May I ask if there is any issue with this PR? Is it ready for commit? This feature is requested for several projects. Thanks

@junwang-noaa Let me run the full suite on WCOSS2 after I finish the testing for Denise's PR 2010. I have to double check that I have all of the "-B" packages set up in the modulefile for WCOSS2. Last time I tried, it failed, but I think I was missing some of the "-B" packages.

FernandoAndrade-NOAA commented 6 months ago

> @FernandoAndrade-NOAA @BrianCurtis-NOAA May I ask if there is any issue with this PR? Is it ready for commit? This feature is requested for several projects. Thanks

The last set of changes from my side consists of removing Gaea C4 now that C5 is supported. I will also need to rerun Hera RTs to double check the changed results with the switch of FMS from 2023.03 back to 2023.02.01.

BrianCurtis-NOAA commented 6 months ago

/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1340): error #6404: This name does not have a type, and must have an explicit type.   [UPPERCASE]
          select case( uppercase(trim(valueS)) )
-----------------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1340): error #6608: In a CASE statement, the case-expr must be of type INTEGER, CHARACTER, or LOGICAL.   [UPPERCASE]
          select case( uppercase(trim(valueS)) )
-----------------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1341): error #6611: The case-value must be of the same type as the case-expr.   ['JULIAN']
          case( 'JULIAN' )
----------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1343): error #6611: The case-value must be of the same type as the case-expr.   ['GREGORIAN']
          case( 'GREGORIAN' ) 
----------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1345): error #6611: The case-value must be of the same type as the case-expr.   ['NOLEAP']
          case( 'NOLEAP' )    
----------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1347): error #6611: The case-value must be of the same type as the case-expr.   ['THIRTY_DAY']
          case( 'THIRTY_DAY' )
----------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1349): error #6611: The case-value must be of the same type as the case-expr.   ['NO_CALENDAR']
          case( 'NO_CALENDAR' )
----------------^

AlexanderRichert-NOAA commented 6 months ago

FWIW I ran into that issue with uppercase being missing, it comes from mpp_mod (use mpp_mod, only : uppercase)

junwang-noaa commented 6 months ago

@DusanJovic-NOAA I thought you have the fixes for mpp_mod variables in one of your FV3 PRs?

BrianCurtis-NOAA commented 6 months ago

> FWIW I ran into that issue with uppercase being missing, it comes from mpp_mod (use mpp_mod, only : uppercase)

It's here: https://github.com/NOAA-EMC/fv3atm/blob/a82381c0b751a15e5343de5078ef836b2c444c89/io/module_wrt_grid_comp.F90#L32

BrianCurtis-NOAA commented 6 months ago

Could it have something to do with FMS 2023.02.01 ?

brian.curtis@dlogin03:/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model> module list

Currently Loaded Modules:
  1) craype-x86-rome     (H)   7) craype/2.7.13      13) hdf5-B/1.14.0   19) crtm/2.4.0     25) gftl-shared/1.6.1
  2) libfabric/1.11.0.0. (H)   8) cray-mpich/8.1.12  14) netcdf-B/4.9.2  20) g2/3.4.5       26) mapl-B/2.40.3
  3) craype-network-ofi  (H)   9) cmake/3.20.2       15) pio-B/2.5.10    21) g2tmpl/1.10.2  27) scotch/7.0.4
  4) envvar/1.0               10) jasper/2.0.25      16) esmf-B/8.5.0    22) ip/3.3.3       28) ufs_wcoss2.intel
  5) PrgEnv-intel/8.1.0       11) zlib/1.2.11        17) fms/2023.02.01  23) sp/2.3.3
  6) intel/19.1.3.304         12) libpng/1.6.37      18) bacio/2.4.1     24) w3emc/2.9.2
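
One way to double check that the expected "-B" packages are actually loaded is to parse the `module list` output. The following is a minimal Python sketch, assuming the output has been captured as text. The helper names `loaded_modules` and `missing` and the expected list are illustrative and are not part of the UFS tooling.

```python
import re

# Matches "N) name/version" entries as printed by `module list`.
ENTRY = re.compile(r"\d+\)\s+(\S+)")


def loaded_modules(text):
    """Return {module name: version} parsed from `module list` output."""
    mods = {}
    for token in ENTRY.findall(text):
        name, _, version = token.partition("/")
        mods[name] = version  # version is "" for entries without a slash
    return mods


def missing(expected, mods):
    """Return the expected 'name/version' strings that are not loaded."""
    result = []
    for item in expected:
        name, _, version = item.partition("/")
        if mods.get(name) != version:
            result.append(item)
    return result


# Two-line excerpt of the listing above (assumption: pasted as plain text).
sample = """
  4) envvar/1.0               10) jasper/2.0.25      16) esmf-B/8.5.0    22) ip/3.3.3       28) ufs_wcoss2.intel
  5) PrgEnv-intel/8.1.0       11) zlib/1.2.11        17) fms/2023.02.01  23) sp/2.3.3
"""
mods = loaded_modules(sample)
print(missing(["esmf-B/8.5.0", "fms/2023.02.01", "mapl-B/2.40.3"], mods))
```

Because the sample is only a two-line excerpt, mapl-B/2.40.3 is reported as missing here even though the full listing above includes it.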

DusanJovic-NOAA commented 6 months ago

> @DusanJovic-NOAA I thought you have the fixes for mpp_mod variables in one of your FV3 PRs?

I have, in this commit:

https://github.com/NOAA-EMC/fv3atm/pull/706/commits/93979e5f74150b37ce2956ee11a5d91b7da261bb

You asked me to add it. But this is a different function. Maybe we should check with GFDL about this.

BrianCurtis-NOAA commented 6 months ago

@DusanJovic-NOAA @junwang-noaa @AlexanderRichert-NOAA I see it's not using mpp_mod for uppercase in module_wrt_grid_comp.F90. Should we make the change in that file and add an FV3 PR to this? Or still talk to GFDL?

AlexanderRichert-NOAA commented 6 months ago

I got fv3atm to compile by adding that, but it might not hurt to get clarification from them, especially if you're interested in accommodating multiple versions of fms in fv3atm.

junwang-noaa commented 6 months ago

@BrianCurtis-NOAA you can add that function and make an fv3 PR. I thought we had already added it. Sorry for missing it in Dusan's PR.

DusanJovic-NOAA commented 6 months ago

In the fms module (the one we 'use' by 'use fms') I see they are including transitively all (many) other fms (the library) modules and renaming the functions with the fms_ prefix, basically namespacing them. So maybe we should start using those functions instead of individually 'using' different modules and listing explicitly every single function we use.

In this case instead of calling 'uppercase' we call 'fms_mpp_uppercase'.

BrianCurtis-NOAA commented 6 months ago

> In the fms module (the one we 'use' by 'use fms') I see they are including transitively all (many) other fms (the library) modules and renaming the functions with the fms_ prefix, basically namespacing them. So maybe we should start using those functions instead of individually 'using' different modules and listing explicitly every single function we use.
>
> In this case instead of calling 'uppercase' we call 'fms_mpp_uppercase'.

use mpp_mod, only : fms_mpp_init, fms_mpp_uppercase ?

DusanJovic-NOAA commented 6 months ago

Instead of:

      select case( uppercase(trim(valueS)) )

try:

      select case( fms_mpp_uppercase(trim(valueS)) )

BrianCurtis-NOAA commented 6 months ago

@DusanJovic-NOAA Would I also need to call fms_mpp_error instead of mpp_error ?

DusanJovic-NOAA commented 6 months ago

> @DusanJovic-NOAA Would I also need to call fms_mpp_error instead of mpp_error ?

You don't have to, since it's already listed in the use statement, but if we want to be consistent then we should.

This is where it's defined in fms module:

https://github.com/NOAA-GFDL/FMS/blob/51af7e155fcf89008278281cf5ae3279e598580e/libFMS.F90#L489

We should ask GFDL and see what their recommendation is: whether to use the fms_* functions from the fms module or to use the other modules like mpp_mod etc.

junwang-noaa commented 6 months ago

@bensonr May I ask what's your suggestion on using fms functions?

BrianCurtis-NOAA commented 6 months ago

With these changes:

diff --git a/io/module_wrt_grid_comp.F90 b/io/module_wrt_grid_comp.F90
index b59fe5e..e5e32ab 100644
--- a/io/module_wrt_grid_comp.F90
+++ b/io/module_wrt_grid_comp.F90
@@ -1337,7 +1337,7 @@

 ! save calendar_type (as integer) for use in 'coupler.res'
         if (index(trim(attNameList(i)),'time:calendar') > 0) then
-          select case( uppercase(trim(valueS)) )
+          select case( fms_mpp_uppercase(trim(valueS)) )
           case( 'JULIAN' )
               calendar_type = JULIAN
           case( 'GREGORIAN' )
@@ -1349,7 +1349,7 @@
           case( 'NO_CALENDAR' )
               calendar_type = NO_CALENDAR
           case default
-              call mpp_error ( FATAL, 'fcst_initialize: calendar must be one of '// &
+              call fms_mpp_error ( FATAL, 'fcst_initialize: calendar must be one of '// &
                                       'JULIAN|GREGORIAN|NOLEAP|THIRTY_DAY|NO_CALENDAR.' )
           end select
         endif

compiles are successful and about 50 tests fail to compare:

brian.curtis@dlogin03:/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/tests/logs/log_wcoss2> grep -r FAIL rt_*
rt_001_cpld_control_p8_mixedmode_intel.log:Test 001 cpld_control_p8_mixedmode_intel FAIL Tries: 2
rt_002_cpld_control_gfsv17_intel.log:Test 002 cpld_control_gfsv17_intel FAIL Tries: 2
rt_005_cpld_mpi_gfsv17_intel.log:Test 005 cpld_mpi_gfsv17_intel FAIL Tries: 2
rt_006_cpld_debug_gfsv17_intel.log:Test 006 cpld_debug_gfsv17_intel FAIL Tries: 2
rt_007_cpld_control_p8_intel.log:Test 007 cpld_control_p8_intel FAIL Tries: 2
rt_009_cpld_control_qr_p8_intel.log:Test 009 cpld_control_qr_p8_intel FAIL Tries: 2
rt_011_cpld_2threads_p8_intel.log:Test 011 cpld_2threads_p8_intel FAIL Tries: 2
rt_012_cpld_decomp_p8_intel.log:Test 012 cpld_decomp_p8_intel FAIL Tries: 2
rt_013_cpld_mpi_p8_intel.log:Test 013 cpld_mpi_p8_intel FAIL Tries: 2
rt_014_cpld_control_ciceC_p8_intel.log:Test 014 cpld_control_ciceC_p8_intel FAIL Tries: 2
rt_015_cpld_bmark_p8_intel.log:Test 015 cpld_bmark_p8_intel FAIL Tries: 2
rt_017_cpld_control_noaero_p8_intel.log:Test 017 cpld_control_noaero_p8_intel FAIL Tries: 2
rt_018_cpld_control_nowave_noaero_p8_intel.log:Test 018 cpld_control_nowave_noaero_p8_intel FAIL Tries: 2
rt_019_cpld_control_noaero_p8_agrid_intel.log:Test 019 cpld_control_noaero_p8_agrid_intel FAIL Tries: 2
rt_020_cpld_control_c48_intel.log:Test 020 cpld_control_c48_intel FAIL Tries: 2
rt_021_cpld_control_p8_faster_intel.log:Test 021 cpld_control_p8_faster_intel FAIL Tries: 2
rt_022_cpld_control_pdlib_p8_intel.log:Test 022 cpld_control_pdlib_p8_intel FAIL Tries: 2
rt_025_cpld_debug_pdlib_p8_intel.log:Test 025 cpld_debug_pdlib_p8_intel FAIL Tries: 2
rt_055_regional_noquilt_intel.log:Test 055 regional_noquilt_intel FAIL Tries: 2
rt_074_control_csawmg_intel.log:Test 074 control_csawmg_intel FAIL Tries: 2
rt_075_control_csawmgt_intel.log:Test 075 control_csawmgt_intel FAIL Tries: 2
rt_076_control_ras_intel.log:Test 076 control_ras_intel FAIL Tries: 2
rt_080_control_CubedSphereGrid_debug_intel.log:Test 080 control_CubedSphereGrid_debug_intel FAIL Tries: 2
rt_081_control_wrtGauss_netcdf_parallel_debug_intel.log:Test 081 control_wrtGauss_netcdf_parallel_debug_intel FAIL Tries: 2
rt_082_control_stochy_debug_intel.log:Test 082 control_stochy_debug_intel FAIL Tries: 2
rt_083_control_lndp_debug_intel.log:Test 083 control_lndp_debug_intel FAIL Tries: 2
rt_084_control_csawmg_debug_intel.log:Test 084 control_csawmg_debug_intel FAIL Tries: 2
rt_085_control_csawmgt_debug_intel.log:Test 085 control_csawmgt_debug_intel FAIL Tries: 2
rt_086_control_ras_debug_intel.log:Test 086 control_ras_debug_intel FAIL Tries: 2
rt_087_control_diag_debug_intel.log:Test 087 control_diag_debug_intel FAIL Tries: 2
rt_090_rap_control_debug_intel.log:Test 090 rap_control_debug_intel FAIL Tries: 2
rt_091_hrrr_control_debug_intel.log:Test 091 hrrr_control_debug_intel FAIL Tries: 2
rt_092_hrrr_gf_debug_intel.log:Test 092 hrrr_gf_debug_intel FAIL Tries: 2
rt_093_hrrr_c3_debug_intel.log:Test 093 hrrr_c3_debug_intel FAIL Tries: 2
rt_094_rap_unified_drag_suite_debug_intel.log:Test 094 rap_unified_drag_suite_debug_intel FAIL Tries: 2
rt_095_rap_diag_debug_intel.log:Test 095 rap_diag_debug_intel FAIL Tries: 2
rt_096_rap_cires_ugwp_debug_intel.log:Test 096 rap_cires_ugwp_debug_intel FAIL Tries: 2
rt_097_rap_unified_ugwp_debug_intel.log:Test 097 rap_unified_ugwp_debug_intel FAIL Tries: 2
rt_098_rap_lndp_debug_intel.log:Test 098 rap_lndp_debug_intel FAIL Tries: 2
rt_099_rap_progcld_thompson_debug_intel.log:Test 099 rap_progcld_thompson_debug_intel FAIL Tries: 2
rt_100_rap_noah_debug_intel.log:Test 100 rap_noah_debug_intel FAIL Tries: 2
rt_101_rap_sfcdiff_debug_intel.log:Test 101 rap_sfcdiff_debug_intel FAIL Tries: 2
rt_102_rap_noah_sfcdiff_cires_ugwp_debug_intel.log:Test 102 rap_noah_sfcdiff_cires_ugwp_debug_intel FAIL Tries: 2
rt_103_rrfs_v1beta_debug_intel.log:Test 103 rrfs_v1beta_debug_intel FAIL Tries: 2
rt_104_rap_clm_lake_debug_intel.log:Test 104 rap_clm_lake_debug_intel FAIL Tries: 2
rt_105_rap_flake_debug_intel.log:Test 105 rap_flake_debug_intel FAIL Tries: 2
rt_106_gnv1_c96_no_nest_debug_intel.log:Test 106 gnv1_c96_no_nest_debug_intel FAIL Tries: 2
rt_107_control_wam_debug_intel.log:Test 107 control_wam_debug_intel FAIL Tries: 2
rt_119_rap_control_dyn64_phy32_intel.log:Test 119 rap_control_dyn64_phy32_intel FAIL Tries: 2
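
To see which suites the failures fall into, and whether any of them are outside the list of tests already expected to change results, the grep output can be summarized with a short script. This is a minimal Python sketch, assuming lines in the `rt_NNN_name.log:Test NNN name FAIL Tries: N` form shown above. The function names, the grouping by first name token, and the `_intel` suffix handling are illustrative choices, not part of the regression test framework.

```python
import re
from collections import Counter

# One line of `grep -r FAIL rt_*` output, e.g.
# rt_007_cpld_control_p8_intel.log:Test 007 cpld_control_p8_intel FAIL Tries: 2
LINE = re.compile(r"^rt_\d+_\S+\.log:Test (\d+) (\S+) FAIL")


def failed_tests(grep_output):
    """Return the names of failed tests, in the order they appear."""
    names = []
    for line in grep_output.splitlines():
        m = LINE.match(line.strip())
        if m:
            names.append(m.group(2))
    return names


def by_prefix(names):
    """Count failures by the first token of the test name (cpld, control, rap, ...)."""
    return Counter(name.split("_")[0] for name in names)


def unexpected(failed, expected, suffix="_intel"):
    """Failed tests (compiler suffix stripped) that are not in the expected list."""
    known = set(expected)
    stripped = [n[:-len(suffix)] if n.endswith(suffix) else n for n in failed]
    return [n for n in stripped if n not in known]


# Four lines copied from the output above.
sample = """\
rt_007_cpld_control_p8_intel.log:Test 007 cpld_control_p8_intel FAIL Tries: 2
rt_015_cpld_bmark_p8_intel.log:Test 015 cpld_bmark_p8_intel FAIL Tries: 2
rt_076_control_ras_intel.log:Test 076 control_ras_intel FAIL Tries: 2
rt_090_rap_control_debug_intel.log:Test 090 rap_control_debug_intel FAIL Tries: 2
"""
names = failed_tests(sample)
print(len(names), dict(by_prefix(names)))
# Two names taken from the Intel list in the PR description.
print(unexpected(names, ["control_ras", "rap_control_debug"]))
```

Run against the full output, the last call would list the failures that are not already named in the PR description's Intel list.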

bensonr commented 6 months ago

@junwang-noaa - the updated explicit naming schema available from use libFMS allows one to incorporate the FMS infrastructure library with a single module use statement in a manner similar to ESMF, MPI, HDF5, etc. We'd prefer people use the new schema as it clarifies what is FMS provided while also specifying the particular service or manager within the library. Understanding there is a lot of legacy code, one can still use the legacy method of including each individual Fortran module, as there is no plan at this time to provide access only through the libFMS Fortran module.

junwang-noaa commented 6 months ago

@bensonr Thanks for the clarification. We will use the fms module (use fms) and the fms_* functions in the code above.

BrianCurtis-NOAA commented 6 months ago

@FernandoAndrade-NOAA Please add a dependency for: https://github.com/NOAA-EMC/fv3atm/pull/732

@DusanJovic-NOAA can you look that over and give it an approval before Fernando merges it into this PR?

jkbk2004 commented 6 months ago

@ulmononian @natalie-perlin @RatkoVasic-NOAA We will start committing #1836 today. After merging #1836, we need to follow up on the spack-stack 1.5.1 installation on derecho as well. We also need to revisit the derecho job_card update issue. I will keep posting.

ulmononian commented 6 months ago

> @ulmononian @natalie-perlin @RatkoVasic-NOAA We will start committing #1836 today. After merging #1836, we need to follow up on the spack-stack 1.5.1 installation on derecho as well. We also need to revisit the derecho job_card update issue. I will keep posting.

@jkbk2004 spack-stack/1.5.1 is installed on derecho. @RatkoVasic-NOAA or @natalie-perlin, can you work w/ @FernandoAndrade-NOAA to ensure the derecho modulefile is updated to use 1.5.1 and the job_card is tested/tweaked (see #2033) for this PR?

RatkoVasic-NOAA commented 6 months ago

@ulmononian I already put derecho-gnu as 1.5.1 (I guess we didn't have 1.5.0 with gnu, only intel). All other machines in this pull request are pointing to 1.5.0. Do you want to update only derecho to 1.5.1? There is another PR for that: #2013

ulmononian commented 6 months ago

> @ulmononian I already put derecho-gnu as 1.5.1 (I guess we didn't have 1.5.0 with gnu, only intel). All other machines in this pull request are pointing to 1.5.0. Do you want to update only derecho to 1.5.1? There is another PR for that: #2013

@RatkoVasic-NOAA yep -- once #1836 is merged, this PR (#2013) needs to have the derecho intel modulefile updated to 1.5.1. issue #2033 might just need some testing to optimize the fv3.exe run implementation.

FernandoAndrade-NOAA commented 6 months ago

Adding a note that Hercules / gnu failed around 30 tests as well, with 6 of those failing abnormally with segmentation faults @RatkoVasic-NOAA fyi:

Tests directory: /work/noaa/epic/nandoam/stmp/nandoam/FV3_RT/rt_1280700/

rt_228_cpld_control_p8_gnu.log:Test 228 cpld_control_p8_gnu FAIL
rt_229_cpld_control_nowave_noaero_p8_gnu.log:Test 229 cpld_control_nowave_noaero_p8_gnu FAIL
rt_230_cpld_debug_p8_gnu.log:Test 230 cpld_debug_p8_gnu FAIL
rt_231_cpld_control_pdlib_p8_gnu.log:Test 231 cpld_control_pdlib_p8_gnu FAIL
rt_232_cpld_debug_pdlib_p8_gnu.log:Test 232 cpld_debug_pdlib_p8_gnu FAIL
rt_233_datm_cdeps_control_cfsr_gnu.log:Test 233 datm_cdeps_control_cfsr_gnu FAIL

jkbk2004 commented 5 months ago

> Adding a note that Hercules / gnu failed around 30 tests as well, with 6 of those failing abnormally with segmentation faults @RatkoVasic-NOAA fyi:
>
> Tests directory: /work/noaa/epic/nandoam/stmp/nandoam/FV3_RT/rt_1280700/
>
> rt_228_cpld_control_p8_gnu.log:Test 228 cpld_control_p8_gnu FAIL
> rt_229_cpld_control_nowave_noaero_p8_gnu.log:Test 229 cpld_control_nowave_noaero_p8_gnu FAIL
> rt_230_cpld_debug_p8_gnu.log:Test 230 cpld_debug_p8_gnu FAIL
> rt_231_cpld_control_pdlib_p8_gnu.log:Test 231 cpld_control_pdlib_p8_gnu FAIL
> rt_232_cpld_debug_pdlib_p8_gnu.log:Test 232 cpld_debug_pdlib_p8_gnu FAIL
> rt_233_datm_cdeps_control_cfsr_gnu.log:Test 233 datm_cdeps_control_cfsr_gnu FAIL

hera/gnu runs OK, but a crash happens in MOM I/O on hercules/gnu:

159: Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
160:    at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/config_src/infra/FMS2/MOM_io_infra.F90:905
161: #4  0x3360eb2 in __mom_io_infra_MOD_read_field_2d
169:    at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/src/framework/MOM_io.F90:2172
160: #6  0x30cd091 in __mom_shared_initialization_MOD_initialize_topography_from_file
160:    at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/src/initialization/MOM_shared_initialization.F90:175
160: #7  0x30b56b0 in __mom_fixed_initialization_MOD_mom_initialize_topography
160:    at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/src/initialization/MOM_fixed_initialization.F90:224
160: #8  0x30b5a12 in __mom_fixed_initialization_MOD_mom_initialize_fixed
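
To pull just the source locations out of a backtrace like this one, where output from several tasks is interleaved, the `at path:line` entries can be extracted with a regular expression. This is a minimal Python sketch. It assumes the numeric prefix before the colon is a per-task label added by the job launcher, which the log itself does not state, and the function name is illustrative.

```python
import re

# "160:    at /path/to/file.F90:175" -> task label, source path, line number
AT = re.compile(r"^(\d+):\s+at (\S+):(\d+)$")


def source_frames(trace):
    """Return (task label, file name, line) for each 'at path:line' entry."""
    frames = []
    for line in trace.splitlines():
        m = AT.match(line.strip())
        if m:
            label, path, lineno = m.groups()
            frames.append((int(label), path.rsplit("/", 1)[-1], int(lineno)))
    return frames


# Three lines copied from the backtrace above.
sample = """\
160:    at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/config_src/infra/FMS2/MOM_io_infra.F90:905
161: #4  0x3360eb2 in __mom_io_infra_MOD_read_field_2d
169:    at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/src/framework/MOM_io.F90:2172
"""
for frame in source_frames(sample):
    print(frame)
```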

jkbk2004 commented 5 months ago

@ulmononian @RatkoVasic-NOAA If the issues are with gnu-12 on hercules, we may need to stay with gnu-11.3.1 on hercules. It might be worth checking with the new versions of fms/esmf/gftl-shared in spack-stack 1.5.0 on hercules. That would allow a sort of cross-check. Can you install them to 1.5.0?

climbfuji commented 5 months ago

> @ulmononian @RatkoVasic-NOAA If the issues are with gnu-12 on hercules, we may need to stay with gnu-11.3.1 on hercules. It might be worth checking with the new versions of fms/esmf/gftl-shared in spack-stack 1.5.0 on hercules. That would allow a sort of cross-check. Can you install them to 1.5.0?

Note that we had to update to gcc@12 due to a bug in mvapich2 with gcc@11

junwang-noaa commented 5 months ago

@jkbk2004 do we have any issue with gnu on hera/derecho? If not, can we create a separate issue to debug the gnu issue on Hercules?

DeniseWorthen commented 5 months ago

@FernandoAndrade-NOAA We're adding a Commit Message requirement to the PRs. Please add one in the space provided.

climbfuji commented 5 months ago

This doesn't help with the immediate problem you are having, but I wanted to let you know that I just built jedi-bundle on Hercules with GNU in spack-stack-1.5.1, and I was able to run all 2430 ctests without errors. These include tests with the fv3 dycore via fv3-jedi, and tests with mom6 via soca. So it seems that spack-stack-1.5.1 should generally be ok to use, and maybe the problem is a bug in the actual code getting used?

ulmononian commented 5 months ago

> @ulmononian @RatkoVasic-NOAA If the issues are with gnu-12 on hercules, we may need to stay with gnu-11.3.1 on hercules. It might be worth checking with the new versions of fms/esmf/gftl-shared in spack-stack 1.5.0 on hercules. That would allow a sort of cross-check. Can you install them to 1.5.0?

do you also need mapl@2.40.3 for this test env?

DeniseWorthen commented 5 months ago

Why is there no associated issue with this PR? The only one shown is extending wall-clock for one test.

junwang-noaa commented 5 months ago

Please include issues #1854 and #1874.

zach1221 commented 5 months ago

> Please include issues #1854 and #1874.

These have been added to the linked issues section.

DeniseWorthen commented 5 months ago

@zach1221 They are not showing up on the right sidebar, where the issues which get closed automagically appear. I think it may be because of how you've linked the issues?