Closed. FernandoAndrade-NOAA closed this pull request 5 months ago.
Preliminary test with cpld_control_p8 intel/gnu passed with no changes. Running full RTs on Hera. Jet will be added once maintenance finishes to confirm 1.5.1 path.
It seems the Gaea and Hercules/gnu tests failed due to esmf/8.5.0 being unavailable. @ulmononian @climbfuji @natalie-perlin FYI. There were cmake and nccmp version conflicts as well.
@FernandoAndrade-NOAA May I ask the EPIC team to install fms/2023.02.01 in this spack-stack 1.5.1 package, instead of fms/2023.03? fms/2023.03 does not have the diag_table bug fix that is in fms/2023.02.01, and GFSv17 requires that bug fix for its application with IAU. We may have to turn off the failed gnu tests on Derecho as specified in https://github.com/JCSDA/spack-stack/issues/860. @jkbk2004 @climbfuji @laurenchilutti FYI.
@RatkoVasic-NOAA @ulmononian FYI: need to move to fms/2023.02.01
Yes, these are available under 1.5.1: fms/2023.01, fms/2023.02.01, fms/2023.03
@AlexanderRichert-NOAA @Hang-Lei-NOAA can you check status of these library updates on wcoss2/acorn: fms/esmf/mapl/gftl-shared ?
@Jong Kim - NOAA Affiliate: These libs had been delivered to GDIT for installation. Some have been available on wcoss2.
@Hang-Lei-NOAA If some of them are available, we may update the module path in this PR. Can you check and point to the installation path? @BrianCurtis-NOAA FYI
Upon @BrianCurtis-NOAA's conversation with me yesterday, I checked with GDIT. fms/2023.02.01 has been on wcoss2 for weeks. For esmf-B/8.5.0 and the associated mapl etc., GDIT's response is: "The RFC is scheduled for 12pm ET on Wednesday for Cactus and Thursday for Dogwood."
Just leaving a note, preliminary testing with control_p8 intel / gnu across Hera, Gaea C5, Jet, Orion, and Hercules succeeded.
cdo/1.9.8 (D)
esmf/7.1.0r   esmf/8.0.1   esmf/8.1.0   esmf/8.1.1 (D)   esmf/8.4.1   esmf-A/8.4.2   esmf-B/8.5.0
fms/2022.03   fms/2022.04 (D)   fms-A/2023.01   fms/2023.02.01
gftl-shared/1.6.1
hdf5/1.10.6 (D)   hdf5/1.12.2   hdf5-A/1.14.0   hdf5-B/1.14.0
mapl-A/2.35.2-esmf-8.4.2   mapl-B/2.40.3
ncdiag/1.0.0   ncdiag/1.1.1 (D)
ncio/1.0.0   ncio/1.1.2 (D)
nemsio/2.5.2   nemsio/2.5.4 (D)
nemsiogfs/2.5.3
netcdf/4.7.4 (D)   netcdf/4.9.0   netcdf-A/4.9.2   netcdf-B/4.9.2
pio/2.5.3 (D)   pio/2.5.10   pio-A/2.5.10   pio-B/2.5.10
pnetcdf/1.12.2
schism/5.11.0
scotch/7.0.4
upp/8.2.0   upp/8.3.0   upp/10.0.8 (D)
w3emc/2.7.3
wgrib2/2.0.8_mpi
wrf_io/1.1.1   wrf_io/1.2.0 (D)
OK Thanks. I have no idea why they are adding a letter to the library names
@FernandoAndrade-NOAA @BrianCurtis-NOAA May I ask if there is any issue with this PR? Is it ready for commit? This feature is requested for several projects. Thanks
@junwang-noaa Let me run the full suite on WCOSS2 after I finish the testing for Denise's PR 2010. I have to double check that I have all of the "-B" packages set up in the modulefile for WCOSS2. Last time I tried it failed, but I think I was missing some of the "-B" packages.
The last set of changes from my side consist of removing Gaea C4 now that C5 is supported. I will also need to rerun Hera RTs to double check the changed results with the switch of FMS from 2023.03 back to 2023.02.01.
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1340): error #6404: This name does not have a type, and must have an explicit type. [UPPERCASE]
select case( uppercase(trim(valueS)) )
-----------------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1340): error #6608: In a CASE statement, the case-expr must be of type INTEGER, CHARACTER, or LOGICAL. [UPPERCASE]
select case( uppercase(trim(valueS)) )
-----------------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1341): error #6611: The case-value must be of the same type as the case-expr. ['JULIAN']
case( 'JULIAN' )
----------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1343): error #6611: The case-value must be of the same type as the case-expr. ['GREGORIAN']
case( 'GREGORIAN' )
----------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1345): error #6611: The case-value must be of the same type as the case-expr. ['NOLEAP']
case( 'NOLEAP' )
----------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1347): error #6611: The case-value must be of the same type as the case-expr. ['THIRTY_DAY']
case( 'THIRTY_DAY' )
----------------^
/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90(1349): error #6611: The case-value must be of the same type as the case-expr. ['NO_CALENDAR']
case( 'NO_CALENDAR' )
----------------^
FWIW I ran into that issue with uppercase being missing; it comes from mpp_mod (use mpp_mod, only : uppercase).
@DusanJovic-NOAA I thought you have the fixes for mpp_mod variables in one of your FV3 PRs?
Could it have something to do with FMS 2023.02.01?
brian.curtis@dlogin03:/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model> module list
Currently Loaded Modules:
1) craype-x86-rome (H) 7) craype/2.7.13 13) hdf5-B/1.14.0 19) crtm/2.4.0 25) gftl-shared/1.6.1
2) libfabric/1.11.0.0. (H) 8) cray-mpich/8.1.12 14) netcdf-B/4.9.2 20) g2/3.4.5 26) mapl-B/2.40.3
3) craype-network-ofi (H) 9) cmake/3.20.2 15) pio-B/2.5.10 21) g2tmpl/1.10.2 27) scotch/7.0.4
4) envvar/1.0 10) jasper/2.0.25 16) esmf-B/8.5.0 22) ip/3.3.3 28) ufs_wcoss2.intel
5) PrgEnv-intel/8.1.0 11) zlib/1.2.11 17) fms/2023.02.01 23) sp/2.3.3
6) intel/19.1.3.304 12) libpng/1.6.37 18) bacio/2.4.1 24) w3emc/2.9.2
@DusanJovic-NOAA I thought you have the fixes for mpp_mod variables in one of your FV3 PRs?
I have, in this commit:
https://github.com/NOAA-EMC/fv3atm/pull/706/commits/93979e5f74150b37ce2956ee11a5d91b7da261bb
You asked me to add it. But this is a different function. Maybe we should check with GFDL about this.
@DusanJovic-NOAA @junwang-noaa @AlexanderRichert-NOAA I see it's not using mpp_mod for uppercase in module_wrt_grid_comp.F90. Should we make the change in that file and add an FV3 PR to this? Or still talk to GFDL?
I got fv3atm to compile by adding that, but it might not hurt to get clarification from them, especially if you're interested in accommodating multiple versions of fms in fv3atm.
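For reference, the workaround being described amounts to a one-line import addition; a minimal sketch (the surrounding symbols are illustrative, not the exact fv3atm change):

```fortran
! Sketch only, not the actual fv3atm commit: extend the existing
! mpp_mod import list in module_wrt_grid_comp.F90 so that the
! select case( uppercase(trim(valueS)) ) statement can resolve it.
use mpp_mod, only : mpp_error, FATAL, uppercase
```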
@BrianCurtis-NOAA you can add that function and make an fv3 PR. I thought we had already added it. Sorry for missing it in Dusan's PR.
In the fms module (the one we 'use' by 'use fms') I see they are transitively including all (many) other fms (the library) modules and renaming the functions with the fms_ prefix, basically namespacing them. So maybe we should start using those functions instead of individually 'using' different modules and explicitly listing every single function we use.
In this case instead of calling 'uppercase' we call 'fms_mpp_uppercase'.
use mpp_mod, only : fms_mpp_init, fms_mpp_uppercase ?
Instead of:
select case( uppercase(trim(valueS)) )
try:
select case( fms_mpp_uppercase(trim(valueS)) )
@DusanJovic-NOAA Would I also need to call fms_mpp_error instead of mpp_error ?
You don't have to, since it's already listed in the use statement, but if we want to be consistent then we should.
This is where it's defined in fms module:
https://github.com/NOAA-GFDL/FMS/blob/51af7e155fcf89008278281cf5ae3279e598580e/libFMS.F90#L489
We should ask GFDL and see what their recommendation is: to use the fms_* functions from the fms module, or to use the individual modules like mpp_mod etc.
@bensonr May I ask what's your suggestion on using fms functions?
With these changes:
diff --git a/io/module_wrt_grid_comp.F90 b/io/module_wrt_grid_comp.F90
index b59fe5e..e5e32ab 100644
--- a/io/module_wrt_grid_comp.F90
+++ b/io/module_wrt_grid_comp.F90
@@ -1337,7 +1337,7 @@
! save calendar_type (as integer) for use in 'coupler.res'
if (index(trim(attNameList(i)),'time:calendar') > 0) then
- select case( uppercase(trim(valueS)) )
+ select case( fms_mpp_uppercase(trim(valueS)) )
case( 'JULIAN' )
calendar_type = JULIAN
case( 'GREGORIAN' )
@@ -1349,7 +1349,7 @@
case( 'NO_CALENDAR' )
calendar_type = NO_CALENDAR
case default
- call mpp_error ( FATAL, 'fcst_initialize: calendar must be one of '// &
+ call fms_mpp_error ( FATAL, 'fcst_initialize: calendar must be one of '// &
'JULIAN|GREGORIAN|NOLEAP|THIRTY_DAY|NO_CALENDAR.' )
end select
endif
Compiles are successful, and about 50 tests fail to compare:
brian.curtis@dlogin03:/lfs/h2/emc/nems/noscrub/brian.curtis/git/FernandoAndrade-NOAA/ufs-weather-model/tests/logs/log_wcoss2> grep -r FAIL rt_*
rt_001_cpld_control_p8_mixedmode_intel.log:Test 001 cpld_control_p8_mixedmode_intel FAIL Tries: 2
rt_002_cpld_control_gfsv17_intel.log:Test 002 cpld_control_gfsv17_intel FAIL Tries: 2
rt_005_cpld_mpi_gfsv17_intel.log:Test 005 cpld_mpi_gfsv17_intel FAIL Tries: 2
rt_006_cpld_debug_gfsv17_intel.log:Test 006 cpld_debug_gfsv17_intel FAIL Tries: 2
rt_007_cpld_control_p8_intel.log:Test 007 cpld_control_p8_intel FAIL Tries: 2
rt_009_cpld_control_qr_p8_intel.log:Test 009 cpld_control_qr_p8_intel FAIL Tries: 2
rt_011_cpld_2threads_p8_intel.log:Test 011 cpld_2threads_p8_intel FAIL Tries: 2
rt_012_cpld_decomp_p8_intel.log:Test 012 cpld_decomp_p8_intel FAIL Tries: 2
rt_013_cpld_mpi_p8_intel.log:Test 013 cpld_mpi_p8_intel FAIL Tries: 2
rt_014_cpld_control_ciceC_p8_intel.log:Test 014 cpld_control_ciceC_p8_intel FAIL Tries: 2
rt_015_cpld_bmark_p8_intel.log:Test 015 cpld_bmark_p8_intel FAIL Tries: 2
rt_017_cpld_control_noaero_p8_intel.log:Test 017 cpld_control_noaero_p8_intel FAIL Tries: 2
rt_018_cpld_control_nowave_noaero_p8_intel.log:Test 018 cpld_control_nowave_noaero_p8_intel FAIL Tries: 2
rt_019_cpld_control_noaero_p8_agrid_intel.log:Test 019 cpld_control_noaero_p8_agrid_intel FAIL Tries: 2
rt_020_cpld_control_c48_intel.log:Test 020 cpld_control_c48_intel FAIL Tries: 2
rt_021_cpld_control_p8_faster_intel.log:Test 021 cpld_control_p8_faster_intel FAIL Tries: 2
rt_022_cpld_control_pdlib_p8_intel.log:Test 022 cpld_control_pdlib_p8_intel FAIL Tries: 2
rt_025_cpld_debug_pdlib_p8_intel.log:Test 025 cpld_debug_pdlib_p8_intel FAIL Tries: 2
rt_055_regional_noquilt_intel.log:Test 055 regional_noquilt_intel FAIL Tries: 2
rt_074_control_csawmg_intel.log:Test 074 control_csawmg_intel FAIL Tries: 2
rt_075_control_csawmgt_intel.log:Test 075 control_csawmgt_intel FAIL Tries: 2
rt_076_control_ras_intel.log:Test 076 control_ras_intel FAIL Tries: 2
rt_080_control_CubedSphereGrid_debug_intel.log:Test 080 control_CubedSphereGrid_debug_intel FAIL Tries: 2
rt_081_control_wrtGauss_netcdf_parallel_debug_intel.log:Test 081 control_wrtGauss_netcdf_parallel_debug_intel FAIL Tries: 2
rt_082_control_stochy_debug_intel.log:Test 082 control_stochy_debug_intel FAIL Tries: 2
rt_083_control_lndp_debug_intel.log:Test 083 control_lndp_debug_intel FAIL Tries: 2
rt_084_control_csawmg_debug_intel.log:Test 084 control_csawmg_debug_intel FAIL Tries: 2
rt_085_control_csawmgt_debug_intel.log:Test 085 control_csawmgt_debug_intel FAIL Tries: 2
rt_086_control_ras_debug_intel.log:Test 086 control_ras_debug_intel FAIL Tries: 2
rt_087_control_diag_debug_intel.log:Test 087 control_diag_debug_intel FAIL Tries: 2
rt_090_rap_control_debug_intel.log:Test 090 rap_control_debug_intel FAIL Tries: 2
rt_091_hrrr_control_debug_intel.log:Test 091 hrrr_control_debug_intel FAIL Tries: 2
rt_092_hrrr_gf_debug_intel.log:Test 092 hrrr_gf_debug_intel FAIL Tries: 2
rt_093_hrrr_c3_debug_intel.log:Test 093 hrrr_c3_debug_intel FAIL Tries: 2
rt_094_rap_unified_drag_suite_debug_intel.log:Test 094 rap_unified_drag_suite_debug_intel FAIL Tries: 2
rt_095_rap_diag_debug_intel.log:Test 095 rap_diag_debug_intel FAIL Tries: 2
rt_096_rap_cires_ugwp_debug_intel.log:Test 096 rap_cires_ugwp_debug_intel FAIL Tries: 2
rt_097_rap_unified_ugwp_debug_intel.log:Test 097 rap_unified_ugwp_debug_intel FAIL Tries: 2
rt_098_rap_lndp_debug_intel.log:Test 098 rap_lndp_debug_intel FAIL Tries: 2
rt_099_rap_progcld_thompson_debug_intel.log:Test 099 rap_progcld_thompson_debug_intel FAIL Tries: 2
rt_100_rap_noah_debug_intel.log:Test 100 rap_noah_debug_intel FAIL Tries: 2
rt_101_rap_sfcdiff_debug_intel.log:Test 101 rap_sfcdiff_debug_intel FAIL Tries: 2
rt_102_rap_noah_sfcdiff_cires_ugwp_debug_intel.log:Test 102 rap_noah_sfcdiff_cires_ugwp_debug_intel FAIL Tries: 2
rt_103_rrfs_v1beta_debug_intel.log:Test 103 rrfs_v1beta_debug_intel FAIL Tries: 2
rt_104_rap_clm_lake_debug_intel.log:Test 104 rap_clm_lake_debug_intel FAIL Tries: 2
rt_105_rap_flake_debug_intel.log:Test 105 rap_flake_debug_intel FAIL Tries: 2
rt_106_gnv1_c96_no_nest_debug_intel.log:Test 106 gnv1_c96_no_nest_debug_intel FAIL Tries: 2
rt_107_control_wam_debug_intel.log:Test 107 control_wam_debug_intel FAIL Tries: 2
rt_119_rap_control_dyn64_phy32_intel.log:Test 119 rap_control_dyn64_phy32_intel FAIL Tries: 2
@junwang-noaa - the updated explicit naming schema available from use libFMS allows one to incorporate the FMS infrastructure library with a single module use statement, in a manner similar to ESMF, MPI, HDF5, etc. We'd prefer people use the new schema as it clarifies what is FMS-provided while also specifying the particular service or manager within the library. Understanding there is a lot of legacy code, one can still use the legacy method of including each individual Fortran module, as there is no plan at this time to only provide access to the libFMS Fortran module.
@bensonr Thanks for the clarification. We will use 'use fms' and the fms_-prefixed functions in the code above.
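As a sketch of what the namespaced style would look like in the calendar block (symbol names follow the fms_ prefix convention in libFMS.F90; the exact export list should be checked against the FMS release in use):

```fortran
! Single namespaced import replaces the per-module FMS imports.
use fms, only : fms_mpp_uppercase, fms_mpp_error, FATAL

! ... inside the attribute loop of module_wrt_grid_comp.F90 ...
select case( fms_mpp_uppercase(trim(valueS)) )
case( 'JULIAN' )
  calendar_type = JULIAN
case default
  call fms_mpp_error( FATAL, 'fcst_initialize: calendar must be one of '// &
                      'JULIAN|GREGORIAN|NOLEAP|THIRTY_DAY|NO_CALENDAR.' )
end select
```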
@FernandoAndrade-NOAA Please add a dependency for: https://github.com/NOAA-EMC/fv3atm/pull/732
@DusanJovic-NOAA can you look that over and give it an approval before Fernando merges it into this PR?
@ulmononian @natalie-perlin @RatkoVasic-NOAA We will start committing #1836 today. After merging in #1836, we need to follow up on the spack-stack 1.5.1 installation on derecho as well. We also need to revisit the derecho job_card update issue. I will keep posting.
@jkbk2004 spack-stack/1.5.1 is installed on derecho. @RatkoVasic-NOAA or @natalie-perlin, can you work w/ @FernandoAndrade-NOAA to ensure the derecho modulefile is updated to use 1.5.1 and the job_card is tested/tweaked (see #2033) for this PR?
@ulmononian I already put derecho-gnu at 1.5.1 (I guess we didn't have 1.5.0 with gnu, only intel). All other machines in this pull request are pointing to 1.5.0. Do you want to bump derecho only to 1.5.1? There is another PR for that: #2013
@RatkoVasic-NOAA yep -- once #1836 is merged, this PR (#2013) needs to have the derecho intel modulefile updated to 1.5.1. issue #2033 might just need some testing to optimize the fv3.exe run implementation.
Adding a note that Hercules / gnu failed around 30 tests as well, with 6 of those failing abnormally with segmentation faults @RatkoVasic-NOAA fyi:
Tests directory: /work/noaa/epic/nandoam/stmp/nandoam/FV3_RT/rt_1280700/
rt_228_cpld_control_p8_gnu.log:Test 228 cpld_control_p8_gnu FAIL
rt_229_cpld_control_nowave_noaero_p8_gnu.log:Test 229 cpld_control_nowave_noaero_p8_gnu FAIL
rt_230_cpld_debug_p8_gnu.log:Test 230 cpld_debug_p8_gnu FAIL
rt_231_cpld_control_pdlib_p8_gnu.log:Test 231 cpld_control_pdlib_p8_gnu FAIL
rt_232_cpld_debug_pdlib_p8_gnu.log:Test 232 cpld_debug_pdlib_p8_gnu FAIL
rt_233_datm_cdeps_control_cfsr_gnu.log:Test 233 datm_cdeps_control_cfsr_gnu FAIL
Hera/gnu runs OK, but a crash happens with MOM I/O on Hercules/gnu:
159: Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
160: at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/config_src/infra/FMS2/MOM_io_infra.F90:905
161: #4 0x3360eb2 in __mom_io_infra_MOD_read_field_2d
169: at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/src/framework/MOM_io.F90:2172
160: #6 0x30cd091 in __mom_shared_initialization_MOD_initialize_topography_from_file
160: at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/src/initialization/MOM_shared_initialization.F90:175
160: #7 0x30b56b0 in __mom_fixed_initialization_MOD_mom_initialize_topography
160: at /work/noaa/epic/jongkim/pr-2013/MOM6-interface/MOM6/src/initialization/MOM_fixed_initialization.F90:224
160: #8 0x30b5a12 in __mom_fixed_initialization_MOD_mom_initialize_fixed
@ulmononian @RatkoVasic-NOAA If the issues are with gnu-12 on Hercules, we may need to stay with gnu-11.3.1 on Hercules. It might be worth checking with the new versions of fms/esmf/gftl-shared in spack-stack 1.5.0 on Hercules; that will allow a sort of cross-check. Can you install them into 1.5.0?
Note that we had to update to gcc@12 due to a bug in mvapich2 with gcc@11
@jkbk2004 do we have any issue with gnu on hera/derecho? If not, can we create a separate issue to debug the gnu issue on Hercules?
@FernandoAndrade-NOAA We're adding Commit Message requirements to PRs. Please add one in the space provided.
This doesn't help with the immediate problem you are having, but I wanted to let you know that I just built jedi-bundle on Hercules with GNU in spack-stack-1.5.1, and I was able to run all 2430 ctests without errors. These include tests with the fv3 dycore via fv3-jedi, and tests with mom6 via soca. So it seems that spack-stack-1.5.1 should generally be ok to use, and maybe the problem is a bug in the actual code getting used?
do you also need mapl@2.40.3 for this test env?
Why is there no associated issue with this PR? The only one shown is extending wall-clock for one test.
Please include issues #1854 and #1874.
These have been added to the linked issues section.
@zach1221 They are not showing up on the right sidebar, where the issues which get closed automagically appear. I think it may be because of how you've linked the issues?
PR Author Checklist:
[x] I have linked PR's from all sub-components involved in section below.
[x] I am confirming reviews are completed in ALL sub-component PR's.
[x] I have run the full RT suite on either Hera/Cheyenne AND have attached the log to this PR below this line:
[x] I have added the list of all failed regression tests to "Anticipated changes" section.
[ ] I have filled out all sections of the template.
Description
This PR updates modulefiles to spack-stack 1.5.1. Updates have been made to the esmf, fms, mapl, and gftl-shared versions within ufs_common.lua. This also includes a fix for Gaea's timeout with cpld_bmark_p8 noted in #1978.
Commit Message
Linked Issues and Pull Requests
Associated UFSWM Issue to close
Subcomponent Pull Requests
Blocking Dependencies
Subcomponents involved:
Anticipated Changes
Input data
Regression Tests:
Tests affected by changes in this PR:
Intel: regional_noquilt control_csawmg control_csawmgt control_ras control_CubedSphereGrid_debug control_wrtGauss_netcdf_parallel_debug control_stochy_debug control_lndp_debug control_csawmg_debug control_csawmgt_debug control_ras_debug control_diag_debug rap_control_debug hrrr_control_debug hrrr_gf_debug hrrr_c3_debug rap_unified_drag_suite_debug rap_diag_debug rap_cires_ugwp_debug rap_unified_ugwp_debug rap_lndp_debug rap_progcld_thompson_debug rap_noah_debug rap_sfcdiff_debug rap_noah_sfcdiff_cires_ugwp_debug rrfs_v1beta_debug rap_clm_lake_debug rap_flake_debug gnv1_c96_no_nest_debug control_wam_debug rap_control_dyn64_phy32
Hercules/gnu: control_c48_gnu control_stochy_gnu control_ras_gnu control_flake_gnu rap_control_gnu rap_decomp_gnu rap_2threads_gnu rap_sfcdiff_gnu rap_sfcdiff_decomp_gnu hrrr_control_gnu hrrr_control_noqr_gnu hrrr_control_2threads_gnu hrrr_control_decomp_gnu rrfs_v1beta_gnu rap_control_dyn32_phy32_gnu hrrr_control_dyn32_phy32_gnu rap_2threads_dyn32_phy32_gnu hrrr_control_2threads_dyn32_phy32_gnu hrrr_control_decomp_dyn32_phy32_gnu conus13km_control_gnu rap_control_dyn64_phy32_gnu
cpld_control_p8_gnu cpld_control_nowave_noaero_p8_gnu cpld_debug_p8_gnu cpld_control_pdlib_p8_gnu cpld_debug_pdlib_p8_gnu datm_cdeps_control_cfsr_gnu
Libraries
Code Managers Log
- [x] This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
- [ ] Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.
- [ ] N/A
### Testing Log:
- RDHPCS
  - [x] Hera
  - [x] Orion
  - [x] Hercules
  - [x] Jet
  - [x] Gaea
  - [x] Cheyenne
- WCOSS2
  - [x] Dogwood/Cactus
  - [ ] Acorn
- CI
  - [x] Completed
- opnReqTest
  - [ ] N/A
  - [x] Log attached to comment