Closed JamesAbeles-NOAA closed 3 years ago
Hmm. This is from the latest commit:
baseline dir = /gpfs/dell2/emc/modeling/noscrub/emc.nemspara/RT/NEMSfv3gfs/develop-20210907/control_c48
working dir = /gpfs/dell2/ptmp/Dom.Heinzeller/FV3_RT/rt_10554/control_c48
Checking test 024 control_c48 results ....
Comparing sfcf000.nc .........OK
Comparing sfcf024.nc .........OK
Comparing atmf000.nc .........OK
Comparing atmf024.nc .........OK
Comparing RESTART/coupler.res .........OK
Comparing RESTART/fv_core.res.nc .........OK
Comparing RESTART/fv_core.res.tile1.nc .........OK
Comparing RESTART/fv_core.res.tile2.nc .........OK
Comparing RESTART/fv_core.res.tile3.nc .........OK
Comparing RESTART/fv_core.res.tile4.nc .........OK
Comparing RESTART/fv_core.res.tile5.nc .........OK
Comparing RESTART/fv_core.res.tile6.nc .........OK
Comparing RESTART/fv_srf_wnd.res.tile1.nc .........OK
Comparing RESTART/fv_srf_wnd.res.tile2.nc .........OK
Comparing RESTART/fv_srf_wnd.res.tile3.nc .........OK
Comparing RESTART/fv_srf_wnd.res.tile4.nc .........OK
Comparing RESTART/fv_srf_wnd.res.tile5.nc .........OK
Comparing RESTART/fv_srf_wnd.res.tile6.nc .........OK
Comparing RESTART/fv_tracer.res.tile1.nc .........OK
Comparing RESTART/fv_tracer.res.tile2.nc .........OK
Comparing RESTART/fv_tracer.res.tile3.nc .........OK
Comparing RESTART/fv_tracer.res.tile4.nc .........OK
Comparing RESTART/fv_tracer.res.tile5.nc .........OK
Comparing RESTART/fv_tracer.res.tile6.nc .........OK
Comparing RESTART/phy_data.tile1.nc .........OK
Comparing RESTART/phy_data.tile2.nc .........OK
Comparing RESTART/phy_data.tile3.nc .........OK
Comparing RESTART/phy_data.tile4.nc .........OK
Comparing RESTART/phy_data.tile5.nc .........OK
Comparing RESTART/phy_data.tile6.nc .........OK
Comparing RESTART/sfc_data.tile1.nc .........OK
Comparing RESTART/sfc_data.tile2.nc .........OK
Comparing RESTART/sfc_data.tile3.nc .........OK
Comparing RESTART/sfc_data.tile4.nc .........OK
Comparing RESTART/sfc_data.tile5.nc .........OK
Comparing RESTART/sfc_data.tile6.nc .........OK
[0] The total amount of wall time = 434.260632
Test 024 control_c48 PASS
If I could get back on Mars, I would post what message I got.
Mars is WCOSS Cray?
Yes, I remembered I had a window. Here is what I see:
cat fail_test
control_c48 001 failed in check_result
baseline dir = /gpfs/dell2/emc/modeling/noscrub/emc.nemspara/RT/NEMSfv3gfs/develop-20210907/control_c48
working dir = /gpfs/dell2/ptmp/James.A.Abeles/FV3_RT/rt_5253/control_c48
Checking test 001 control_c48 results ....
Comparing sfcf000.nc ............ALT CHECK......NOT OK
Comparing sfcf024.nc ............ALT CHECK......NOT OK
Comparing atmf000.nc ............ALT CHECK......NOT OK
Comparing atmf024.nc ............ALT CHECK......NOT OK
Comparing RESTART/coupler.res .........OK
Comparing RESTART/fv_core.res.nc .........OK
Comparing RESTART/fv_core.res.tile1.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_core.res.tile2.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_core.res.tile3.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_core.res.tile4.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_core.res.tile5.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_core.res.tile6.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_srf_wnd.res.tile1.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_srf_wnd.res.tile2.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_srf_wnd.res.tile3.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_srf_wnd.res.tile4.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_srf_wnd.res.tile5.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_srf_wnd.res.tile6.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_tracer.res.tile1.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_tracer.res.tile2.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_tracer.res.tile3.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_tracer.res.tile4.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_tracer.res.tile5.nc ............ALT CHECK......NOT OK
Comparing RESTART/fv_tracer.res.tile6.nc ............ALT CHECK......NOT OK
Comparing RESTART/phy_data.tile1.nc ............ALT CHECK......NOT OK
Comparing RESTART/phy_data.tile2.nc ............ALT CHECK......NOT OK
Comparing RESTART/phy_data.tile3.nc ............ALT CHECK......NOT OK
Comparing RESTART/phy_data.tile4.nc ............ALT CHECK......NOT OK
Comparing RESTART/phy_data.tile5.nc ............ALT CHECK......NOT OK
Comparing RESTART/phy_data.tile6.nc ............ALT CHECK......NOT OK
Comparing RESTART/sfc_data.tile1.nc ............ALT CHECK......NOT OK
Comparing RESTART/sfc_data.tile2.nc ............ALT CHECK......NOT OK
Comparing RESTART/sfc_data.tile3.nc ............ALT CHECK......NOT OK
Comparing RESTART/sfc_data.tile4.nc ............ALT CHECK......NOT OK
Comparing RESTART/sfc_data.tile5.nc ............ALT CHECK......NOT OK
Comparing RESTART/sfc_data.tile6.nc ............ALT CHECK......NOT OK
[0] The total amount of wall time = 433.593927
Test 001 control_c48 FAIL
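With this many comparison lines, a quick tally makes it easier to see which files matched the baseline and which did not. The following is only a sketch: summarize_rt_log is a hypothetical helper, not part of rt.sh, and it assumes the log has one "Comparing <file> ...OK" or "Comparing <file> ...NOT OK" entry per line, as in the logs pasted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rt.sh): tally the "Comparing ..." lines
# of a regression-test log and list the files that did not match.
# Assumes one comparison per line, ending in either "OK" or "NOT OK".
summarize_rt_log() {
  awk '
    /^Comparing / {
      if ($0 ~ /NOT OK[[:space:]]*$/) {
        failed++
        bad[failed] = $2          # second field is the compared file name
      } else if ($0 ~ /OK[[:space:]]*$/) {
        passed++
      }
    }
    END {
      printf "passed=%d failed=%d\n", passed, failed
      for (i = 1; i <= failed; i++) print "  mismatch: " bad[i]
    }
  ' "$1"
}
```

Usage would be something like: summarize_rt_log path/to/logfile, where the path is whatever log file holds the comparison output for the test in question.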
@JamesAbeles-NOAA, can you try ./rt.sh -k -n control_c48 >&rt.test& ? The -n option has not been tried together with the -l option. The -n option will use rt.conf as default. If needed, using both -n and -l at the same time can be easily implemented in the future.
This may have to do with not using ecflow for these single jobs (-n option). I am running a test now and will report back.
I did this: ./rt.sh -k -n control_c48 >&rt.test.1 and it failed again. Same location: /gpfs/dell2/emc/modeling/noscrub/James.A.Abeles/ufs-weather-model/tests/
@JamesAbeles-NOAA I am not able to verify the failure. I git cloned the latest develop (hash 9007b8) and invoked ./rt.sh -k -n control_c48 >out 2>&1 & and it passed: see /gpfs/dell2/emc/modeling/noscrub/Minsuk.Ji/ISS811/tests/RegressionTests_wcoss_dell_p3.log
@JamesAbeles-NOAA please check for any automatically loaded modules in your environment (.bashrc, .bash_profile, etc.).
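One way to do that check is to search the startup files for module commands. This is a rough sketch under assumptions: find_startup_modules is a hypothetical helper, and it only catches uncommented lines that begin with module load, use, swap, or switch; modules pulled in indirectly (for example through a sourced script) would not show up. Running module list in a fresh login shell is a complementary check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print file:line:text for every uncommented
# "module load|use|swap|switch" line in the given startup files.
# Unreadable or missing files are skipped silently.
find_startup_modules() {
  local f
  for f in "$@"; do
    [ -r "$f" ] || continue
    # -n adds line numbers, -H forces the file name, -E enables ERE syntax.
    grep -nHE '^[[:space:]]*module[[:space:]]+(load|use|swap|switch)' "$f" || true
  done
}
```

For example: find_startup_modules ~/.bashrc ~/.bash_profile ~/.profile. No output means no direct module commands were found in those files.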
Do I need to run module purge before running the regression test?
Usually, no. That should be taken care of by NEMS/src/conf/module-setup.sh.inc
I notice that you have modified cmake/Intel.cmake in your source directory.
--- a/cmake/Intel.cmake
+++ b/cmake/Intel.cmake
@@ -29,9 +29,9 @@ else()
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O2 -debug minimal")
set(FAST "-fast-transcendentals")
if(AVX2)
- set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -march=core-avx2")
- set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=core-avx2")
- set(CMAKE_Fortran_FLAGS_OPT "-no-prec-div -no-prec-sqrt -xCORE-AVX2")
+ set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -march=core-avx2 -mtune=core-avx2")
+ set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=core-avx2 -mtune=core-avx2" )
+ set(CMAKE_Fortran_FLAGS_OPT "-no-prec-div -no-prec-sqrt -march=core-avx2 -mtune=core-avx2")
elseif(SIMDMULTIARCH)
set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -axSSE4.2,CORE-AVX2")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -axSSE4.2,CORE-AVX2")
Oh, yes, you are correct. I added mtune since that is what we are recommended to use for performance on wcoss2. Sorry, I forgot about that.
Let me know how your test goes with the cmake file reverted.
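Before rerunning, it may be worth confirming that nothing else in the checkout differs from the committed source. A minimal sketch using standard git commands follows; the two function names are hypothetical, and the check covers only tracked files in the top-level repository, not submodules.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on standard git commands.

# List tracked files whose working-tree content differs from HEAD
# (staged or unstaged). Defaults to the current directory.
list_modified_tracked_files() {
  git -C "${1:-.}" diff --name-only HEAD
}

# Discard local edits to a single tracked file, restoring the version
# recorded in the index. Arguments: repository path, file path.
revert_tracked_file() {
  git -C "$1" checkout -- "$2"
}
```

For the case in this thread that would be: list_modified_tracked_files . followed by revert_tracked_file . cmake/Intel.cmake, run from the top of the ufs-weather-model checkout. The build should then be redone so the reverted flags take effect.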
I cloned the latest ufs weather model:
git clone -q --recursive https://github.com/ufs-community/ufs-weather-model
cd ufs-weather-model/tests
./rt.sh -l rt.conf -k -n control_c48 >&rt.test&
It says the test failed. The directory is here: /gpfs/dell2/emc/modeling/noscrub/James.A.Abeles/ufs-weather-model/tests/
The job ran to completion but did not validate.