ufs-community / ufs-mrweather-app

UFS Medium-Range Weather Application

Cheyenne intel debug tests failing #87

Closed jedwards4b closed 4 years ago

jedwards4b commented 4 years ago

After changing the initial conditions from nemsio 2019-09-09 to grib2 2019-08-29 we are getting failures in the model run with DEBUG enabled at C192 and C384 with both physics packages.
Traceback:

139:MPT: #6  0x00002b093a736d59 in __libm_pow_e7 ()
139:MPT:    from /glade/p/ral/jntp/GMTB/tools/NCEPLIBS-ufs-v1.0.0.alpha01/intel-18.0.5/mpt-2.19/lib64/libesmf.so
139:MPT: #7  0x00000000037c59e8 in nst_module::cool_skin (ustar_a=0.25801334064822073, 
139:MPT:     f_nsol=326.29147567951782, f_sol_0=0, evap=68.673170379443576, sss=34, 
139:MPT:     alpha=-9.2072085211872438e-06, beta=0.00079968090216064902, 
139:MPT:     rho_w=1027.3871699771169, rho_a=1.3275131714665556, ts=269.03211181356568, 
139:MPT:     q_ts=27.246728275307337, hl_ts=7.8969886011639865, 
139:MPT:     grav=9.7803384883824531, le=1863393.8950018492, 
139:MPT:     deltat_c=0.35158496314933996, z_c=0.00064651084571022981, c_0=0, c_d=0)
139:MPT:     at /glade/scratch/jedwards/SMS_Lh3_D.C192.GFSv15p2.cheyenne_intel.GC.20200212_154145_pny8jh/bld/atm/obj/FV3/ccpp/physics/physics/module_nst_model.f90:894
139:MPT: #8  0x000000000324a0e0 in sfc_nst::sfc_nst_run (im=32, hvap=2500000, 
139:MPT:     cp=1004.6, hfus=333580, jcal=4.1855000000000002, eps=0.62199349945828819, 
139:MPT:     epsm1=-0.37800650054171181, rvrdm1=0.60773384427800026, 
139:MPT:     rd=287.05000000000001, rhw0=1022, pi=3.1415926535897931, 
139:MPT:     sbc=5.6704000000000003e-08, ps=..., u1=..., v1=..., t1=..., q1=..., 
139:MPT:     tref=..., cm=..., ch=..., prsl1=..., prslki=..., prsik1=..., prslk1=..., 
139:MPT:     wet=..., xlon=..., sinlat=..., stress=..., sfcemis=..., dlwflx=..., 
139:MPT:     sfcnsw=..., rain=..., timestep=450, kdt=1, solhr=0, xcosz=..., wind=..., 
139:MPT:     flag_iter=..., flag_guess=..., nstf_name1=2, nstf_name4=0, nstf_name5=5, 
139:MPT:     lprnt=.FALSE., ipr=10, tskin=..., tsurf=..., xt=..., xs=..., xu=..., 
139:MPT:     xv=..., xz=..., zm=..., xtts=..., xzts=..., dt_cool=..., z_c=..., c_0=..., 
139:MPT:     c_d=..., w_0=..., w_d=..., d_conv=..., ifd=..., qrain=..., qsurf=..., 
139:MPT:     gflux=..., cmm=..., chh=..., evap=..., hflx=..., ep=..., errmsg=..., 
139:MPT:     errflg=0, .tmp.ERRMSG.len_V$b7=512)
139:MPT:     at /glade/scratch/jedwards/SMS_Lh3_D.C192.GFSv15p2.cheyenne_intel.GC.20200212_154145_pny8jh/bld/atm/obj/FV3/ccpp/physics/physics/sfc_nst.f:392
139:MPT: #9  0x0000000002bc0492 in ccpp_fv3_gfs_v15p2_physics_cap::fv3_gfs_v15p2_physics_run_cap (con_t0c=273.14999999999998, con_rd=287.05000000000001, 
139:MPT:     rlapse=0.0064999999999999997, con_hvap=2500000, 
139:MPT:     con_eps=0.62199349945828819, con_cliq=4185.5, gfs_control=..., 
139:MPT:     con_g=9.8066499999999994, con_pi=3.1415926535897931, 
139:MPT:     con_sbc=5.6704000000000003e-08, con_epsm1=-0.37800650054171181, 
139:MPT:     huge=9.969209968386869e+36, gfs_interstitial=..., con_hfus=333580, 
139:MPT:     con_cp=1004.6, cdata=..., cimin=0.14999999999999999, 
139:MPT:     gfs_data=<error reading variable: value requires 1997568 bytes, which is more than max-value-size>, con_cvap=1846, con_rv=461.5, 
139:MPT:     con_jcal=4.1855000000000002, con_fvirt=0.60773384427800026, 
139:MPT:     con_tice=271.19999999999999, con_rhw0=1022)
139:MPT:     at /glade/scratch/jedwards/SMS_Lh3_D.C192.GFSv15p2.cheyenne_intel.GC.20200212_154145_pny8jh/bld/atm/obj/FV3/ccpp/physics/ccpp_FV3_GFS_v15p2_physics_cap.F90:592
139:MPT: #10 0x0000000002b40d1c in ccpp_static_api::ccpp_physics_run (cdata=..., 
139:MPT:     suite_name=..., group_name=..., ierr=0, .tmp.SUITE_NAME.len_V$976a=13, 
139:MPT:     .tmp.GROUP_NAME.len_V$976d=7)
139:MPT:     at /glade/scratch/jedwards/SMS_Lh3_D.C192.GFSv15p2.cheyenne_intel.GC.20200212_154145_pny8jh/bld/atm/obj/FV3/ccpp/physics/ccpp_static_api.F90:150
139:MPT: #11 0x0000000002b46840 in ccpp_driver::ccpp_step (step=..., nblks=48, ierr=0, 
139:MPT:     .tmp.STEP.len_V$b7b=7)
139:MPT:     at /glade/scratch/jedwards/SMS_Lh3_D.C192.GFSv15p2.cheyenne_intel.GC.20200212_154145_pny8jh/bld/atm/obj/FV3/ccpp/driver/CCPP_driver.F90:234
139:MPT: #12 0x00000000006430cb in atmos_model_mod::update_atmos_radiation_physics (
139:MPT:     atmos=...)
139:MPT:     at /glade/scratch/jedwards/SMS_Lh3_D.C192.GFSv15p2.cheyenne_intel.GC.20200212_154145_pny8jh/bld/atm/obj/FV3/atmos_model.F90:364
139:MPT: #13 0x0000000000635c7d in module_fcst_grid_comp::fcst_run_phase_1 (
139:MPT:     fcst_comp=..., importstate=..., exportstate=..., clock=..., rc=0)
139:MPT:     at /glade/scratch/jedwards/SMS_Lh3_D.C192.GFSv15p2.cheyenne_intel.GC.20200212_154145_pny8jh/bld/atm/obj/FV3/module_fcst_grid_comp.F90:708
139:MPT: #14 0x00002b09390f7509 in ESMCI::FTable::callVFuncPtr(char const*, ESMCI::VM*, int*) ()
139:MPT:    from /glade/p/ral/jntp/GMTB/tools/NCEPLIBS-ufs-v1.0.0.alpha01/intel-18.0.5/mpt-2.19/lib64/libesmf.so

arunchawla-NOAA commented 4 years ago

@jedwards4b when using the grib2 data set you have to set the NSST flag to false. Did you do that? @BinLiu-NOAA can you share the namelist file that you used for the model run for the C96 configuration?

jedwards4b commented 4 years ago

I don't see a variable called NSST anywhere. @uturuncoglu, any ideas?

jedwards4b commented 4 years ago

There is a variable convert_nst for chgres - it's false. Is that the one you mean?

BinLiu-NOAA commented 4 years ago

@jedwards4b According to Xu Li, to turn off NSST in the forecast job, you need to change nstf_name = 2,0,0,0,0 into nstf_name = 0,0,0,0,0 in the input.nml.

Of course, for the chgres_cube's namelist file (fort.41), you also need to set convert_nst=.false.

Bin
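The namelist change described above can be scripted. The following is a minimal sketch, assuming the namelist is handled as plain text and that `nstf_name` is assigned on a single line; the helper name `set_nstf_name` and the group name in the sample are hypothetical and not part of any UFS tool.

```python
import re

def set_nstf_name(namelist_text, values):
    """Return namelist_text with each single-line `nstf_name = ...` value list replaced."""
    new_value = ",".join(str(v) for v in values)
    # Capture `nstf_name =` (with its indentation), then match a comma-separated
    # list of integers on the same line. A trailing comma, if any, is left as-is.
    pattern = re.compile(
        r"(?im)^([ \t]*nstf_name[ \t]*=[ \t]*)\d+(?:[ \t]*,[ \t]*\d+)*"
    )
    updated, count = pattern.subn(lambda m: m.group(1) + new_value, namelist_text)
    if count == 0:
        raise ValueError("no nstf_name assignment found")
    return updated

# Hypothetical input.nml fragment; the group name is a placeholder.
sample = """&example_group
  nstf_name = 2,0,0,0,0
/
"""
print(set_nstf_name(sample, [0, 0, 0, 0, 0]))
```

Editing the text directly keeps every other line of the file untouched, which is convenient when comparing the NSST and no-NSST runs.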

jedwards4b commented 4 years ago

The current setting is nstf_name = 2, 1, 1, 0, 5. Should I set nstf_name = 0,0,0,0,0 or nstf_name = 0,1,1,0,5?

-- Jim Edwards

CESM Software Engineer National Center for Atmospheric Research Boulder, CO

BinLiu-NOAA commented 4 years ago

@jedwards4b I would try nstf_name = 0,0,0,0,0

The following may give the same result, though: nstf_name = 0,1,1,0,5

jedwards4b commented 4 years ago

I tried nstf_name = 0,0,0,0,0 that results in an error:

0: ==============
0: final results
0: ==============
0: dbgx --fixratio: F F F F
29: dbgx --scale snwdph from sheleg 334 0.000000000000000E+000
29: 0.377504399924366
61: enter get_nggps_ic is= 25 ie= 48 js= 73 je= 96 isd= 22 ied= 51 jsd= 70 jed= 99
61:MPT ERROR: Rank 61(g:61) received signal SIGSEGV(11).
61: Process ID: 47300, Host: r14i1n2, Program: /glade/scratch/jedwards/ufstest/bld/ufs.exe
61: MPT Version: HPE MPT 2.19 02/23/19 05:30:09
61:
61:MPT: --------stack traceback-------
38: dbgx --scale snwdph from sheleg 452 0.000000000000000E+000
38: 9.670420783955239E-002
38: dbgx --scale snwdph from sheleg 453 0.000000000000000E+000
38: 0.257557158834781
38: dbgx --scale snwdph from sheleg 476 0.000000000000000E+000
38: 5.863838849314051E-002
38: dbgx --scale snwdph from sheleg 477 0.000000000000000E+000
38: 1.837636984736003E-002

jedwards4b commented 4 years ago

I moved this to #89

arunchawla-NOAA commented 4 years ago

We have an idea of how to proceed with grib2 data and NSST.

To use NSST with grib2 data we have to set nstf_name=2,1,0,0,0

However, this approach still gives an error in debug mode for grids at higher resolution than C96. EMC will try to reproduce the error. We surmise that the ocean surface temperature is becoming too cold. (Note that this is an artifact of interpolation in the grib2 data, not of any physics.) Any temperature colder than 271.2 K is not meaningful for NSST.
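The traceback in the original report is consistent with this reading: frame #7 shows `ts=269.03211181356568` passed to `nst_module::cool_skin`, which is below 271.2 (the same value as `con_tice` in frame #9). A small sketch of how such a value might be pulled out of a gdb frame dump and compared with the threshold is given here; the helper `frame_argument` is hypothetical, and the frame text is an abridged excerpt of the traceback.

```python
import re

# Threshold quoted in this thread; it matches con_tice in the traceback.
NSST_MIN_TEMP_K = 271.2

def frame_argument(frame_text, name):
    """Return the float value of `name=<number>` in a gdb frame dump, or None."""
    pattern = r"\b" + re.escape(name) + r"=(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
    match = re.search(pattern, frame_text)
    return float(match.group(1)) if match else None

# Abridged excerpt of frame #7 from the traceback in the original report.
frame = """nst_module::cool_skin (ustar_a=0.25801334064822073,
    rho_w=1027.3871699771169, rho_a=1.3275131714665556, ts=269.03211181356568,
    q_ts=27.246728275307337, hl_ts=7.8969886011639865,"""

ts = frame_argument(frame, "ts")
print("ts below NSST threshold:", ts < NSST_MIN_TEMP_K)
```

The word boundary in the pattern keeps `ts` from matching inside `q_ts` or `hl_ts`, which appear in the same frame.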

We are trying two options to see whether they work:

a) Remove NSST as an option by taking it out of the physics suite definitions

b) Set a lower limit of 271.2 K for ocean temperature in chgres (this is reasonable, as anything colder should no longer be open ocean water, where NSST is applied)
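Option b) amounts to clamping the temperature at open-water points. The sketch below only illustrates that idea in Python and is not how chgres implements it; the function name `apply_sst_floor`, the open-water mask, and the sample values are all assumptions made for the example.

```python
def apply_sst_floor(temps_k, is_open_water, floor_k=271.2):
    """Raise open-water temperatures below floor_k to floor_k; leave other points unchanged."""
    return [
        max(t, floor_k) if water else t
        for t, water in zip(temps_k, is_open_water)
    ]

# Made-up sample values in kelvin; the last point is flagged as not open water.
temps = [269.03, 271.2, 285.4, 265.0]
open_water = [True, True, True, False]
print(apply_sst_floor(temps, open_water))
```

Restricting the floor to open-water points leaves land and ice temperatures as they were, in line with the note that NSST applies only to open ocean.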

Related ticket is https://github.com/ufs-community/ufs-mrweather-app/issues/86

jedwards4b commented 4 years ago

This issue is resolved by PRs NOAA-EMC/fv3atm#67 and ESCOMP/FV3GFS_interface#5 given that the 8 day output is acceptable.

rsdunlapiv commented 4 years ago

@arunchawla-NOAA I moved @jedwards4b's NSST and non-NSST runs to Hera for review:

/scratch1/NCEPDEV/nems/Rocky.Dunlap/ufs_xfer/SMS_Ld8.C96.GFSv15p2.cheyenne_intel.grib2
/scratch1/NCEPDEV/nems/Rocky.Dunlap/ufs_xfer/SMS_Ld8.C96.GFSv15p2.cheyenne_intel.grib2_no_NST

arunchawla-NOAA commented 4 years ago

We are closing this issue because the no-NSST option runs fine and we have not been able to reproduce this error on other platforms.