ufs-community / ufs-mrweather-app

UFS Medium-Range Weather Application

Implement model regression tests with CIME #43

Closed rsdunlapiv closed 4 years ago

rsdunlapiv commented 4 years ago

A basic "smoke test" has been implemented with CIME - this is a kind of sanity check to ensure that the model builds and runs for a short time. We need to identify what additional regression tests would be most beneficial for the release and implement them.

uturuncoglu commented 4 years ago

@rsdunlapiv @jedwards4b The model now supports warm restart through the use of xmlchange (CONTINUE_RUN=TRUE). All changes are pushed to the app.

We need to discuss the following:

warm run (+36 hours)

./xmlchange CONTINUE_RUN=TRUE
./xmlchange RUN_STARTDATE=2019-09-10
./xmlchange RUN_REFDATE=2019-09-10
./xmlchange START_TOD=43200
./xmlchange RUN_REFTOD=43200
./case.submit --only-job case.run
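
The date and time-of-day values in the commands above follow from the forecast offset. A minimal sketch of that arithmetic, assuming the initial run started at 2019-09-09 00:00 and that START_TOD is given in seconds since midnight; `restart_date_and_tod` is a hypothetical helper for illustration, not part of CIME:

```python
from datetime import datetime, timedelta


def restart_date_and_tod(initial_start, elapsed_hours):
    """Return (YYYY-MM-DD, seconds since midnight) for a restart point."""
    restart = initial_start + timedelta(hours=elapsed_hours)
    midnight = restart.replace(hour=0, minute=0, second=0, microsecond=0)
    tod_seconds = int((restart - midnight).total_seconds())
    return restart.strftime("%Y-%m-%d"), tod_seconds


# Assumed initial start time; 36 hours matches the "+36 hours" warm run above.
date, tod = restart_date_and_tod(datetime(2019, 9, 9), 36)
print(date, tod)
```

With these inputs the helper gives 2019-09-10 and 43200, which matches the RUN_STARTDATE and START_TOD values used above.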

rsdunlapiv commented 4 years ago

Planned CIME tests for the release:

  1. smoke test - performs a single run and compares it to locally stored baselines (new platforms require generating baselines the first time via a CIME create_test option). This is complete.
  2. restart test - performs a run of n timesteps, writing a restart file at time (n/2)+1, then restarts from that point and compares history files at time n; they should be identical. This is in progress.
  3. pe layout test (lower priority) - performs the same run twice with different processor counts for the atmosphere. History files should be identical.
  4. thread count test (lower priority) - performs the same run twice with the same number of MPI tasks but a different number of threads. History files should be identical.
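
Tests 2 through 4 all reduce to checking that two sets of history fields are bit-for-bit identical. A minimal sketch of such a check, assuming the fields have already been read into plain Python lists of floats; `differing_fields` and the field names are hypothetical, and this is not how CIME itself performs the comparison:

```python
import struct


def differing_fields(run_a, run_b):
    """Return the names of fields that are not bit-for-bit identical.

    run_a and run_b map a field name to a list of float values.
    """
    differing = []
    for name in sorted(set(run_a) | set(run_b)):
        a, b = run_a.get(name), run_b.get(name)
        if a is None or b is None or len(a) != len(b):
            differing.append(name)
            continue
        # Compare raw 64-bit patterns so that cases such as -0.0 versus 0.0
        # are not hidden by floating-point equality rules.
        if struct.pack(f"{len(a)}d", *a) != struct.pack(f"{len(b)}d", *b):
            differing.append(name)
    return differing


base = {"tmp": [280.0, 281.5], "ps": [101325.0]}
rerun = {"tmp": [280.0, 281.5], "ps": [101325.0]}
print(differing_fields(base, rerun))
```

An empty list means the two runs agree on every field; any name in the list points at a field worth inspecting.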

A question for @arunchawla-NOAA and @junwang-noaa is whether FV3GFS would be expected to pass tests 2, 3, 4 as described above.

Is there an option to output NetCDF history file in double precision? (@junwang-noaa, @arunchawla-NOAA)

Outstanding Issues

Resolved

junwang-noaa commented 4 years ago

Rocky,

We usually don't do regression testing inside the workflow, as the workflow contains many components and each component has its own regression tests. If our goal is to build an end-to-end system in the future, merging the regression tests from the components into the workflow will complicate it, especially when DA, other coupled components, and downstream jobs are added. It will be a huge workflow that is hard to work with. I am not sure what the purpose of running these model tests as workflow tests is. It's just my two cents.

To your question, we do have all the regression tests you listed here, so tests 2), 3), and 4) will pass if set up correctly. Please note that for the restart test, since some accumulated diagnostic bucket fields are in the output surface file, the restart interval has to be a multiple of the bucket hours. E.g., if the bucket hour is fhzero=6, the restart time should be a multiple of 6. If you run a 24 hour forecast, you can restart at 6, 12, or 18, and the model forecast at fh=24 will be the same as in the straight run.
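
The bucket rule above can be written down directly. A small sketch, assuming restart points must fall strictly inside the forecast window; `valid_restart_hours` is a hypothetical helper name:

```python
def valid_restart_hours(forecast_hours, fhzero):
    """List restart hours that are multiples of fhzero and lie inside the forecast."""
    if fhzero <= 0:
        raise ValueError("fhzero must be positive")
    return list(range(fhzero, forecast_hours, fhzero))


print(valid_restart_hours(24, 6))
```

For a 24 hour forecast with fhzero=6 this lists hours 6, 12 and 18, the same restart points named in the comment.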

We do not output history files (netcdf or nemsio) in double precision. But all the restart files (netcdf) are written out in double precision.

Also I see: Do we need to modify RUN_REFDATE and RUN_STARTDATE automatically based on the date written to the coupler.res?

I am not sure what RUN_REFDATE and RUN_STARTDATE are. To avoid confusion, StartTime defined in the model configuration is the start time of an integration (including restart). We use CurrTime (current time) to specify the model start time (which could be StartTime if starting from the beginning, or the restart time specified in coupler.res). I hope we do not create a different set of terminology from the model's; otherwise it would be error-prone.

uturuncoglu commented 4 years ago

@junwang-noaa Just to be sure: if I understand correctly, we need to set the start time in model_configure when we are restarting the model, right? My observation is that if we keep it as the initial run start time, the model crashes with the following error:

20200107 112534.133 INFO             PET145 (fv3gfs_cap:InitializeP0) cplprint_flag =      F
20200107 112534.171 WARNING          PET145 ESMCI_Clock.C:1703 ESMCI::Clock::validate() timeStep equals zero.
20200107 112534.171 ERROR            PET145 ESMCI_Clock.C:373 ESMCI::Clock::set() Wrong data value  - Internal subroutine call returned Error
20200107 112534.171 ERROR            PET145 ESMF_Clock.F90:1695 ESMF_ClockSet() Wrong data value  - Internal subroutine call returned Error
20200107 112534.171 ERROR            PET145 fv3_cap.F90:486 Wrong data value  - Passing error in return code
20200107 112534.171 INFO             PET145 Finalizing ESMF

uturuncoglu commented 4 years ago

We could also implement a check that restart_interval is a multiple of fhzero when we set it in model_configure.
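
One way such a check might look, as a sketch only: it assumes model_configure uses `key: value` lines and input.nml uses `key = value` lines (as in the files posted later in this thread), and that both values are whole hours. `read_int` and `check_restart_interval` are hypothetical names, not existing CIME or model functions:

```python
import re


def read_int(text, key, separator):
    """Extract an integer value for `key` from 'key<separator>value' lines."""
    pattern = rf"^\s*{re.escape(key)}\s*{re.escape(separator)}\s*(-?\d+)\s*$"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise KeyError(key)
    return int(match.group(1))


def check_restart_interval(model_configure_text, input_nml_text):
    """Raise ValueError when restart_interval is not a multiple of fhzero."""
    restart_interval = read_int(model_configure_text, "restart_interval", ":")
    fhzero = read_int(input_nml_text, "fhzero", "=")
    if fhzero <= 0 or restart_interval % fhzero != 0:
        raise ValueError(
            f"restart_interval={restart_interval} is not a multiple of fhzero={fhzero}"
        )
    return restart_interval, fhzero


# Inline sample text standing in for the real files.
sample_configure = "nhours_fcst: 24\nrestart_interval: 6\n"
sample_nml = "&gfs_physics_nml\n  fhcyc = 24\n  fhzero = 6\n/\n"
print(check_restart_interval(sample_configure, sample_nml))
```

Raising an error rather than silently adjusting the value keeps the decision with the user, which seems the safer default for a test harness.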

junwang-noaa commented 4 years ago

Ufuk,

No, for a restart you don't need to change the start time; it should be the same time as in the initial run. The model will pick up the current time from coupler.res and start from there. It seems something is wrong in your test. Also, please refer to the following wiki page for restart:

https://vlab.ncep.noaa.gov/redmine/projects/comfv3/wiki/_set_up_restart_run_for_FV3GFS_

Jun

uturuncoglu commented 4 years ago

@junwang-noaa Thanks for the information. I am using the same document to change the namelist options but, as I said before, if I do not set the start time in model_configure, the model fails with the error message from my previous post. I think the model looks for the coupler.res file under the INPUT directory and tries to set the values based on it, right? Somehow it is not working properly in my case. I'll double check and let you know.

jedwards4b commented 4 years ago

We do not output history files (netcdf or nemsio) as double precision.

This is a problem for cime testing since it compares history files. If the files are not double precision, then we are not fully testing the system.

junwang-noaa commented 4 years ago

I am curious: what does CIME compare history files with? The operational FV3 model itself runs with 32BIT=Y, and it takes too much space to save all the history files in double precision. If single-precision history files from two runs are compared, they should be identical if they are expected to be.

Jun

jedwards4b commented 4 years ago

CIME testing compares history files to baseline history files from previous runs or a test may do two runs and compare results from one run to the results from another - of course we can do this with single precision files but in this case the test may miss roundoff level differences in fields. We do not need to implement double precision history for default output, we would only like to have an option that we can enable for testing purposes.
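
The concern about missed roundoff-level differences can be illustrated with a short sketch. The values are hypothetical, and `to_single` simply rounds a double to the nearest single-precision number using the standard library:

```python
import struct


def to_single(value):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


baseline = 287.15               # hypothetical temperature value in kelvin
perturbed = baseline + 1.0e-12  # roundoff-level difference

# The two values differ in double precision...
print(baseline == perturbed)
# ...but become indistinguishable once stored in single precision.
print(to_single(baseline) == to_single(perturbed))
```

A comparison of single-precision files would therefore report these two runs as identical even though the model state differs.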

junwang-noaa commented 4 years ago

The FV3GFS model reproduces for all the tests you listed. The model restart files are written out in real(8) for all runs (32BIT=Y or 32BIT=N), so the history files always reproduce. You would have to change model code to write out real(8) history files, and my understanding is that downstream jobs such as NCEP POST currently cannot run with real(8).

uturuncoglu commented 4 years ago

@junwang-noaa actually there is no need to run NCEP POST for tests. Just chgres and model.

jedwards4b commented 4 years ago

I propose that for the release we test with 32-bit history files, but that we consider a new input option to write 64-bit history files in a later update.

uturuncoglu commented 4 years ago

@junwang-noaa I checked again and if I don't change the start time, the model fails with the error that I mentioned before. In this case, do we need to set RUN_CONTINUE to TRUE in addition to the changes in input.nml?

uturuncoglu commented 4 years ago

In this case, the log shows the following:

0: bf clock_fv3,date=        2019           9          10          12           0
0:           0 date_init=        2019           9           9           0
0:           0           0

So, I think that it reads INPUT/coupler.res correctly and picks up the date, but I am not sure about the clock error.

uturuncoglu commented 4 years ago

It is complaining that timeStep equals zero, which is the earthStep.

20200108 144122.992 INFO             PET000 (fv3gfs_cap:InitializeP0) cplprint_flag =      F
20200108 144123.019 WARNING          PET000 ESMCI_Clock.C:1703 ESMCI::Clock::validate() timeStep equals zero.
20200108 144123.020 ERROR            PET000 ESMCI_Clock.C:373 ESMCI::Clock::set() Wrong data value  - Internal subroutine call returned Error
20200108 144123.021 ERROR            PET000 ESMF_Clock.F90:1695 ESMF_ClockSet() Wrong data value  - Internal subroutine call returned Error
20200108 144123.021 ERROR            PET000 fv3_cap.F90:486 Wrong data value  - Passing error in return code
20200108 144123.021 INFO             PET000 Finalizing ESMF

uturuncoglu commented 4 years ago

I think I found it: I need to increase nhours_fcst, right? For example, if I did a 36 hour run and want to run another 4 hours, I have to set nhours_fcst to 40. Am I right?

junwang-noaa commented 4 years ago

Yes, that is correct.
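
The point confirmed here is that nhours_fcst counts from the initial start time rather than from the restart point. A tiny sketch of that bookkeeping, using a hypothetical helper name:

```python
def nhours_fcst_for_restart(hours_already_run, additional_hours):
    """Total forecast length to set when continuing a run.

    nhours_fcst is counted from the initial start time, not from the restart point.
    """
    if hours_already_run < 0 or additional_hours <= 0:
        raise ValueError("expected a non-negative elapsed time and a positive extension")
    return hours_already_run + additional_hours


print(nhours_fcst_for_restart(36, 4))
```

For the example in the question (36 hours already run, 4 more wanted) this gives 40.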

uturuncoglu commented 4 years ago

@junwang-noaa @ligiabernardet @arunchawla-NOAA I tested restarting the model and could not reproduce the results. I set up two different runs as follows:

ufs-mrweather-app-workflow.c96.base: 5 hour run with cold start

ufs-mrweather-app-workflow.c96.rest: 3 hour run with cold start, followed by a 2 hour run with warm start from the files with prefix 20190909.030000.* (those are copied to INPUT)

In this case, stochastic_physics is turned off and I have only the following changes in input.nml.

diff ufs-mrweather-app-workflow.c96.base/run/input.nml ufs-mrweather-app-workflow.c96.rest/run/input.nml

<   external_ic = .true.
---
>   external_ic = .false.
72c72,73
<   mountain = .false.
---
>   make_nh = .false.
>   mountain = .true.
76c77
<   nggps_ic = .true.
---
>   nggps_ic = .false.
87d87
<   res_latlon_dynamics = ""
91c91
<   warm_start = .false.
---
>   warm_start = .true.
148c148
<   nstf_name = 2, 1, 0, 0, 0
---
>   nstf_name = 2, 0, 0, 0, 0

In this case, I followed the instructions at https://vlab.ncep.noaa.gov/redmine/projects/comfv3/wiki/_set_up_restart_run_for_FV3GFS_ and I wonder whether there is anything wrong in my namelist or whether other changes to the namelist files are needed.
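
The namelist differences in the diff above could be applied programmatically along these lines. This is a sketch under stated assumptions: the override values are copied from the diff, whether they form a complete set for a bit-for-bit restart is exactly the open question here, and `apply_overrides` is a hypothetical helper that only rewrites keys already present (it does not add make_nh or remove res_latlon_dynamics):

```python
import re

# Values copied from the diff above; completeness is not confirmed in this thread.
RESTART_OVERRIDES = {
    "external_ic": ".false.",
    "make_nh": ".false.",
    "mountain": ".true.",
    "nggps_ic": ".false.",
    "warm_start": ".true.",
    "nstf_name": "2, 0, 0, 0, 0",
}


def apply_overrides(namelist_text, overrides):
    """Replace the value on each existing 'key = value' line; report keys not found."""
    missing = []
    for key, value in overrides.items():
        pattern = rf"^(\s*{re.escape(key)}\s*=\s*).*$"
        namelist_text, count = re.subn(
            pattern,
            lambda m, v=value: m.group(1) + v,
            namelist_text,
            flags=re.MULTILINE,
        )
        if count == 0:
            missing.append(key)
    return namelist_text, missing


sample = "&fv_core_nml\n  external_ic = .true.\n  mountain = .false.\n  warm_start = .false.\n/\n"
updated, not_found = apply_overrides(sample, RESTART_OVERRIDES)
print(updated)
print(not_found)
```

Returning the keys that were not found makes it visible when a variable such as make_nh has to be added by hand.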

You can also find my input.nml and model_configure below.

input.nml for restart run

&amip_interp_nml
  data_set = "reynolds_oi"
  date_out_of_range = "climo"
  interp_oi_sst = .true.
  no_anom_sst = .false.
  use_ncep_sst = .true.
/
&atmos_model_nml
  blocksize = 32
  ccpp_suite = "FV3_GFS_v15p2"
  fdiag = 1.0
/
&cires_ugwp_nml
  knob_ugwp_azdir = 2, 4, 4, 4
  knob_ugwp_dokdis = 1
  knob_ugwp_effac = 1, 1, 1, 1
  knob_ugwp_ndx4lh = 1
  knob_ugwp_solver = 2
  knob_ugwp_source = 1, 1, 0, 0
  knob_ugwp_stoch = 0, 0, 0, 0
  knob_ugwp_wvspec = 1, 25, 25, 25
  launch_level = 25
/
&coupler_nml
/
&diag_manager_nml
  prepend_date = .false.
/
&external_ic_nml
  levp = 65
/
&fms_io_nml
  checksum_required = .false.
  max_files_r = 100
  max_files_w = 100
/
&fms_nml
  clock_grain = "ROUTINE"
  domains_stack_size = 3000000
/
&fv_core_nml
  a_imp = 1.0
  agrid_vel_rst = .true.
  consv_te = 1.
  d2_bg_k1 = 0.15
  d2_bg_k2 = 0.02
  d4_bg = 0.12
  d_con = 1.
  d_ext = 0.0
  dddmp = 0.1
  delt_max = 0.002
  dnats = 1
  do_sat_adj = .true.
  do_vort_damp = .true.
  external_eta = .true.
  external_ic = .false.
  fill = .true.
  fv_sg_adj = 450
  grid_type = -1
  hord_dp = -5
  hord_mt = 5
  hord_tm = 5
  hord_tr = 8
  hord_vt = 5
  hydrostatic = .false.
  k_split = 2
  kord_mt = 9
  kord_tm = -9
  kord_tr = 9
  kord_wz = 9
  layout = 4, 4
  make_nh = .false.
  mountain = .true.
  n_split = 6
  n_sponge = 10
  na_init = 0
  nggps_ic = .false.
  nord = 2
  npx = 97
  npy = 97
  npz = 64
  ntiles = 6
  nudge_qv = .true.
  nwat = 6
  p_fac = 0.1
  phys_hydrostatic = .false.
  print_freq = 6
  rf_cutoff = 7.5e2
  tau = 10.
  vtdm4 = 0.02
  warm_start = .true.
  z_tracer = .true.
/
&fv_grid_nml
  grid_file = "INPUT/grid_spec.nc"
/
&fv_nwp_nudge_nml
/
&gfdl_cloud_microphysics_nml
  c_cracw = 0.8
  c_paut = 0.5
  c_pgacs = 0.01
  c_psaci = 0.05
  ccn_l = 300.
  ccn_o = 100.
  do_sedi_heat = .false.
  dw_land = 0.16
  fast_sat_adj = .true.
  fix_negative = .true.
  icloud_f = 1
  qi0_crt = 8.0E-5
  ql_mlt = 1.0e-3
  rh_inc = 0.30
  rh_inr = 0.30
  rh_ins = 0.30
  tau_l2v = 225.
  use_ccn = .true.
  vg_max = 12.
  vi_max = 1.
  vs_max = 2.0
  z_slope_ice = .true.
/
&gfs_physics_nml
  cdmbgwd = 0.125, 3.0
  cnvcld = .true.
  cnvgwd = .true.
  dspheat = .true.
  effr_in = .true.
  fhcyc = 24
  fhzero = 6
  h2o_phys = .true.
  hybedmf = .true.
  iaer = 111
  ialb = 1
  ico2 = 2
  iems = 1
  imfdeepcnv = 2
  imfshalcnv = 2
  imp_physics = 11
  isol = 2
  isot = 1
  isubc_lw = 2
  isubc_sw = 2
  ivegsrc = 1
  lgfdlmprad = .true.
  ncld = 5
  nst_anl = .true.
  nstf_name = 2, 0, 0, 0, 0
  oz_phys = .false.
  oz_phys_2015 = .true.
  prautco = 0.00015, 0.00015
  psautco = 0.0008, 0.0005
  redrag = .true.
  shal_cnv = .true.
  trans_trac = .true.
  use_ufo = .true.
/
&interpolator_nml
  interp_method = "conserve_great_circle"
/
&mpp_io_nml
/
&nam_physics_nml
/
&nam_sfcperts
/
&nam_stochy
  iseed_shum = 2020011011012
  iseed_skeb = 2020011011011
  iseed_sppt = 2020011011013
  lat_s = 768
  lon_s = 1536
  ntrunc = 766
  shum = 0.005
  shum_lscale = 500000.0
  shum_tau = 21600.0
  skeb = 0.3
  skeb_lscale = 500000.0
  skeb_npass = 30
  skeb_tau = 21600.0
  sppt = 0.5
  sppt_logit = .true.
  sppt_lscale = 500000.0
  sppt_sfclimit = .true.
  sppt_tau = 21600.0
  use_zmtnblck = .true.
/
&namsfc
  fabsl = 99999
  faisl = 99999
  faiss = 99999
  fnabsc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_mxsnoalb.uariz.t190.384.192.rg.grb"
  fnaisc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/CFSR.SEAICE.1982.2012.monthly.clim.grb"
  fnalbc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_snowfree_albedo.bosu.t190.384.192.rg.grb"
  fnalbc2 = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_albedo4.1x1.grb"
  fnglac = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_glacier.2x2.grb"
  fnmskh = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_slmask.t1534.3072.1536.grb"
  fnmxic = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_maxice.2x2.grb"
  fnslpc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_slope.1x1.grb"
  fnsmcc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_soilmgldas.statsgo.t1534.3072.1536.grb"
  fnsnoc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_snoclim.1.875.grb"
  fnsotc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_soiltype.statsgo.t190.384.192.rg.grb"
  fntg3c = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_tg3clim.2.6x1.5.grb"
  fntsfc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/RTGSST.1982.2012.monthly.clim.grb"
  fnvegc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_vegfrac.0.144.decpercent.grb"
  fnvetc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_vegtype.igbp.t190.384.192.rg.grb"
  fnvmnc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_shdmin.0.144x0.144.grb"
  fnvmxc = "/glade/p/cesmdata/cseg/ufs_inputdata/global/fix/fix_am.v20191213/global_shdmax.0.144x0.144.grb"
  fnzorc = "igbp"
  fsicl = 99999
  fslpl = 99999
  fsnol = 99999.0
  fsotl = 99999.0
  ftsfs = 90.0
  fvetl = 99999
  fvmnl = 99999
  fvmxl = 99999
/
&nest_nml
/
&surf_map_nml
/
&test_case_nml
/

model_configure for restart run

ENS_SPS: .false.
PE_MEMBER01: 108
RUN_CONTINUE: .false.
atmos_nthreads: 1
calendar: julian
cpl: .false.
dt_atmos: 450
filename_base: atm sfc
iau_offset: 0
ideflate: 1
imo: 384
jmo: 192
memuse_verbose: .false.
nbits: 14
ncores_per_node: 36
nfhmax_hf: 12
nfhout: 3
nfhout_hf: 1
nhours_fcst: 5
nsout: -1
num_files: 2
output_1st_tstep_rst: .false.
output_file: 'netcdf'
output_grid: gaussian_grid
output_history: .true.
print_esmf: .false.
quilting: .true.
restart_interval: 3
start_day: 9
start_hour: 0
start_minute: 0
start_month: 9
start_second: 0
start_year: 2019
total_member: 1
use_hyper_thread: .false.
write_dopost: .false.
write_fsyncflag: .true.
write_groups: 1
write_nemsioflip: .true.
write_tasks_per_group: 12

arunchawla-NOAA commented 4 years ago

Use a 6 hour restart file. Precip buckets get flushed every 6 hours, as Jun pointed out to me in a different discussion :)

uturuncoglu commented 4 years ago

@arunchawla-NOAA Okay. Thanks. I'll try it and let you know.

uturuncoglu commented 4 years ago

@arunchawla-NOAA I tried with a 6 hour interval but there are still differences.

uturuncoglu commented 4 years ago

As I mentioned before, the experiment that I am trying to run is as follows:

ufs-mrweather-app-workflow.c96.base: 5 hour run with cold start

ufs-mrweather-app-workflow.c96.rest: 3 hour run with cold start, followed by a 2 hour run with warm start (starting from the files produced by the 3 hour run with prefix 20190909.030000.*, which are copied to INPUT without the time prefix)

In this case, I am trying to compare the last 2 hours of output. In theory they must be identical if restart is bit-for-bit reproducible. I am not sure, but is this case really covered by the regression tests? As far as I know, FV3 development also continues alongside the release. If you have tested this experiment with the CCPP suites and it gives bit-for-bit results as I described, just let me know; there might be some missing namelist option that we are not aware of.

uturuncoglu commented 4 years ago

The result is the same if I set restart_interval to 6 and design the experiment as follows:

ufs-mrweather-app-workflow.c96.base: 9 hour run with cold start

ufs-mrweather-app-workflow.c96.rest: 6 hour run with cold start, followed by a 3 hour run with warm start (starting from the files produced by the 6 hour run with prefix 20190909.060000.*, which are copied to INPUT without the time prefix)

I'll also try a run of a few days and test it.
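
The staging step described above (copying the time-stamped restart files to INPUT without the time prefix) might be scripted as in the following sketch. The directory names passed in are assumptions, and `stage_restart_files` is a hypothetical helper, not part of the app:

```python
import shutil
from pathlib import Path


def stage_restart_files(restart_dir, input_dir, prefix):
    """Copy files named '<prefix>.<name>' from restart_dir to input_dir as '<name>'."""
    restart_dir, input_dir = Path(restart_dir), Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for source in sorted(restart_dir.glob(f"{prefix}.*")):
        # Drop the time prefix and the dot that follows it.
        target = input_dir / source.name[len(prefix) + 1:]
        shutil.copy2(source, target)
        staged.append(target.name)
    return staged
```

A call such as `stage_restart_files("RESTART", "INPUT", "20190909.060000")` would copy a file like `20190909.060000.coupler.res` to `INPUT/coupler.res`, leaving files with other time stamps untouched.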

climbfuji commented 4 years ago

I'd say we should test this first "manually" or as part of the ufs-weather-model regression tests run on hera/cheyenne. The code "should" be b4b identical through restarts, but I haven't tested it (there are several restart tests in the develop branch which test various bits and pieces of the physics contained in these suites). v15p2 or v16?

arunchawla-NOAA commented 4 years ago

Get the testing group with Phil Pegion involved

rsdunlapiv commented 4 years ago

All SMS tests are implemented on Cheyenne/intel.
@uturuncoglu to change iseed_* to a constant to fix NLCOMP failures.
@jedwards4b is working on RUN failures in debug mode, likely due to a threading issue with CCPP.

After Cheyenne/intel/mpt:

@rsdunlapiv needs to push changes for Hera to CIME

rsdunlapiv commented 4 years ago

@pjpegion do you have an account on Cheyenne? That could be the first testing target for you.

pjpegion commented 4 years ago

@rsdunlapiv yes I do. Do I just checkout ufs-mrweather-app and follow directions? I'm not familiar with CIME.

jedwards4b commented 4 years ago

@pjpegion We aren't quite ready yet - we need to push one more update to the ufs-mrweather-app and will provide quick start guide.

ligiabernardet commented 4 years ago

@jedwards4b @uturuncoglu Can you please provide an update on the readiness of the ufs-mrweather-app and associated CIME quick-start visible at https://ufs-mrapp.readthedocs.io/en/latest/index.html? Are the basics ready for the documentation/test folks to run and see how it works?

uturuncoglu commented 4 years ago

It is pretty much ready, though we still need to add a couple more pieces of information to the documentation. BTW, my last push to the documentation branch does not appear on the web; there might be an issue related to updating Sphinx.

ligiabernardet commented 4 years ago

Ufuk, It is great to hear that the code is ready for initial testing.

The Sphinx build in RTD is failing with the message: "ImportError: cannot import name 'PackageFinder' from 'pip._internal.index'". I am triggering a manual build to see if it passes. We also need to check whether there is some problem in the revised Sphinx files that causes the error.

Proactive steps:

  1. Julie, Laurie and I will now receive email notifications on build failures so we can help address them in a timely manner.
  2. I would like to add Ufuk (and others? someone from EMC?) as administrators of this project on RTD. That way, when a build fails, you can see the error message and we can collectively address any issues in the most timely manner. Ufuk, would you like to create an account on readthedocs.org and send me your username?

uturuncoglu commented 4 years ago

Okay, thanks. BTW, my user name for readthedocs.org is turuncoglu

ligiabernardet commented 4 years ago

Ufuk, I added you as an administrator of the MRW APP UG on RTD. The build failure you are experiencing is not specific to RTD. The HTML is not building by hand on my laptop after your last code changes. I think the changes to conf.py may be one of the reasons (not sure). One of the culprits may be the newly introduced dependency on sphinx_rtd_theme, which needs to be dealt with. Try adding it to requirements.txt.
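
Adding the theme to requirements.txt could be sketched as follows. The helper name is hypothetical and the location of requirements.txt is an assumption; only the package name sphinx_rtd_theme comes from the discussion.

```shell
#!/bin/sh
# Hypothetical helper: add a package to a requirements file only if it is
# not already listed (exact whole-line match), so repeated runs are harmless.
add_requirement() {
    req_file="$1"
    package="$2"
    touch "$req_file"
    if ! grep -qxF "$package" "$req_file"; then
        echo "$package" >> "$req_file"
    fi
}
```

From the directory holding the documentation sources this would be called as `add_requirement requirements.txt sphinx_rtd_theme`. Note that a version-pinned entry for the same package is not an exact match, so the helper would add a second, unpinned line in that case.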

uturuncoglu commented 4 years ago

Okay. Thanks. Actually it was compiling on my mac without any error. I'll add it and push it again.

uturuncoglu commented 4 years ago

I edited requirements.txt and conf.py as well, but it still fails. Do you have any idea? I might have implemented something wrong, but I am not sure. I don't have much experience with Sphinx.

JulieSchramm commented 4 years ago

There was a buggy version of pip 20.0 that was immediately followed by a hotfix 20.0.1. I wiped out the environment on readthedocs for the app build and it seems to build now. The info was at: https://stackoverflow.com/questions/59846065/read-the-docs-build-fails-with-cannot-import-name-packagefinder-from-pip-in

uturuncoglu commented 4 years ago

@JulieSchramm Thanks for your help. It seems fine now.

rsdunlapiv commented 4 years ago

@pjpegion we are ready for you to do some preliminary testing of the MR Weather App using the CIME workflow. At this point please test only the two "preconfigured" platforms, Cheyenne and Stampede2, since we do not yet have documentation for how to add a new preconfigured platform.

I would recommend specifically starting with these two sections of the Quick Start Guide:
https://ufs-mrapp.readthedocs.io/en/latest/quickstart/quickstart.html
https://ufs-mrapp.readthedocs.io/en/latest/quickstart/testing.html

We have implemented several kinds of CIME regression tests that will be available for the release. We are tracking current status of those tests here: https://docs.google.com/spreadsheets/d/1DnIDqZZ4toXmUmHxAs98STV19BO8TvYt1d6gkSjodrU/edit#gid=0 Please be aware that there are some known test failures - we will keep the spreadsheet updated.

Please keep us posted as you begin testing this part of the release.

pjpegion commented 4 years ago

@rsdunlapiv I started testing this morning. So far I managed to run a 5-day forecast. I will go through the list of tests and get back to you. My only comment so far is that the quick start guide should mention that you have to cd to cime/scripts to create a case.

pjpegion commented 4 years ago

@rsdunlapiv My first attempt at running one of the other smoke test cases failed. This case's run directory is /glade/scratch/pegion/SMS_D_Lh5.C96.GFSv15p2.cheyenne_intel.try/run

I compared this with my successful 5-day run following create_newcase in the quick start guide, which is in /glade/scratch/pegion/test1/run, and I don't see anything that would cause the crash, but I do notice that the executables for the two tests are very different sizes.

    pegion@cheyenne4:/glade/scratch/pegion/test1> ls -l /glade/scratch/pegion//bld/*exe
    -rwxr-xr-x 1 pegion ncar 120586424 Jan 27 07:43 /glade/scratch/pegion/SMS_D_Lh5.C96.GFSv15p2.cheyenne_intel.try/bld/ufs.exe
    -rwxr-xr-x 1 pegion ncar 31092664 Jan 27 07:05 /glade/scratch/pegion/test1/bld/ufs.exe

jedwards4b commented 4 years ago

@pjpegion As noted in the spreadsheet, that test is a known failure.
As for why the size is significantly different from your case created with create_newcase, the difference is DEBUG mode, which is enabled in the test but not in the other case.

    /glade/scratch/pegion/test1
    :) ./xmlquery DEBUG
        DEBUG: FALSE

pjpegion commented 4 years ago

@jedwards4b ok, thanks for that info. I'm running other cases now.

uturuncoglu commented 4 years ago

I edited the documentation and added the cd cime/scripts step for creating a case.

pjpegion commented 4 years ago

@jedwards4b @rsdunlapiv All of the cases that are labeled as running in the spreadsheet pass for the intel compiler. I have 2 questions and 1 comment.

Questions:

  1. I would like to intentionally break one of the runs (by changing the timestep) and see if it gets caught. How do I change the time-step of the model (I cannot figure out where dt_atmos gets defined)?
  2. How do I change to testing the gnu compiler?

Comment:

  1. The C96 model should be using a longer time-step than 450 seconds.

Thanks, Phil

jedwards4b commented 4 years ago

@pjpegion In answer to your questions:

  1. To change the time-step of the model, add this line to user_nl_atm:

    dt_atmos = 900 (or whatever)

     Then submit the job. If you want to confirm the change before you submit, run ./preview_namelists and then check the run directory.

  2. To test the gnu compiler when creating a case, use the command line argument --compiler gnu. In a test you can add it to the testname: SMS_Lh5.C96.GFSv15p2.cheyenne_gnu

On your comment that the C96 model should be using a longer time-step than 450 seconds: we are using namelist settings provided to us. We can change them, but it would be nice to have an authoritative list of default settings.

-- Jim Edwards

CESM Software Engineer National Center for Atmospheric Research Boulder, CO
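
The time-step override discussed in this reply could be scripted along these lines. The helper name is hypothetical; it assumes it is given the path of a case's user_nl_atm file, and the follow-up commands are shown only as comments because they need a real CIME case directory.

```shell
#!/bin/sh
# Hypothetical helper: set dt_atmos in a user_nl_atm file, replacing any
# earlier dt_atmos line so the setting appears exactly once.
set_dt_atmos() {
    nl_file="$1"
    dt="$2"
    touch "$nl_file"
    # Keep every line except previous dt_atmos settings
    # (grep -v exits non-zero when nothing is left, hence "|| true").
    grep -v '^[[:space:]]*dt_atmos[[:space:]]*=' "$nl_file" > "$nl_file.tmp" || true
    mv "$nl_file.tmp" "$nl_file"
    echo "dt_atmos = $dt" >> "$nl_file"
}

# In a real case directory one would then run (not executed here):
#   set_dt_atmos user_nl_atm 900
#   ./preview_namelists
#   ./case.submit
```

Replacing rather than appending keeps the file from accumulating conflicting dt_atmos entries when the value is changed several times during testing.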

uturuncoglu commented 4 years ago

@pjpegion The default values are provided by @KateFriedman-NOAA and @ligiabernardet in the following Google Docs for both CCPP v15p2 and v16beta:

https://docs.google.com/document/d/1EKc2mAld5VsrNjTRgqUcTVG1ZcEIkllA-NrAKUs4DWI/edit
https://docs.google.com/document/d/1bLbVdWgEIknDQZgTuOZ6IPVEGv5jUgOrCm4GrR96oBU/edit

As @jedwards4b mentioned, it would be nice to have a consensus about the default namelist values, and then we could make them the defaults.

pjpegion commented 4 years ago

@jedwards4b Thanks. I did a test with a different time step, and the test fails as I expected.
I will now move to the gnu compiler tests.
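
The test names quoted in this thread follow a TEST.GRID.COMPSET.MACHINE_COMPILER pattern, so switching compilers only changes the last component. A minimal sketch of assembling such a name is below; the helper name is hypothetical, and passing the result to create_test is shown only as a comment since it needs a CIME checkout.

```shell
#!/bin/sh
# Hypothetical helper: assemble a test name of the form
#   TEST.GRID.COMPSET.MACHINE_COMPILER
make_testname() {
    echo "${1}.${2}.${3}.${4}_${5}"
}

make_testname SMS_Lh5 C96 GFSv15p2 cheyenne gnu
# prints SMS_Lh5.C96.GFSv15p2.cheyenne_gnu

# With a CIME checkout, the name would then be passed to create_test
# from cime/scripts (not executed here):
#   ./create_test "$(make_testname SMS_Lh5 C96 GFSv15p2 cheyenne gnu)"
```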

rsdunlapiv commented 4 years ago

@junwang-noaa and @GeorgeGayno-NOAA I am trying to complete the port of CIME to Hera. Can you please let me know where the EMC_post executable and chgres_cube executable are located? Are these up to date with the release branch versions?