jedwards4b closed this issue 4 years ago.
I am currently working on adding the restart tests to the rt.sh regression test system, partly because this was also reported independently in https://github.com/NOAA-EMC/fv3atm/issues/42. It would make sense to wait for the rt.sh-based tests to be implemented before spending more time on this.
@climbfuji Please let me know if you want me to do anything.
I can confirm that with the namelist settings in the ufs_public_release branches for the GFS_v15p2 tests, the restarts do not work. I am now trying to fix this; I have a few ideas about what may differ from the tests that we know are b4b reproducible in restart runs.
@jedwards4b @mcgibbon I have a solution for this (tested on my Mac for GFSv15p2 thus far). The default namelist settings for both GFSv15p2 and GFSv16beta in the ufs_public_release branch of the ufs-weather-model repository turn on SKEB, SHUM and SPPT. The stochastic physics do not reproduce in restart runs, because the logic for dealing with restarts hasn't been implemented in the stochastic_physics repo (@pjpegion) and the model isn't writing those fields to the restart files (@DusanJovic-NOAA @junwang-noaa). My suggestion for the public release is to (a) turn off stochastic physics in the default namelists (Phil suggested this anyway, but I missed it) and (b) document that using the stochastic perturbations is an advanced feature that currently does not support b4b identical results through restarts (@ligiabernardet). For our development branches, we need to implement this capability in stochastic_physics and fv3atm in the near future. Any objections?
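To confirm that a given input.nml really has the stochastic schemes turned off, a quick scan of the file is enough. The Python sketch below assumes the switches are the logical entries do_sppt, do_shum and do_skeb (as used in gfs_physics_nml) and that each one sits on its own "key = value" line; it is a rough check, not a full Fortran namelist parser, and the function name is hypothetical.

```python
import re

# Assumed names of the logical switches for SPPT, SHUM and SKEB in
# gfs_physics_nml; adjust if your namelist uses different entries.
STOCHASTIC_FLAGS = ("do_sppt", "do_shum", "do_skeb")

# Matches "key = .true." / ".false." / "T" / "F" on a single line.
_LOGICAL = re.compile(r"^\s*(\w+)\s*=\s*\.?(true|false|t|f)\.?\s*,?\s*$",
                      re.IGNORECASE)


def enabled_stochastic_schemes(nml_text: str) -> list:
    """Return the stochastic switches that are set to true in nml_text."""
    enabled = []
    for raw in nml_text.splitlines():
        line = raw.split("!", 1)[0]  # drop Fortran comments
        m = _LOGICAL.match(line)
        if m and m.group(1).lower() in STOCHASTIC_FLAGS:
            if m.group(2).lower() in ("true", "t"):
                enabled.append(m.group(1).lower())
    return enabled


if __name__ == "__main__":
    sample = """&gfs_physics_nml
  do_sppt = .true.
  do_shum = .false.
  do_skeb = .T.
/"""
    print(enabled_stochastic_schemes(sample))  # ['do_sppt', 'do_skeb']
```

An empty list means none of the three assumed switches is enabled in that file.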
I believe we have already made this change for the CIME tests, and it still fails.
(Replying by email to Dom Heinzeller's comment of Fri, Jan 17, 2020, 09:10.)
The default configurations for this release have all stochastic processes turned off. @climbfuji: can you produce b4b restarts with stochastics off?
@jedwards4b Yes, I confirm that. We turned off stochastic physics, but it was still not b4b.
On my Mac, I am getting b4b identical results without stochastic physics. Now testing on Cheyenne with Intel.
Just to make sure: are you also modifying the nstf_name namelist entry for the restart runs? The usual regression tests for ufs-weather-model use 2,1,1,0,5 for cold starts. When restarting, one needs to set the second entry from 1 to 0 (that is the NSST spin-up flag, one of the "hidden features"; don't blame me). The input.nml we got from EMC uses 2,1,0,0,0 for cold starts. I am now testing whether 2,0,0,0,0 works for restarts or whether we need to switch to "2,1,1,0,5" and "2,0,1,0,5". Just be patient, please.
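The nstf_name rule described above (keep the cold-start value, but set the second entry, the NSST spin-up flag, to 0 for the restart) is mechanical enough to script. The Python sketch below is illustrative only: the helper names are hypothetical, it assumes nstf_name appears as a single "nstf_name = a,b,c,d,e" line in input.nml, and any trailing comment on that line is discarded when the value is replaced.

```python
import re


def restart_nstf_name(coldstart: str) -> str:
    """Derive the restart nstf_name value from the cold-start value.

    Only the second entry (the NSST spin-up flag) changes; it is set to 0.
    """
    values = [int(v) for v in coldstart.strip().rstrip(",").split(",")]
    if len(values) != 5:
        raise ValueError(f"expected 5 entries in nstf_name, got {len(values)}")
    values[1] = 0  # no NSST spin-up when restarting
    return ", ".join(str(v) for v in values)


def set_nstf_name(nml_text: str, value: str) -> str:
    """Replace the right-hand side of the nstf_name line(s) in input.nml text."""
    return re.sub(r"(?im)^(\s*nstf_name\s*=\s*).*$",
                  lambda m: m.group(1) + value, nml_text)


if __name__ == "__main__":
    # The two cold-start settings mentioned in this thread.
    for cold in ("2,1,1,0,5", "2,1,0,0,0"):
        print(cold, "->", restart_nstf_name(cold))
```

Both cold-start settings from this thread map to the restart settings discussed here: 2,1,1,0,5 becomes 2,0,1,0,5 and 2,1,0,0,0 becomes 2,0,0,0,0.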
Dom,
This feature is not hidden; please see the documentation:
https://vlab.ncep.noaa.gov/redmine/projects/comfv3/wiki/_set_up_restart_run_for_FV3GFS_
(Replying by email to Dom Heinzeller's comment of Fri, Jan 17, 2020 at 12:02 PM.)
I agree, Jun: it is not hidden for people who have access to Vlab. I am not sure whether it is in the ufs-weather-model documentation for the release (I am lost with respect to documentation), and I am not sure whether the CIME folks know about it... let's wait to hear from them!
@climbfuji I have already done that. My previous tests are in:
Base run: /glade/scratch/turuncu/ufs-mrweather-app-workflow.c96.base/run nstf_name = 2, 1, 0, 0, 0
Restart run: /glade/scratch/turuncu/ufs-mrweather-app-workflow.c96.rest/run nstf_name = 2, 0, 0, 0, 0
By default, stochastic physics is off.
That's good to know, thanks. If that fails, I will test the default 2,1,1,0,5 settings. Just wait, please.
@climbfuji when you get things working, could you attach an input.nml that works locally for you? I'd like to test it on my system. I would just ask which options disable SKEB, SHUM, and SPPT, but I can see those are disabled in my log file. I'd like to glance at whatever else might be different in my setup.
Sure. But note everyone that I will be taking this weekend off (definitely Sunday and Monday), so please don't expect any answers before Tuesday. Thanks ...
Everyone, please see https://github.com/NOAA-EMC/fv3atm/issues/42 for the solution, namelists, etc. Thanks!
@climbfuji I tried the CIME test with these changes; it still fails.
I am using settings:
nstf_name = 2, 1, 1, 0, 5
for the initial run and
nstf_name = 2, 0, 1, 0, 5
for the restart run. I also updated the stochastic physics source.
My source tree is /glade/u/home/jedwards/sandboxes/ufs-mrweather-app
and the test is in /glade/scratch/jedwards/ERS_Lh11.C96.GFSv15p2.cheyenne_intel.20200119_103112_odlpjt
I don't think I have the time to look at the differences between your runs and mine today. Here is a copy of all the directories you need on Cheyenne:
/glade/work/heinzell/fv3/rundirs_for_cime_restart_issues/
You will be interested in the following directories:
fv3_ccpp_gfs_v15p2_prod # 0-48h fcst
fv3_ccpp_gfs_v15p2_coldstart_prod # 0-24h fcst
fv3_ccpp_gfs_v15p2_restart_prod # 24-48h fcst, restart files from coldstart run were copied into INPUT
fv3_ccpp_gfs_v16beta_prod # 0-48h fcst
fv3_ccpp_gfs_v16beta_coldstart_prod # 0-24h fcst
fv3_ccpp_gfs_v16beta_restart_prod # 24-48h fcst, restart files from coldstart run were copied into INPUT
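The restart directories above were prepared by copying the cold-start run's restart files into INPUT. A minimal Python sketch of that staging step follows. It assumes the cold-start run wrote its restart files to a RESTART subdirectory of its run directory (an assumption about the layout, not something stated here), and the function name is hypothetical. It copies every regular file rather than only the NetCDF ones, since a restart set may include non-NetCDF files as well.

```python
import shutil
from pathlib import Path


def stage_restart(coldstart_rundir, restart_rundir) -> list:
    """Copy the cold-start run's restart files into the restart run's INPUT dir.

    Assumes the cold-start run wrote its restart files to <rundir>/RESTART
    and that the restart run reads them from <rundir>/INPUT. Every regular
    file is copied (not only the NetCDF ones), and files already present in
    INPUT with the same name are overwritten.
    """
    src = Path(coldstart_rundir) / "RESTART"
    dst = Path(restart_rundir) / "INPUT"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.iterdir()):
        if f.is_file():
            shutil.copy2(f, dst / f.name)  # copy2 also preserves timestamps
            copied.append(f.name)
    return copied


if __name__ == "__main__":
    # Directory names taken from the list above; run from their parent dir.
    print(stage_restart("fv3_ccpp_gfs_v15p2_coldstart_prod",
                        "fv3_ccpp_gfs_v15p2_restart_prod"))
```

The restart run's namelist still needs its own changes (for example the nstf_name adjustment discussed earlier); this sketch only handles the file staging.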
I am beginning to wonder if this is related to the debug-run problems you have been seeing, i.e. the missing update to the ufs_release_v1.0 branch for chgres_cube from George Gayno and the missing compiler flags for the GNU compiler for this executable.
This test is using the Intel compiler, so I'm not sure what GNU would have to do with it. The biggest difference I see is that you are using cubed_sphere_grid for output_grid and I am using gaussian_grid. I'm looking into this now.
The same tests passed with the GNU compilers as well. They are identical except for the modules.fv3 files. I can rerun the tests on Cheyenne with GNU and keep the rundirs, but as I said, the differences will be in modules.fv3 and in the actual model output.
@jedwards4b I tested with output_grid = 'cubed_sphere_grid', but the restart still fails. I'll try to find other possible differences between the namelist files. I could also test using the input.nml and model_configure files from the directories listed above (fv3_ccpp_gfs_v15p2_* and fv3_ccpp_gfs_v16beta_*).
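Looking for differences between two input.nml files by eye is error-prone, so a small diff by (group, key) can help narrow things down. The Python sketch below is a deliberately minimal reader under stated assumptions: one "key = value" per line, no values continued over several lines, no "!" inside strings. The function names are hypothetical. It does not handle model_configure, which is not a Fortran namelist.

```python
import sys
from pathlib import Path
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (namelist group, entry name)


def parse_namelist(text: str) -> Dict[Key, str]:
    """Minimal input.nml reader mapping (group, key) to the raw value string."""
    entries: Dict[Key, str] = {}
    group = None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop Fortran comments
        if line.startswith("&"):
            group = line[1:].strip().lower()  # start of a group, e.g. &fv_core_nml
        elif line == "/":
            group = None  # end of the current group
        elif group and "=" in line:
            key, value = line.split("=", 1)
            entries[(group, key.strip().lower())] = value.strip().rstrip(",").strip()
    return entries


def diff_namelists(a: str, b: str) -> Dict[Key, Tuple[Optional[str], Optional[str]]]:
    """Return entries whose values differ (None means the key is missing)."""
    pa, pb = parse_namelist(a), parse_namelist(b)
    return {k: (pa.get(k), pb.get(k))
            for k in sorted(pa.keys() | pb.keys())
            if pa.get(k) != pb.get(k)}


if __name__ == "__main__":
    # Usage: python nml_diff.py base/input.nml restart/input.nml
    a, b = (Path(p).read_text() for p in sys.argv[1:3])
    for (group, key), (va, vb) in diff_namelists(a, b).items():
        print(f"&{group} {key}: {va} | {vb}")
```

Comparing raw value strings means that "2,1,1,0,5" and "2, 1, 1, 0, 5" are reported as different. That is noisy, but it never hides a real change.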
@climbfuji I tested your input.nml with the CIME-built model for v15p2, and the restart still differs. So at least the problem is not related to input.nml. I'll continue to dig, but let me know if you have any other ideas. The runs are in:
Base (48 hours): /glade/scratch/turuncu/ufs-mrweather-app-workflow.c96.jan16/run.base2
Restart (24+24 hours): /glade/scratch/turuncu/ufs-mrweather-app-workflow.c96.jan16/run.rest2
I can think of
I need to get this cime setup run by myself. Will try tomorrow.
The initial documentation is at
https://ufs-mrapp.readthedocs.io/en/latest/index.html#
I am still working on it, but you can find a lot of information there, especially in the quick start guide.
@climbfuji I ran the cime restart test with your executable and it passed. This points to a difference in the build, perhaps in the build flags, but I also noticed that you were not using the latest model version: 6a93463
Yes, the code I had used for testing didn't include the last PR. However, my current PR, for which I reran the restart tests, does include it (https://github.com/ufs-community/ufs-weather-model/pull/33).
I built using src/model/tests/compile_cmake.sh, and it also passed the restart test; I've been studying the build since and still cannot pinpoint the difference.
If you send me the build logs (cmake and make; you may have to add VERBOSE=1 to the make calls), I can take a look. If I stare at this long enough, something may come to mind about which files to look at. Thanks ...
/glade/scratch/jedwards/ERS_Lh11.C96.GFSv15p2.cheyenne_intel.try/bld/atm.bldlog.200121-200946.gz
This problem is fixed. The build flags for libfv3core.a were different.
Yeah! Thanks for figuring this out, I was struggling all day to find time to look at your compile logs.
Can you please elaborate on the fix @jedwards4b? I'm having the same issue with a different build system.
@mcgibbon I found that the NOAA build was using the flag -fp-model consistent, but the CIME build was using -fp-model source when compiling the fv3core library. Changing the CIME compile to match the NOAA compile solved the problem.
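To catch this kind of mismatch without reading two build logs by eye, you can compare the -fp-model setting for each source file across two verbose build logs (for example, make output with VERBOSE=1). The Python sketch below assumes that each compile command appears on a single log line containing both the flag and the source path. The function names are hypothetical, and the regular expressions only recognise Fortran and C sources.

```python
import re
from collections import defaultdict
from typing import Dict, Set, Tuple

# Accept either a space or "=" between the option and its keyword.
FP_MODEL = re.compile(r"-fp-model[ =](\w+)")
# Fortran/C source file on the compile line (path may be absolute or relative).
SOURCE = re.compile(r"(\S+\.(?:f|F|f90|F90|c))\b")


def fp_models_by_source(log_text: str) -> Dict[str, Set[str]]:
    """Map each compiled source file to the -fp-model value(s) on its compile line."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for line in log_text.splitlines():
        models = FP_MODEL.findall(line)
        src = SOURCE.search(line)
        if models and src:
            # Key by file name only, so different build directories still match up.
            result[src.group(1).rsplit("/", 1)[-1]].update(models)
    return dict(result)


def fp_model_mismatches(log_a: str, log_b: str) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Source files compiled in both logs but with different -fp-model settings."""
    a, b = fp_models_by_source(log_a), fp_models_by_source(log_b)
    return {src: (a[src], b[src])
            for src in sorted(a.keys() & b.keys())
            if a[src] != b[src]}
```

Keying by file name alone can merge two different files that share a name, so treat any hit as a starting point for comparing the full compile lines.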
This test indicates that restarts are not producing bfb results under cime testing.
The file comparisons show:
run/ERS_Lh11.C96.GFSv15p2.cheyenne_intel.20200116_085748_nk2uyu.ufsatm.atm.f011.nc.base.cprnc.out: of which 14 had non-zero differences
run/ERS_Lh11.C96.GFSv15p2.cheyenne_intel.20200116_085748_nk2uyu.ufsatm.sfc.f011.nc.base.cprnc.out: of which 125 had non-zero differences
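The cprnc summaries report how many fields had non-zero differences between the base and restart runs. The Python sketch below shows the strictest version of that criterion: a field counts as identical only if every value has exactly the same bit pattern. It works on in-memory sequences of floats. Reading the fields from the NetCDF history files (for example with a NetCDF library) is deliberately left out, and this is not how cprnc itself is implemented. The function names are hypothetical.

```python
from array import array
from typing import Dict, List, Sequence, Tuple


def bitwise_equal(a: Sequence[float], b: Sequence[float]) -> bool:
    """True only if both sequences hold exactly the same 64-bit patterns.

    Stricter than ==: 0.0 vs -0.0 counts as a difference, while NaNs with
    identical bit patterns compare equal.
    """
    return array("d", a).tobytes() == array("d", b).tobytes()


def count_differing_fields(base: Dict[str, Sequence[float]],
                           restart: Dict[str, Sequence[float]]) -> Tuple[int, List[str]]:
    """Return (number of fields compared, names of fields that are not b4b identical)."""
    common = sorted(base.keys() & restart.keys())
    differing = [name for name in common
                 if not bitwise_equal(base[name], restart[name])]
    return len(common), differing


if __name__ == "__main__":
    # Made-up field values, purely for illustration.
    base = {"tmp": [1.0, 2.0], "ugrd": [0.0, 3.5], "spfh": [1.0]}
    rest = {"tmp": [1.0, 2.0], "ugrd": [-0.0, 3.5], "spfh": [1.0 + 2**-52]}
    print(count_differing_fields(base, rest))
```

With the example data, "spfh" differs by one unit in the last place and "ugrd" differs only in the sign of zero. Both are flagged as differences, which is the behaviour a bit-for-bit restart test needs.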