ufs-community / ufs-s2s-model

UFS sub-seasonal to seasonal forecast model. This repository was frozen in Oct 2020 and all development was moved to the ufs-weather-model repository.
GNU General Public License v3.0

Regression test for restart reproducibility #34

Closed DeniseWorthen closed 3 years ago

DeniseWorthen commented 4 years ago

A regression test for restart reproducibility is required.

DeniseWorthen commented 4 years ago

@JessicaMeixner-NOAA, @binli2337

I'm going to change the location of the restart files written by MOM6 and CICE5 to the same location used by FV3, the RESTART subdirectory. This can be done in the namelists (ice_in and input.mom6.nml). The same change should also be made for the NEMS mediator, but I believe this would require a code change.

I'm also going to move the mediator restarts out of the subdirectory "MEDIATOR_after_2d" and let the mediator restart files reside in the same RESTART directory as the other restarts.

JessicaMeixner-NOAA commented 4 years ago

@DeniseWorthen It would definitely make making baselines easier if all restart files are moved into the same directory. I think one advantage of having input, output or restarts grouped into a folder for just a single component is that you know which file is for/from which component. When it's restart vs RESTART or something like that without distinction, it's not necessarily helpful. But I do remember us trying to separate out input files for the different components so we knew what was for what. For the MOM6 input files, there was an FMS issue that didn't actually let you rename the "INPUT" directory to something else for the MOM6 files in our coupled system.

Also is it a small enough code change with big enough advantage to make the code change for the NEMS mediator and not just wait for CMEPS?

I'm on board with removing the subdirectory "MEDIATOR_after_2d" and moving those to the RESTART directory.

junwang-noaa commented 4 years ago

Writing all the restart files from all components into one single RESTART directory is good, so we can have all the restart files in a single place. One concern is that it could be confusing, since different run sequences may result in different time stamps in the restart file names for different components, especially when copying those files into the INPUT directory for a model restart (or warm start) run.


mvertens commented 4 years ago

In my opinion, one important requirement for getting restarts to work correctly in a coupled system with multiple components is that the time stamps on all of the component files need to be the same. It also helps, if possible, if the restart file naming convention is the same.


junwang-noaa commented 4 years ago

Yes, I agree it's better if all the restart files from the coupled components have the same time stamp. But we first need to make sure that the internal clock of each component is updated at the same level as the coupled system earth clock; otherwise, each individual component will put its own time stamp on its restart files.


jiandewang commented 4 years ago

Jessica is right, FMS is hard-wired to use the "INPUT" directory for the MOM6 input and fixed files to be fed in.

jiandewang commented 4 years ago

@mvertens I am afraid it will be hard to require that the restart file naming convention be the same, as each component is developed by an individual group and some of them have hard-wired restart file naming conventions.
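Even if the naming conventions themselves cannot be unified, the time stamps can still be compared across components. The following is only a sketch: the regular expressions are inferred from file names quoted elsewhere in this thread (`iced.2018-09-01-10800.nc`, `ocn.mom6.r.2018-09-01-06-00-00_1.nc`, `20180901.030000.coupler.res`, `20180901-030000_mediator_scalars_restart.txt`), and the function names `restart_time` and `distinct_times` are hypothetical, not part of any component or of rt.sh.

```python
import re
from datetime import datetime, timedelta

# Patterns inferred from file names quoted in this thread. They are
# illustrative assumptions, not an official naming specification.
_PATTERNS = [
    # CICE5 style: iced.YYYY-MM-DD-SSSSS.nc (seconds of day)
    (re.compile(r"^iced\.(\d{4})-(\d{2})-(\d{2})-(\d{5})\.nc$"), "ymd_secs"),
    # MOM6 style: ocn.mom6.r.YYYY-MM-DD-HH-MM-SS[_N].nc
    (re.compile(r"^ocn\.mom6\.r\.(\d{4})-(\d{2})-(\d{2})"
                r"-(\d{2})-(\d{2})-(\d{2})(?:_\d+)?\.nc$"), "ymd_hms"),
    # FV3 (YYYYMMDD.HHMMSS.*) and NEMS mediator (YYYYMMDD-HHMMSS_*) styles
    (re.compile(r"^(\d{4})(\d{2})(\d{2})[.-](\d{2})(\d{2})(\d{2})[._]"),
     "ymd_hms"),
]


def restart_time(filename):
    """Return the time encoded in a restart file name, or None if no
    known pattern matches (for example an unstamped MOM.res.nc)."""
    for pattern, kind in _PATTERNS:
        match = pattern.match(filename)
        if match is None:
            continue
        numbers = [int(group) for group in match.groups()]
        if kind == "ymd_secs":
            year, month, day, seconds = numbers
            return datetime(year, month, day) + timedelta(seconds=seconds)
        return datetime(*numbers)
    return None


def distinct_times(filenames):
    """Return the set of distinct time stamps found in the file names."""
    times = (restart_time(name) for name in filenames)
    return {t for t in times if t is not None}
```

A restart set is consistent in this sense when `distinct_times` returns a single value; more than one value means the components disagree on the restart time.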

SMoorthi-emc commented 4 years ago

Just to put in my two cents (although I know no one cares): I have been doing this differently, with directories "ATM_RESTART", "MED_RESTART", "OCN_RESTART" and "ICE_RESTART" where the restarts reside for each component. Similarly, I have "OCN_HISTORY", "ICE_HISTORY" and "WAV_HISTORY". The ATM history currently follows the traditional GFS style and I don't have WAVE restart yet. These files are all located in the so-called "COMROT" directory, not the run directory. My scripts have links to these files, so there is no copying unless it is a restart run, when the correct restarts are copied to the run directory. I am pretty sure nobody cares, but I had to put in my 2 cents. Thanks, Moorthi


SMoorthi-emc commented 4 years ago

Here is an example of my directory structure. The top level is "/c384_phydb/gfs.20180901/00", in which I have:

ATM_RESTART gfs.t00z.sfcf000.nemsio ICE_RESTART MED_RESTART OCN_RESTART gfs.t00z.atmf000.nemsio ICE_HISTORY INPUT OCN_HISTORY

Under "ICE_RESTART" I have:

iced.2018-09-01-10800.nc iced.2018-09-01-21600.nc iced.2018-09-01-32400.nc

Under "OCN_RESTART" I have:

MOM.res_1.nc ocn.mom6.r.2018-09-01-06-00-00_1.nc MOM.res_2.nc ocn.mom6.r.2018-09-01-06-00-00_2.nc MOM.res_3.nc ocn.mom6.r.2018-09-01-06-00-00_3.nc MOM.res.nc ocn.mom6.r.2018-09-01-06-00-00.nc ocn.mom6.r.2018-09-01-03-00-00_1.nc ocn.mom6.r.2018-09-01-09-00-00_1.nc ocn.mom6.r.2018-09-01-03-00-00_2.nc ocn.mom6.r.2018-09-01-09-00-00_2.nc ocn.mom6.r.2018-09-01-03-00-00_3.nc ocn.mom6.r.2018-09-01-09-00-00_3.nc ocn.mom6.r.2018-09-01-03-00-00.nc ocn.mom6.r.2018-09-01-09-00-00.nc

Under "ATM_RESTART" I have:

20180901.030000.coupler.res 20180901.060000.fv_tracer.res.tile3.nc 20180901.030000.fv_core.res.nc 20180901.060000.fv_tracer.res.tile4.nc 20180901.030000.fv_core.res.tile1.nc 20180901.060000.fv_tracer.res.tile5.nc 20180901.030000.fv_core.res.tile2.nc 20180901.060000.fv_tracer.res.tile6.nc 20180901.030000.fv_core.res.tile3.nc 20180901.060000.phy_data.tile1.nc 20180901.030000.fv_core.res.tile4.nc 20180901.060000.phy_data.tile2.nc 20180901.030000.fv_core.res.tile5.nc 20180901.060000.phy_data.tile3.nc 20180901.030000.fv_core.res.tile6.nc 20180901.060000.phy_data.tile4.nc 20180901.030000.fv_srf_wnd.res.tile1.nc 20180901.060000.phy_data.tile5.nc 20180901.030000.fv_srf_wnd.res.tile2.nc 20180901.060000.phy_data.tile6.nc 20180901.030000.fv_srf_wnd.res.tile3.nc 20180901.060000.sfc_data.tile1.nc 20180901.030000.fv_srf_wnd.res.tile4.nc 20180901.060000.sfc_data.tile2.nc 20180901.030000.fv_srf_wnd.res.tile5.nc 20180901.060000.sfc_data.tile3.nc 20180901.030000.fv_srf_wnd.res.tile6.nc 20180901.060000.sfc_data.tile4.nc 20180901.030000.fv_tracer.res.tile1.nc 20180901.060000.sfc_data.tile5.nc 20180901.030000.fv_tracer.res.tile2.nc 20180901.060000.sfc_data.tile6.nc 20180901.030000.fv_tracer.res.tile3.nc 20180901.090000.coupler.res 20180901.030000.fv_tracer.res.tile4.nc 20180901.090000.fv_core.res.nc 20180901.030000.fv_tracer.res.tile5.nc 20180901.090000.fv_core.res.tile1.nc 20180901.030000.fv_tracer.res.tile6.nc 20180901.090000.fv_core.res.tile2.nc 20180901.030000.phy_data.tile1.nc 20180901.090000.fv_core.res.tile3.nc 20180901.030000.phy_data.tile2.nc 20180901.090000.fv_core.res.tile4.nc 20180901.030000.phy_data.tile3.nc 20180901.090000.fv_core.res.tile5.nc 20180901.030000.phy_data.tile4.nc 20180901.090000.fv_core.res.tile6.nc 20180901.030000.phy_data.tile5.nc 20180901.090000.fv_srf_wnd.res.tile1.nc 20180901.030000.phy_data.tile6.nc 20180901.090000.fv_srf_wnd.res.tile2.nc 20180901.030000.sfc_data.tile1.nc 20180901.090000.fv_srf_wnd.res.tile3.nc 20180901.030000.sfc_data.tile2.nc 20180901.090000.fv_srf_wnd.res.tile4.nc 20180901.030000.sfc_data.tile3.nc 20180901.090000.fv_srf_wnd.res.tile5.nc 20180901.030000.sfc_data.tile4.nc 20180901.090000.fv_srf_wnd.res.tile6.nc 20180901.030000.sfc_data.tile5.nc 20180901.090000.fv_tracer.res.tile1.nc 20180901.030000.sfc_data.tile6.nc 20180901.090000.fv_tracer.res.tile2.nc 20180901.060000.coupler.res 20180901.090000.fv_tracer.res.tile3.nc 20180901.060000.fv_core.res.nc 20180901.090000.fv_tracer.res.tile4.nc 20180901.060000.fv_core.res.tile1.nc 20180901.090000.fv_tracer.res.tile5.nc 20180901.060000.fv_core.res.tile2.nc 20180901.090000.fv_tracer.res.tile6.nc 20180901.060000.fv_core.res.tile3.nc 20180901.090000.phy_data.tile1.nc 20180901.060000.fv_core.res.tile4.nc 20180901.090000.phy_data.tile2.nc 20180901.060000.fv_core.res.tile5.nc 20180901.090000.phy_data.tile3.nc

Under "MED_RESTART" I have:

20180901-030000_mediator_FBaccumAtm_restart.tile1.nc 20180901-060000_mediator_FBaccumAtm_restart.tile2.nc 20180901-090000_mediator_FBaccumAtm_restart.tile2.nc 20180901-030000_mediator_FBaccumAtm_restart.tile2.nc 20180901-060000_mediator_FBaccumAtm_restart.tile3.nc 20180901-090000_mediator_FBaccumAtm_restart.tile3.nc 20180901-030000_mediator_FBaccumAtm_restart.tile3.nc 20180901-060000_mediator_FBaccumAtm_restart.tile4.nc 20180901-090000_mediator_FBaccumAtm_restart.tile4.nc 20180901-030000_mediator_FBaccumAtm_restart.tile4.nc 20180901-060000_mediator_FBaccumAtm_restart.tile5.nc 20180901-090000_mediator_FBaccumAtm_restart.tile5.nc 20180901-030000_mediator_FBaccumAtm_restart.tile5.nc 20180901-060000_mediator_FBaccumAtm_restart.tile6.nc 20180901-090000_mediator_FBaccumAtm_restart.tile6.nc 20180901-030000_mediator_FBaccumAtm_restart.tile6.nc 20180901-060000_mediator_FBaccumHyd_restart.nc 20180901-090000_mediator_FBaccumHyd_restart.nc 20180901-030000_mediator_FBaccumHyd_restart.nc 20180901-060000_mediator_FBaccumIce_restart.nc 20180901-090000_mediator_FBaccumIce_restart.nc 20180901-030000_mediator_FBaccumIce_restart.nc 20180901-060000_mediator_FBaccumLnd_restart.nc 20180901-090000_mediator_FBaccumLnd_restart.nc 20180901-030000_mediator_FBaccumLnd_restart.nc 20180901-060000_mediator_FBaccumOcn_restart.nc 20180901-090000_mediator_FBaccumOcn_restart.nc 20180901-030000_mediator_FBaccumOcn_restart.nc 20180901-060000_mediator_FBAtm_a_restart.tile1.nc 20180901-090000_mediator_FBAtm_a_restart.tile1.nc 20180901-030000_mediator_FBAtm_a_restart.tile1.nc 20180901-060000_mediator_FBAtm_a_restart.tile2.nc 20180901-090000_mediator_FBAtm_a_restart.tile2.nc 20180901-030000_mediator_FBAtm_a_restart.tile2.nc 20180901-060000_mediator_FBAtm_a_restart.tile3.nc 20180901-090000_mediator_FBAtm_a_restart.tile3.nc 20180901-030000_mediator_FBAtm_a_restart.tile3.nc 20180901-060000_mediator_FBAtm_a_restart.tile4.nc 20180901-090000_mediator_FBAtm_a_restart.tile4.nc 20180901-030000_mediator_FBAtm_a_restart.tile4.nc 20180901-060000_mediator_FBAtm_a_restart.tile5.nc 20180901-090000_mediator_FBAtm_a_restart.tile5.nc 20180901-030000_mediator_FBAtm_a_restart.tile5.nc 20180901-060000_mediator_FBAtm_a_restart.tile6.nc 20180901-090000_mediator_FBAtm_a_restart.tile6.nc 20180901-030000_mediator_FBAtm_a_restart.tile6.nc 20180901-060000_mediator_FBAtmOcn_o_restart.nc 20180901-090000_mediator_FBAtmOcn_o_restart.nc 20180901-030000_mediator_FBAtmOcn_o_restart.nc 20180901-060000_mediator_FBHyd_h_restart.nc 20180901-090000_mediator_FBHyd_h_restart.nc 20180901-030000_mediator_FBHyd_h_restart.nc 20180901-060000_mediator_FBIce_i_restart.nc 20180901-090000_mediator_FBIce_i_restart.nc 20180901-030000_mediator_FBIce_i_restart.nc 20180901-060000_mediator_FBLnd_l_restart.nc 20180901-090000_mediator_FBLnd_l_restart.nc 20180901-030000_mediator_FBLnd_l_restart.nc 20180901-060000_mediator_FBOcn_o_restart.nc 20180901-090000_mediator_FBOcn_o_restart.nc 20180901-030000_mediator_FBOcn_o_restart.nc 20180901-060000_mediator_scalars_restart.txt 20180901-090000_mediator_scalars_restart.txt 20180901-030000_mediator_scalars_restart.txt 20180901-090000_mediator_FBaccumAtmOcn_restart.nc 20180901-060000_mediator_FBaccumAtm_restart.tile1.nc 20180901-090000_mediator_FBaccumAtm_restart.tile1.nc

In this particular run I was writing 3-hourly restarts (to debug reproducibility). As one can see, putting them all in one restart directory will make it terrible. Moorthi
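If all components did write into one RESTART directory, a per-component view could still be recovered from the file names. The sketch below assumes the prefixes and substrings visible in the listing above; the rules and the names `component_of` and `group_by_component` are hypothetical and would need adjusting for any other naming scheme.

```python
from collections import defaultdict

# Substrings that mark FV3 restart files in the listing above (assumed).
_ATM_TAGS = ("coupler.res", "fv_core.res", "fv_srf_wnd.res",
             "fv_tracer.res", "phy_data", "sfc_data")


def component_of(filename):
    """Guess which component wrote a restart file from its name."""
    if filename.startswith("iced."):
        return "ICE"
    if filename.startswith(("ocn.mom6.r.", "MOM.res")):
        return "OCN"
    if "_mediator_" in filename:
        return "MED"
    if any(tag in filename for tag in _ATM_TAGS):
        return "ATM"
    return "UNKNOWN"


def group_by_component(filenames):
    """Return {component: sorted list of file names}."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        groups[component_of(name)].append(name)
    return dict(groups)
```

This only restores the grouping for listing purposes; it does not address the concern about the sheer number of files in one directory.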


JessicaMeixner-NOAA commented 4 years ago

Everyone plays a vital role in making this successful. Everyone's opinion is valued and is being solicited via issues like these, and we appreciate your constructive feedback. For your workflow suggestions, I would recommend making a PR to the feature/coupled-crow branch of https://github.com/noaa-emc/global-workflow so we can all share in the benefit of your hard work.

DeniseWorthen commented 4 years ago

Draft PR #52

DeniseWorthen commented 4 years ago

I would like to ask how we want to implement the restart regression test. In my mind, there are two types of restarts: checkpoint restarts and wallclock restarts.

Checkpoint restarts are restarts written at set intervals during a run in case you need to go back and fix or re-run something. Wallclock restarts are written at the end of a run so you can continue the run. Restarts must be reproducible for both cases.

I can think of three ways to set up the restart regression test. Here, CP is a 'checkpoint restart' and WC is a 'wallclock restart':

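A) [image: Screen Shot 2020-05-21 at 7 45 04 AM] https://user-images.githubusercontent.com/40498404/82556120-1bcabf00-9b37-11ea-9c6d-334c54c8da4f.png
 Restart test is that restarts @T2 are identical for both runs.
 Requires two additional test cases.

B) [image: Screen Shot 2020-05-21 at 7 46 20 AM] https://user-images.githubusercontent.com/40498404/82556152-2f762580-9b37-11ea-9c0f-23cfb2064217.png
 Restart test is that restarts @T2 are identical for first and third runs.
 Requires three additional test cases.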

C) Both A and B. In this case you would also check that the CP-T1 restarts are identical to the WC-T1 restarts. This might be overkill, since either A or B alone would alert you that there was a restart problem, and at that point a test that CP-T1 = WC-T1 could be performed.

I would lean towards using A because it is fewer tests, but it has the complication that the CP-T1 restarts have a different name (they are time-stamped) so a re-naming function would need to be added to the rt.sh system.
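The comparison step shared by all three options can be sketched as follows. This assumes, as a reading of option A, that one run goes T0->T2 and a second run starts from the CP-T1 restarts and also ends at T2, each leaving its T2 restarts in its own directory. The function name `compare_restarts` and the choice of a byte-for-byte comparison are assumptions for illustration; this is not the rt.sh implementation.

```python
import filecmp
from pathlib import Path


def compare_restarts(continuous_dir, restart_dir):
    """Compare every file in continuous_dir with the same-named file in
    restart_dir. Return (missing, differing) lists of file names."""
    continuous, restarted = Path(continuous_dir), Path(restart_dir)
    missing, differing = [], []
    for path in sorted(p for p in continuous.iterdir() if p.is_file()):
        other = restarted / path.name
        if not other.is_file():
            missing.append(path.name)
        # shallow=False forces a comparison of the file contents rather
        # than only size and modification time.
        elif not filecmp.cmp(path, other, shallow=False):
            differing.append(path.name)
    return missing, differing
```

The test passes when both returned lists are empty. A byte-for-byte check is the strictest choice; files that embed run-specific metadata would need a field-level comparison instead.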

junwang-noaa commented 4 years ago

Unless writing checkpoints has some impact on the runs, I don't see major differences between A) and B). The restart files written out in both A) CP and B) WC should be the same at T1. In ufs-weather-model, we do A) for restart tests.


DeniseWorthen commented 4 years ago

Thanks. So for A), does rt.sh have the ability to rename the time-stamped restart files at CP-T1?

From the initial T0->T2 run, in RESTART we get a restart file named yearmonday.secs.fv_core.res.tile3.nc. To start from this restart, it needs to be copied to the INPUT directory with the name changed to fv_core.res.tile3.nc, correct?
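Assuming the answer is yes, the renaming step could look like the sketch below. It assumes the time stamp is a `YYYYMMDD.HHMMSS.` prefix, as in the FV3 file names quoted earlier in this thread, and that stripping the prefix gives the name expected in INPUT. The function `stage_fv3_restarts` is hypothetical and is not an existing rt.sh function.

```python
import re
import shutil
from pathlib import Path

# Assumed layout: "<YYYYMMDD>.<HHMMSS>.<rest of name>"
_STAMPED = re.compile(r"^(\d{8}\.\d{6})\.(.+)$")


def stage_fv3_restarts(restart_dir, input_dir, stamp):
    """Copy files named '<stamp>.<rest>' from restart_dir into input_dir
    as '<rest>'. Return the list of staged file names."""
    source, target = Path(restart_dir), Path(input_dir)
    target.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in sorted(source.iterdir()):
        match = _STAMPED.match(path.name)
        if path.is_file() and match and match.group(1) == stamp:
            shutil.copy2(path, target / match.group(2))
            staged.append(match.group(2))
    return staged
```

For example, `stage_fv3_restarts("RESTART", "INPUT", "20180901.030000")` would copy `20180901.030000.fv_core.res.tile3.nc` to `INPUT/fv_core.res.tile3.nc` and leave files with other time stamps untouched.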

DeniseWorthen commented 4 years ago

The restart regression test has been created. However, after extensive testing it currently fails due to the changed oro data in the 20200515 baseline. The current code is restart reproducible when using the oro data from the baseline created on 20200504.

Using the oro data in the current baseline, the CMEPS mediator history file difference between the continuous and the restart run shows that 7 fields from the ATM are not reproducing on restart: inst_height_lowest, inst_pres_height_lowest, inst_spec_humid_height_lowest, inst_temp_height_lowest, mean_fprec_rate, mean_net_lw_flx, mean_prec_rate on tile3. All other tiles reproduce.
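A field-by-field difference of this kind can be sketched as below. The sketch works on plain dictionaries of values so that it stays self-contained; reading the fields out of the mediator history files is left out, and the function name `differing_fields` and the sample numbers are made up for illustration.

```python
def differing_fields(continuous, restarted, tol=0.0):
    """Return {field name: largest absolute difference} for every field
    whose values differ by more than tol between the two runs.

    Both arguments map field names to equal-length sequences of numbers
    and are assumed to contain the same field names."""
    result = {}
    for name in sorted(continuous):
        pairs = zip(continuous[name], restarted[name])
        worst = max((abs(a - b) for a, b in pairs), default=0.0)
        if worst > tol:
            result[name] = worst
    return result


# Made-up values: one field reproduces, one does not.
continuous_run = {
    "mean_net_lw_flx": [1.0, 2.0],
    "inst_temp_height_lowest": [280.0, 281.0],
}
restart_run = {
    "mean_net_lw_flx": [1.0, 2.5],
    "inst_temp_height_lowest": [280.0, 281.0],
}
print(differing_fields(continuous_run, restart_run))
```

With `tol=0.0` the check demands bit-for-bit agreement of the values, which matches the meaning of restart reproducibility used in this issue.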

These differences are clearly related to areas which are lakes. The following figure shows the difference in mean_net_lw_flx on restart:

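[image: tile3] https://user-images.githubusercontent.com/40498404/83411344-0cc0f800-a3e6-11ea-8a2d-017028ac8315.jpg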

junwang-noaa commented 4 years ago

So this is the C384 test? The oro data is read in the restart run, so it should be reproducible. May I take a look at your run directory?


DeniseWorthen commented 4 years ago

This is C384.

I have a sandbox where I ended up re-creating the test I had been doing on Cheyenne. On Cheyenne the current code is reproducible, but it is using input data we brought over earlier this year. The Cheyenne test involves a 2 hour run and then a 1 hour run from the intermediate restarts. My sandbox is here: /scratch1/NCEPDEV/stmp2/Denise.Worthen/RTcmeps.

The runs which use current oro are in subdir afatm; hour2 (the 2 hour run) and restr (the restart run). The fv3_cap export files in that directory are the differences between the hour2 and restr runs.

The runs which use the oro data from 0504 are in subdir afatm_orioro. In that case I don't have the fv3_cap exports, I am differencing the mediator history files at 2013-04-01-05400.nc.

DeniseWorthen commented 4 years ago

Jun--I am ready to open the PR for committing CMEPS to S2S. I have the baselines made on Hera but I need to run the tests and post the logs. Since Hera will be down tomorrow, I'd like to do this today. But I don't know what to do about the restart test failure.

junwang-noaa commented 4 years ago

Denise, if possible, can we change the restart test to C96 to keep the restart reproducibility capability? We can change it to C384 with updated oro files when we figure out what goes wrong.


DeniseWorthen commented 4 years ago

I can do this; I think it might be easiest to run the restart from the existing 2d restarts; I will need to add a 3day C96 baseline but that should run quickly. We can leave the existing tests that I made in place and drop the extra c96 tests when we're ready.

junwang-noaa commented 4 years ago

Yes, that will work.


DeniseWorthen commented 4 years ago

I've created a new c96 restart test and it passed the test. I will commit the additional files and start the tests running on Hera.

DeniseWorthen commented 4 years ago

Still to do on this issue: resolve the problem with the C384 oro data and implement the restart test using the 1d benchmark test. Minsuk is also working on adding dependency testing for the ufs-s2s-model rt.sh in UFS-S2S issue #103. Those changes will make the restart test easier to implement and test.

DeniseWorthen commented 3 years ago

Restart issues for the coupled model will be continued in ufs-weather issue 227