Open natalie-perlin opened 1 month ago
@natalie-perlin -
I'll move this work to On-Hold until a WE2E test has been added so that we can properly test this new functionality.
NB: @MichaelLueken - this PR requires an option "RRFS" to be allowed in UFS_UTILS. The current develop branch of ufs-community/UFS_UTILS does have the "RRFS" option enabled, but the version checked out by the SRW does not (a correction needs to be made to allow it).
How should we proceed with this requirement?
An updated tag could be used for a UFS_UTILS version that has this option implemented.
@natalie-perlin -
I'll check if updating the version of UFS_UTILS will work in the SRW App. The commit in UFS_UTILS following what is currently in the SRW App's External.cfg file causes the weather model to fail (the weather model is expecting sheleg, while chgres_cube is generating sheleg_ice and sheleg_land, leading to the previously mentioned failure).
I'll go ahead and try updating the UFS_UTILS version to the latest version and see if it works. If it does, then we can move forward with this update. However, if it continues to fail, I will need to open an issue in the UFS_UTILS repository to let them know about the continued failures and see what can be done.
@natalie-perlin -
What version of UFS_UTILS contains the necessary fix so that we can exercise the use of RRFS ICs/LBCs in the SRW App? I can try to update to that version and see what issues appear.
It looks like UFS_UTILS PR #902 includes the necessary changes for chgres_cube to work with RRFS. I'll try a later version of the UFS_UTILS repository, then this one, to see if either will work.
Yes - thank you!! I was having trouble finding the exact time/version when this change was implemented! The changes required to allow for the RRFS option had to be made in two locations in the UFS_UTILS repository, in ./sorc/chgres_cube.fd/program_setup.F90: line 57 and lines 321-322. It looks like the PR you mentioned addresses that: https://github.com/ufs-community/UFS_UTILS/pull/902/files#diff-6b6d24e7712144952ef83ca8f5e9d56e164fdcab1f7faab27812e91bfd483ba2
@natalie-perlin -
Using the version of UFS_UTILS associated with PR #902 is causing a failure in the fundamental tests:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 9.31
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 8.22
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 15.17
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024060 DEAD 5.71
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240603165 COMPLETE 23.23
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024060316510 COMPLETE 20.14
----------------------------------------------------------------------------------------------------
Total DEAD 81.78
The grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR WE2E test failed in run_fcst_mem000 with the following error message:
FATAL from PE 0: NetCDF: Start+count exceeds dimension bound: netcdf_read_data_3d: file:INPUT/sfc_data.nc- variable:tiice
I'll try backing my way through the commits in the UFS_UTILS repository to see which entry is causing issues with tiice.
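For anyone scanning these summaries by hand, here is a minimal Python sketch that pulls the experiments that did not finish as COMPLETE out of a summary table like the one above. It is not part of the SRW App: the function name failed_experiments is hypothetical, and it assumes each result row has exactly three whitespace-separated fields (name, status, core hours).

```python
def failed_experiments(summary: str) -> list[str]:
    """Return names of experiments whose status is not COMPLETE."""
    failed = []
    for line in summary.splitlines():
        fields = line.split()
        # Skip separator lines, the header row, and the "Total" row.
        if len(fields) != 3 or fields[0] == "Total":
            continue
        name, status, core_hours = fields
        try:
            float(core_hours)  # result rows end with a numeric core-hour value
        except ValueError:
            continue
        if status != "COMPLETE":
            failed.append(name)
    return failed


# Rows copied from the Hera summary above.
SAMPLE = """\
Experiment name | Status | Core hours used
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 9.31
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024060 DEAD 5.71
Total DEAD 81.78
"""

print(failed_experiments(SAMPLE))
```

The "Total" row is excluded explicitly because it also has three fields and would otherwise be reported as a failed experiment.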
@MichaelLueken - thank you for testing!! Let me look into these errors - it looks like a data problem. I might need to stage an additional directory in the EPIC space with data that I thought was not needed... will get back to you!
@MichaelLueken - What is the location of your test? I was not able to reproduce this error.
However, I'm also replacing the explicit format statement for RRFS in the config.yaml file with the format given in the code, and making some more changes for this PR that are not yet on GitHub.
@natalie-perlin -
I had been working on Hera, so I had to prepare this work on another machine. On Gaea, using the 1dac855 hash from UFS_UTILS (PR #902), the fundamental WE2E test suite is failing with the same issue as seen on Hera:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 19.99
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 12.94
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 27.55
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024060 DEAD 7.53
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240604105 COMPLETE 33.85
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024060410591 COMPLETE 46.89
----------------------------------------------------------------------------------------------------
Total DEAD 148.75
The grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR WE2E test is failing in run_fcst_mem000 with the following error message:
FATAL from PE 3: NetCDF: Start+count exceeds dimension bound: netcdf_read_data_3d: file:INPUT/sfc_data.nc- variable:tiice
The test can be found on Gaea - /gpfs/f5/epic/scratch/Michael.Lueken/ufs-srweather-app/expt_dirs/grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
I was also able to find that the changes associated with PR #873 are what is causing these issues.
Thank you, Michael, for testing! The RRFS data is yet to be staged on Gaea.
@natalie-perlin -
Have you tried running the fundamental WE2E test suite with the modifications you have made to use RRFS external model files to make ICs and LBCs? If you try running the fundamental test suite using the updated UFS_UTILS hash, you should encounter the failure that I have noted above.
I haven't tried running your rrfs_ics_lbcs branch or an experiment using RRFS external model files with an updated UFS_UTILS hash. All of my testing was off of my old feature/hash_update branch. The failure encountered has been in the fundamental test suite.
It would be good to add the RRFS file entry to data_locations.yaml.
Yes, I'm doing this as well as a part of this PR - still a work in progress, and these changes are not yet in my GitHub repository. There are some other issues that appeared after I attempted to introduce many changes at once. So I'm stepping back to the point where it was fully working (including my own changes to UFS_UTILS), and adding the changes back one by one.
@MichaelLueken - some changes have been pushed to the branch. However, the problem with the forecast phase still remains. It looks like the surface data file sfc_data.nc needs to be in a different format when the updated chgres_cube is used. I posted a comment in UFS_UTILS asking for suggestions: https://github.com/ufs-community/UFS_UTILS/issues/850#issuecomment-2154293614
A case study using RRFS ICS/LBCS has been successfully tested, and fundamental tests (not containing the new test) pass successfully on AWS as well. Logs attached.
log.run_WE2E_tests.txt WE2E_tests_20240613160048.yaml.txt
A test has been added, config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta.yaml, which includes plotting tasks as well. It has three ensemble members, and I'm not sure whether the plotting tasks can be run for individual members, or how to handle task dependencies for an ensemble. So plotting tasks are not launched, as the workflow dependency needs some adjustment, unless we want to remove the plotting task.
The rocotostat output looks like the following and does not advance further:
(srw_app) [Natalie.Perlin@NOAA-AWS:/lustre/SRW/expt_dirs/grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta]$ rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10
/apps/rocoto/1.3.3/lib/workflowmgr/launchserver.rb:40: warning: Insecure world writable dir /lustre in PATH, mode 040777
/apps/rocoto/1.3.3/lib/workflowmgr/launchserver.rb:40: warning: Insecure world writable dir /lustre in PATH, mode 040777
CYCLE TASK JOBID STATE EXIT STATUS TRIES DURATION
================================================================================================================================
202406051700 make_grid 403 SUCCEEDED 0 1 14.0
202406051700 make_orog 406 SUCCEEDED 0 1 38.0
202406051700 make_sfc_climo 407 SUCCEEDED 0 1 85.0
202406051700 get_extrn_ics 404 SUCCEEDED 0 1 8.0
202406051700 get_extrn_lbcs 405 SUCCEEDED 0 1 8.0
202406051700 make_ics_mem001 408 SUCCEEDED 0 1 434.0
202406051700 make_lbcs_mem001 409 SUCCEEDED 0 1 471.0
202406051700 run_fcst_mem001 414 SUCCEEDED 0 1 1046.0
202406051700 make_ics_mem002 410 SUCCEEDED 0 1 434.0
202406051700 make_lbcs_mem002 411 SUCCEEDED 0 1 568.0
202406051700 run_fcst_mem002 415 SUCCEEDED 0 1 1050.0
202406051700 make_ics_mem003 412 SUCCEEDED 0 1 427.0
202406051700 make_lbcs_mem003 413 SUCCEEDED 0 1 568.0
202406051700 run_fcst_mem003 416 SUCCEEDED 0 1 481.0
202406051700 run_post_mem001_f000 417 SUCCEEDED 0 1 17.0
202406051700 run_post_mem001_f001 418 SUCCEEDED 0 1 17.0
202406051700 run_post_mem001_f002 424 SUCCEEDED 0 1 10.0
202406051700 run_post_mem001_f003 425 SUCCEEDED 0 1 9.0
202406051700 run_post_mem002_f000 420 SUCCEEDED 0 1 12.0
202406051700 run_post_mem002_f001 426 SUCCEEDED 0 1 9.0
202406051700 run_post_mem002_f002 427 SUCCEEDED 0 1 10.0
202406051700 run_post_mem002_f003 428 SUCCEEDED 0 1 12.0
202406051700 run_post_mem003_f000 419 SUCCEEDED 0 1 16.0
202406051700 run_post_mem003_f001 421 SUCCEEDED 0 1 13.0
202406051700 run_post_mem003_f002 422 SUCCEEDED 0 1 13.0
202406051700 run_post_mem003_f003 423 SUCCEEDED 0 1 17.0
202406051700 plot_allvars - - - - -
@natalie-perlin -
I was able to get the new grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta WE2E test to run by setting:
platform:
  EXTRN_MDL_DATA_STORES: aws
in the configuration file and removing:
USE_USER_STAGED_EXTRN_FILES: true
for both task_get_extrn_ics and task_get_extrn_lbcs.
I also see the same behavior as you with respect to the plot_allvars task:
CYCLE TASK JOBID STATE EXIT STATUS TRIES DURATION
================================================================================================================================
202406051700 make_grid 61723209 SUCCEEDED 0 1 18.0
202406051700 make_orog 61723628 SUCCEEDED 0 1 27.0
202406051700 make_sfc_climo 61723639 SUCCEEDED 0 1 42.0
202406051700 get_extrn_ics 61723210 SUCCEEDED 0 1 768.0
202406051700 get_extrn_lbcs 61723211 SUCCEEDED 0 1 1669.0
202406051700 make_ics_mem001 61723748 SUCCEEDED 0 1 48.0
202406051700 make_lbcs_mem001 61725500 SUCCEEDED 0 1 113.0
202406051700 run_fcst_mem001 61725735 SUCCEEDED 0 1 554.0
202406051700 make_ics_mem002 61723746 SUCCEEDED 0 1 50.0
202406051700 make_lbcs_mem002 61725502 SUCCEEDED 0 1 101.0
202406051700 run_fcst_mem002 61725738 SUCCEEDED 0 1 549.0
202406051700 make_ics_mem003 61723747 SUCCEEDED 0 1 47.0
202406051700 make_lbcs_mem003 61725501 SUCCEEDED 0 1 108.0
202406051700 run_fcst_mem003 61725736 SUCCEEDED 0 1 555.0
202406051700 run_post_mem001_f000 61726433 SUCCEEDED 0 1 16.0
202406051700 run_post_mem001_f001 61726595 SUCCEEDED 0 1 15.0
202406051700 run_post_mem001_f002 61726826 SUCCEEDED 0 1 17.0
202406051700 run_post_mem001_f003 61726824 SUCCEEDED 0 1 16.0
202406051700 run_post_mem002_f000 61726525 SUCCEEDED 0 1 14.0
202406051700 run_post_mem002_f001 61726607 SUCCEEDED 0 1 21.0
202406051700 run_post_mem002_f002 61726829 SUCCEEDED 0 1 14.0
202406051700 run_post_mem002_f003 61726830 SUCCEEDED 0 1 16.0
202406051700 run_post_mem003_f000 61726434 SUCCEEDED 0 1 16.0
202406051700 run_post_mem003_f001 61726594 SUCCEEDED 0 1 13.0
202406051700 run_post_mem003_f002 61726823 SUCCEEDED 0 1 17.0
202406051700 run_post_mem003_f003 61726825 SUCCEEDED 0 1 16.0
202406051700 plot_allvars - - - - -
@MichaelLueken - yes, that works on the systems that have network access, but would not work for Hera, for example.
@natalie-perlin -
Looking in parm/wflow/plot.yaml, I think I see why the experiment is not kicking off the plot_allvars task. In order for the task to start, the run_post_mem000_f000 task will need to have completed:
dependency:
  or_do_post: &post_files_exist
    and_run_post: # If post was meant to run, wait on the whole post metatask
      taskvalid:
        attrs:
          task: run_post_mem000_f000
      metataskdep:
        attrs:
          metatask: run_ens_post
I was able to make plot_allvars run by changing run_post_mem000_f000 to run_post_mem001_f000. However, the job fails because it was looking for /scratch2/NAGAPE/epic/Michael.Lueken/expt_dirs/grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta/2024060517/mem#mem#/postprd/srw.t17z.prslev.f000.rrfs_conuscompact_25km.grib2, rather than /scratch2/NAGAPE/epic/Michael.Lueken/expt_dirs/grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta/2024060517/mem001/postprd/srw.t17z.prslev.f000.rrfs_conuscompact_25km.grib2.
The current implementation of plot_allvars only works for deterministic runs (no ensembles). The parm/wflow/plot.yaml file will need to be updated to allow for the capability to plot ensemble forecasts.
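To illustrate the placeholder issue: in a Rocoto metatask, each #var# token is replaced with the values of the metatask variable, so a literal mem#mem# left in a path suggests the task was never expanded per member. Below is a small Python sketch of that substitution. It is an illustration only, not Rocoto's implementation, and expand_placeholder is a hypothetical helper.

```python
def expand_placeholder(template: str, var: str, values: list[str]) -> list[str]:
    """Return one copy of template per value, with every #var# token replaced."""
    token = f"#{var}#"
    return [template.replace(token, value) for value in values]


# Path fragment taken from the failing plot_allvars job above,
# expanded for a three-member ensemble.
paths = expand_placeholder("2024060517/mem#mem#/postprd", "mem", ["001", "002", "003"])
print(paths)
```

With the plotting task wrapped in a metatask over mem, each generated task would see its own member directory (mem001, mem002, ...) instead of the unexpanded token.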
@MichaelLueken @christinaholtNOAA - The plotting configuration file has been updated in ./parm/wflow/plot.yaml, but the plotting tasks still do not show up in the rocotostat output. Any suggestions as to what is needed to make them visible?
This is what is set up in config.yaml for rocoto tasks:
rocoto:
  tasks:
    taskgroups: '{{ ["parm/wflow/prep.yaml", "parm/wflow/coldstart.yaml", "parm/wflow/post.yaml", "parm/wflow/plot.yaml"]|include }}'
But rocotostat shows the following:
(srw_app) [Natalie.Perlin@Hera:/scratch1/NCEPDEV/nems/Natalie.Perlin/expt_dirs/RRFS_test_CONUS_25km]$ rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10
CYCLE TASK JOBID STATE EXIT STATUS TRIES DURATION
================================================================================================================================
202406051700 make_grid 62004343 SUCCEEDED 0 1 73.0
202406051700 make_orog 62004848 SUCCEEDED 0 1 51.0
202406051700 make_sfc_climo 62005705 SUCCEEDED 0 1 57.0
202406051700 get_extrn_ics 62004344 SUCCEEDED 0 1 15.0
202406051700 get_extrn_lbcs 62004345 SUCCEEDED 0 1 23.0
202406051700 make_ics_mem001 62005915 SUCCEEDED 0 1 61.0
202406051700 make_lbcs_mem001 62005913 SUCCEEDED 0 1 125.0
202406051700 run_fcst_mem001 62006240 SUCCEEDED 0 1 626.0
202406051700 make_ics_mem002 62005914 SUCCEEDED 0 1 61.0
202406051700 make_lbcs_mem002 62005912 SUCCEEDED 0 1 123.0
202406051700 run_fcst_mem002 62006239 SUCCEEDED 0 1 630.0
202406051700 run_post_mem001_f000 62008026 SUCCEEDED 0 1 32.0
202406051700 run_post_mem001_f001 62008025 SUCCEEDED 0 1 34.0
202406051700 run_post_mem001_f002 62008027 SUCCEEDED 0 1 39.0
202406051700 run_post_mem001_f003 62008028 SUCCEEDED 0 1 38.0
202406051700 run_post_mem002_f000 62008029 SUCCEEDED 0 1 41.0
202406051700 run_post_mem002_f001 62008030 SUCCEEDED 0 1 41.0
202406051700 run_post_mem002_f002 62008031 SUCCEEDED 0 1 37.0
202406051700 run_post_mem002_f003 62008032 SUCCEEDED 0 1 39.0
no plotting tasks show up.
@natalie-perlin - It appears there is a typo in your parm/wflow/plot.yaml file. Line 32 should be "metatask_plot_allvars_mem#mem#_all_fhrs:". Fixing this will add the plot_allvars tasks to your RRFS experiment. See the rocotostat output below:
CYCLE TASK JOBID STATE EXIT STATUS TRIES DURATION
================================================================================================================================
202406051700 make_grid druby://10.184.10.61:36013 SUBMITTING - 0 0.0
202406051700 make_orog - - - - -
202406051700 make_sfc_climo - - - - -
202406051700 get_extrn_ics druby://10.184.10.61:36013 SUBMITTING - 0 0.0
202406051700 get_extrn_lbcs druby://10.184.10.61:36013 SUBMITTING - 0 0.0
202406051700 make_ics_mem001 - - - - -
202406051700 make_lbcs_mem001 - - - - -
202406051700 run_fcst_mem001 - - - - -
202406051700 make_ics_mem002 - - - - -
202406051700 make_lbcs_mem002 - - - - -
202406051700 run_fcst_mem002 - - - - -
202406051700 run_post_mem001_f000 - - - - -
202406051700 run_post_mem001_f001 - - - - -
202406051700 run_post_mem001_f002 - - - - -
202406051700 run_post_mem001_f003 - - - - -
202406051700 run_post_mem002_f000 - - - - -
202406051700 run_post_mem002_f001 - - - - -
202406051700 run_post_mem002_f002 - - - - -
202406051700 run_post_mem002_f003 - - - - -
202406051700 plot_allvars_mem001_f000 - - - - -
202406051700 plot_allvars_mem001_f001 - - - - -
202406051700 plot_allvars_mem001_f002 - - - - -
202406051700 plot_allvars_mem001_f003 - - - - -
202406051700 plot_allvars_mem002_f000 - - - - -
202406051700 plot_allvars_mem002_f001 - - - - -
202406051700 plot_allvars_mem002_f002 - - - - -
202406051700 plot_allvars_mem002_f003 - - - - -
A test with the change that @EdwardSnyder-NOAA suggested finished successfully, and the fundamental tests (further below) have also finished successfully:
(srw_app) [Natalie.Perlin@Hera:/scratch1/NCEPDEV/nems/Natalie.Perlin/expt_dirs/RRFS_test_CONUS_25km]$ rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10
CYCLE TASK JOBID STATE EXIT STATUS TRIES DURATION
================================================================================================================================
202406051700 make_grid 62168936 SUCCEEDED 0 1 63.0
202406051700 make_orog 62168982 SUCCEEDED 0 1 52.0
202406051700 make_sfc_climo 62169069 SUCCEEDED 0 1 53.0
202406051700 get_extrn_ics 62168937 SUCCEEDED 0 1 16.0
202406051700 get_extrn_lbcs 62168935 SUCCEEDED 0 1 17.0
202406051700 make_ics_mem001 62169108 SUCCEEDED 0 1 69.0
202406051700 make_lbcs_mem001 62169110 SUCCEEDED 0 1 115.0
202406051700 run_fcst_mem001 62169238 SUCCEEDED 0 1 618.0
202406051700 make_ics_mem002 62169111 SUCCEEDED 0 1 55.0
202406051700 make_lbcs_mem002 62169109 SUCCEEDED 0 1 121.0
202406051700 run_fcst_mem002 62169239 SUCCEEDED 0 1 614.0
202406051700 run_post_mem001_f000 62169650 SUCCEEDED 0 1 33.0
202406051700 run_post_mem001_f001 62169717 SUCCEEDED 0 1 41.0
202406051700 run_post_mem001_f002 62169718 SUCCEEDED 0 1 41.0
202406051700 run_post_mem001_f003 62169719 SUCCEEDED 0 1 39.0
202406051700 run_post_mem002_f000 62169653 SUCCEEDED 0 1 34.0
202406051700 run_post_mem002_f001 62169720 SUCCEEDED 0 1 41.0
202406051700 run_post_mem002_f002 62169721 SUCCEEDED 0 1 34.0
202406051700 run_post_mem002_f003 62169722 SUCCEEDED 0 1 41.0
202406051700 plot_allvars_mem001_f000 62169782 SUCCEEDED 0 1 126.0
202406051700 plot_allvars_mem001_f001 62169779 SUCCEEDED 0 1 126.0
202406051700 plot_allvars_mem001_f002 62169784 SUCCEEDED 0 1 126.0
202406051700 plot_allvars_mem001_f003 62169777 SUCCEEDED 0 1 143.0
202406051700 plot_allvars_mem002_f000 62169783 SUCCEEDED 0 1 127.0
202406051700 plot_allvars_mem002_f001 62169778 SUCCEEDED 0 1 131.0
202406051700 plot_allvars_mem002_f002 62169780 SUCCEEDED 0 1 126.0
202406051700 plot_allvars_mem002_f003 62169781 SUCCEEDED 0 1 131.0
(srw_app) [Natalie.Perlin@Hera:/scratch1/NCEPDEV/nems/Natalie.Perlin/expt_dirs/RRFS_test_CONUS_25km]$
Fundamental tests on Hera/intel:
Took 0:28:06.287742; will no longer monitor.
All 6 experiments finished
Calculating core-hour usage and printing final summary
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 9.03
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 6.02
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 28.63
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024062 COMPLETE 34.01
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240620144 COMPLETE 23.23
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024062014484 COMPLETE 20.13
----------------------------------------------------------------------------------------------------
Total COMPLETE 121.05
Detailed summary written to /scratch1/NCEPDEV/nems/Natalie.Perlin/expt_dirs/WE2E_summary_20240620151657.txt
@gspetro-NOAA @MichaelLueken - All the expected changes and documentation updates are ready for this PR.
@gspetro-NOAA - Please feel free to comment on the RRFS-related documentation changes for the SRW.
The current develop branch has been merged into the rrfs_ics_lbcs branch, and successfully tested by running the grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta case.
@MichaelLueken - thank you for noticing the documentation changes! I'll compare them with the changes made and the documentation built on a local system, to verify that all the necessary changes are in place.
@MichaelLueken - verified that all the documentation changes are in place. Please let me know how to proceed, and whether anything else is needed.
@natalie-perlin -
Thanks for reapplying the documentation modifications again! The documentation updates look good to me. It caught me off guard when I noticed that they had been changed to the way they were before you originally addressed my concerns with the documentation, but everything is once again good to go.
My apologies - some merges did not go well right away when I attempted to implement changes, address comments, and run a new test before yesterday's demo. I was using different platforms to test the changes (AWS, Hera), build and change the documentation (local system), and address comments and commits (GitHub), so some changes were likely not recorded properly. I'm glad that it's back to the expected state.
@natalie-perlin -
I think I have finally figured out the issue with chgres_cube v2 surface files (fractional grid) not being read in by the weather model. RAP and HRRR use RUC LSM, which requires setting tiice to 2 vertical levels. However, the number of ice levels is not being set to 2 for RAP and HRRR.
I'll need to do some work tomorrow to fully figure this out, but I should be able to update the UFS_UTILS hash to 1dac855, which will include the necessary changes to chgres_cube's program_setup.F90, and add the necessary kice entry to the model_configure file so that the correct number of vertical levels will be used. With this, you should be able to remove the modification that was made to devbuild.sh, which is one of the major issues keeping @christinaholtNOAA from approving this work.
Thank you, Michael, fingers crossed!!
This PR has grown to accumulate too much in general. Please limit PRs to one feature.
The title should definitely be changed since it is not only adding support for RRFS ICs/LBCS, but also introducing graphics for ensembles, and addressing issues with the clean script.
All the features must pass PR before you can merge any of them and that's going to be an extremely tall order given the roadblocks of these particular features.
@christinaholtNOAA @MichaelLueken @mkavulich Yes, I'm planning to take the additional features and improvements out of the RRFS-focused PR, to be combined into a separate PR!
Moving this PR to On Hold status as we learn more about the RRFSv1 suspension and to work through reviewer comments.
DESCRIPTION OF CHANGES:
An option to use RRFS model output (control) files as initial and lateral boundary conditions (ICS and LBCS) has been added. RRFS_a data for the test was retrieved from the NODD website (https://registry.opendata.aws/noaa-rrfs/): pressure-level grib2 files from the control directory, with RRFS forecasts interpolated onto a 3-km regular grid.
A new test, grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta, has been added with RRFS input files for the event on 06/05/2024, when tornadoes were reported in Maryland.
The workflow for the plotting tasks has been updated to allow graphic output for individual ensemble members.
The Python plotting script has been updated to make geographical data visible over the plotted fields.
Updated devclean.sh script with safety checks, addressing #1073 (https://github.com/ufs-community/ufs-srweather-app/issues/1073)
UPDATE (6/13/2024):
RRFS data location:
https://noaa-rrfs-pds.s3.amazonaws.com/rrfs_a/rrfs_a.{yyyymmdd}/{hh}/control/
Files are in the format rrfs.t{hh}z.prslev.f{fcst_hr:03d}.conus.grib2
where {yyyymmdd} is the 4-digit year, 2-digit month, and 2-digit day of the forecast cycle, {hh} is the 2-digit hour of the forecast cycle (forecast start), and {fcst_hr:03d} is the 3-digit forecast hour.
The bucket can be browsed at: https://noaa-rrfs-pds.s3.amazonaws.com/index.html#rrfs_a/
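As a minimal sketch of how the location and file name templates above combine, the following Python snippet builds the full URL of one control file for a given cycle and forecast hour. The function name rrfs_control_url and the constant BUCKET_URL are hypothetical names chosen for illustration; only the bucket address and the two templates come from the description above.

```python
from datetime import datetime

# Bucket address as given in the data location above.
BUCKET_URL = "https://noaa-rrfs-pds.s3.amazonaws.com"


def rrfs_control_url(cycle: datetime, fcst_hr: int) -> str:
    """Build the URL of an RRFS_a control pressure-level file."""
    yyyymmdd = cycle.strftime("%Y%m%d")  # forecast cycle date
    hh = cycle.strftime("%H")            # 2-digit forecast cycle hour
    directory = f"{BUCKET_URL}/rrfs_a/rrfs_a.{yyyymmdd}/{hh}/control"
    filename = f"rrfs.t{hh}z.prslev.f{fcst_hr:03d}.conus.grib2"
    return f"{directory}/{filename}"


# Cycle used by the new WE2E test (2024-06-05 17Z), forecast hour 3.
print(rrfs_control_url(datetime(2024, 6, 5, 17), 3))
```

The snippet only formats the string; it does not check that the file exists in the bucket.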
For this PR, the RRFS input data used are interpolated onto a regular 3-km grid; these files need the older sfc_data v1. The sfc_data v2, which contains rotated u,v fields or fractional grids, will be needed in order to use a newer UFS_UTILS version and tag. This would allow use of full RRFS input files, i.e. on a native grid with no remapping onto regular grids; these files are ~6 GB per file, and also require higher versions of packages (g2) that are not present in spack-stack v1.5.1 or 1.6.0.
The following needs to be added to the config.yaml file to use the RRFS ICS/LBCS option (an example):
An example of a config.yaml file is attached. It accesses the data from a pre-staged standard location. Variables such as EXTRN_MDL_SOURCE_BASEDIR_ICS, EXTRN_MDL_FILES_ICS, EXTRN_MDL_SOURCE_BASEDIR_LBCS, EXTRN_MDL_FILES_LBCS need to be added for another date/forecast cycle. If the data are not found on disk, they are retrieved from AWS.
config.yaml.txt
Type of change
TESTS CONDUCTED:
Conducted a test for the RRFS_CONUScompact_25km grid, setting ICS and LBCS to the "RRFS" option, running on the NOAA AWS cloud, with one-, two-, and three-member ensemble experiments. A new test has been configured, config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta, which can be launched on all platforms, with data staged in the standard location for the EPIC project. Fundamental tests (not including the newly developed one) pass successfully on AWS.
Staged data for the test using RRFS ICS/LBCS for the SRW:
NOAA Cloud: /contrib/EPIC/UFS_SRW_data/develop/input_model_data/RRFS/
Derecho: /glade/work/epicufsrt/contrib/UFS_SRW_data/develop/input_model_data/RRFS/
Hera: /scratch1/NCEPDEV/nems/role.epic/UFS_SRW_data/develop/input_model_data
Gaea: /gpfs/f5/epic/world-shared/UFS_SRW_data/develop/input_model_data/RRFS/
Jet: /mnt/lfs4/HFIP/hfv3gfs/role.epic/UFS_SRW_data/develop/input_model_data/RRFS/
Orion/Hercules: /work/noaa/epic/role-epic/contrib/UFS_SRW_data/develop/input_model_data/RRFS/
A directory that uses the forecast cycle date stamp for the test, ./2024060517, has 10 files:
rrfs.t17z.prslev.f000.conus.grib2
rrfs.t17z.prslev.f001.conus.grib2
rrfs.t17z.prslev.f002.conus.grib2
rrfs.t17z.prslev.f003.conus.grib2
rrfs.t17z.prslev.f004.conus.grib2
rrfs.t17z.prslev.f005.conus.grib2
rrfs.t17z.prslev.f006.conus.grib2
rrfs.t17z.prslev.f007.conus.grib2
rrfs.t17z.prslev.f008.conus.grib2
rrfs.t17z.prslev.f009.conus.grib2
DEPENDENCIES:
DOCUMENTATION:
A new option for "RRFS" used as ICS and LBCS may need to be documented.
ISSUE:
In preparation for RRFS integration tasks, an option to use "RRFS" model files as ICS and LBCS was added.
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@christinaholtNOAA