ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

[develop] Added an option for RRFS external model files used as ICS and LBCS #1089

Open natalie-perlin opened 1 month ago

natalie-perlin commented 1 month ago

DESCRIPTION OF CHANGES:

UPDATE (6/13/2024):

RRFS data are located at https://noaa-rrfs-pds.s3.amazonaws.com/rrfs_a/rrfs_a.{yyyymmdd}/{hh}/control/ and file names follow the format rrfs.t{hh}z.prslev.f{fcst_hr:03d}.conus.grib2, where {yyyymmdd} is the 4-digit year, 2-digit month, and 2-digit day of the forecast cycle, {hh} is the 2-digit hour of the forecast cycle (forecast start), and {fcst_hr:03d} is the 3-digit forecast hour.

The bucket can be browsed at https://noaa-rrfs-pds.s3.amazonaws.com/index.html#rrfs_a/
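
As a rough illustration of the naming convention above, the following Python sketch builds the URL of one file for a given cycle and forecast hour. The helper name `rrfs_url` is hypothetical and not part of the SRW App; only the bucket layout and file-name pattern come from the description above.

```python
from datetime import datetime

# Bucket layout and file-name pattern as described above.
BASE_URL = "https://noaa-rrfs-pds.s3.amazonaws.com/rrfs_a"
FILE_TEMPLATE = "rrfs.t{hh}z.prslev.f{fcst_hr:03d}.conus.grib2"


def rrfs_url(cycle: datetime, fcst_hr: int) -> str:
    """Return the URL of one RRFS pressure-level file (hypothetical helper)."""
    yyyymmdd = cycle.strftime("%Y%m%d")  # 4-digit year, 2-digit month and day
    hh = cycle.strftime("%H")            # 2-digit cycle hour
    fname = FILE_TEMPLATE.format(hh=hh, fcst_hr=fcst_hr)
    return f"{BASE_URL}/rrfs_a.{yyyymmdd}/{hh}/control/{fname}"


if __name__ == "__main__":
    # Forecast hour 3 of the 2024-06-05 17z cycle used for the test case.
    print(rrfs_url(datetime(2024, 6, 5, 17), 3))
```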

For this PR, the RRFS input data used are interpolated to a regular 3-km grid; these files need the older sfc_data v1. Using sfc_data v2, which contains rotated u,v fields or fractional grids, will require a newer UFS_UTILS version and tag. That would allow use of the full RRFS input files, i.e. on the native grid with no remapping to a regular grid; these files are ~6 GB each and also require a higher version of packages (g2) than is present in spack-stack v1.5.1 or 1.6.0.

The following needs to be added to the config.yaml file to use the RRFS ICS/LBCS option (an example):

task_get_extrn_ics:
  EXTRN_MDL_NAME_ICS: RRFS
  USE_USER_STAGED_EXTRN_FILES: true
  FV3GFS_FILE_FMT_ICS: grib2
  EXTRN_MDL_SOURCE_BASEDIR_ICS: /lustre/SRW_DATA/RRFS/rrfs_a.20230501/${hh}/control
  EXTRN_MDL_FILES_ICS:
    - 'rrfs.t{hh}z.prslev.f{fcst_hr:03d}.conus.grib2'
task_get_extrn_lbcs:
  EXTRN_MDL_NAME_LBCS: RRFS
  USE_USER_STAGED_EXTRN_FILES: true
  LBC_SPEC_INTVL_HRS: 1
  FV3GFS_FILE_FMT_LBCS: grib2
  EXTRN_MDL_SOURCE_BASEDIR_LBCS: /lustre/SRW_DATA/RRFS/rrfs_a.20230501/${hh}/control
  EXTRN_MDL_FILES_LBCS:
    - 'rrfs.t{hh}z.prslev.f{fcst_hr:03d}.conus.grib2'

An example config.yaml file is attached. It accesses the data from a pre-staged standard location. Variables such as EXTRN_MDL_SOURCE_BASEDIR_ICS, EXTRN_MDL_FILES_ICS, EXTRN_MDL_SOURCE_BASEDIR_LBCS, and EXTRN_MDL_FILES_LBCS need to be added when using another date/forecast cycle. If the data are not found on disk, they are retrieved from AWS.
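
The disk-first, AWS-second lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not the SRW App's actual retrieval code; the helper name `locate_file` and its arguments are hypothetical, and nothing is downloaded here.

```python
from pathlib import Path
from typing import Tuple


def locate_file(staged_dir: str, fname: str, remote_base: str) -> Tuple[str, str]:
    """Return ("disk", path) if the file is staged locally, else ("aws", url).

    Hypothetical helper: it only decides where the file would come from.
    """
    local = Path(staged_dir) / fname
    if local.is_file():
        return "disk", str(local)
    # Fall back to the remote location when the staged copy is missing.
    return "aws", f"{remote_base.rstrip('/')}/{fname}"


if __name__ == "__main__":
    source, where = locate_file(
        "/lustre/SRW_DATA/RRFS/rrfs_a.20230501/00/control",
        "rrfs.t00z.prslev.f000.conus.grib2",
        "https://noaa-rrfs-pds.s3.amazonaws.com/rrfs_a/rrfs_a.20230501/00/control/",
    )
    print(source, where)
```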

config.yaml.txt

Type of change

TESTS CONDUCTED:

Conducted a test on the RRFS_CONUScompact_25km grid with ICS and LBCS set to the "RRFS" option, running on the NOAA AWS cloud, with one-, two-, and three-member ensemble experiments. A new test was configured, config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta, which can be launched on all platforms, with data staged in the standard location for the EPIC project. The fundamental tests (not including the newly developed one) pass on AWS.

Staged data for the test using RRFS ICS/LBCS for the SRW:

NOAA Cloud: /contrib/EPIC/UFS_SRW_data/develop/input_model_data/RRFS/
Derecho: /glade/work/epicufsrt/contrib/UFS_SRW_data/develop/input_model_data/RRFS/
Hera: /scratch1/NCEPDEV/nems/role.epic/UFS_SRW_data/develop/input_model_data
Gaea: /gpfs/f5/epic/world-shared/UFS_SRW_data/develop/input_model_data/RRFS/
Jet: /mnt/lfs4/HFIP/hfv3gfs/role.epic/UFS_SRW_data/develop/input_model_data/RRFS/
Orion/Hercules: /work/noaa/epic/role-epic/contrib/UFS_SRW_data/develop/input_model_data/RRFS/

The directory for the test is named after the forecast cycle date stamp, ./2024060517, and has 10 files:

rrfs.t17z.prslev.f000.conus.grib2
rrfs.t17z.prslev.f001.conus.grib2
rrfs.t17z.prslev.f002.conus.grib2
rrfs.t17z.prslev.f003.conus.grib2
rrfs.t17z.prslev.f004.conus.grib2
rrfs.t17z.prslev.f005.conus.grib2
rrfs.t17z.prslev.f006.conus.grib2
rrfs.t17z.prslev.f007.conus.grib2
rrfs.t17z.prslev.f008.conus.grib2
rrfs.t17z.prslev.f009.conus.grib2
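
A quick way to confirm that a staged cycle directory is complete is to compare its contents against the expected file names. The sketch below assumes hourly files from f000 up to a chosen last forecast hour; the helper name `missing_rrfs_files` is hypothetical and not part of the SRW App.

```python
from pathlib import Path
from typing import List


def missing_rrfs_files(cycle_dir: str, hh: str, last_fcst_hr: int) -> List[str]:
    """List expected RRFS files (f000..last_fcst_hr) absent from cycle_dir."""
    expected = [
        f"rrfs.t{hh}z.prslev.f{hr:03d}.conus.grib2"
        for hr in range(last_fcst_hr + 1)
    ]
    present = {p.name for p in Path(cycle_dir).glob("*.grib2")}
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # For the staged test case: cycle hour 17, forecast hours 0 through 9.
    missing = missing_rrfs_files("./2024060517", "17", 9)
    print("missing files:", missing if missing else "none")
```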

DEPENDENCIES:

DOCUMENTATION:

A new option for "RRFS" used as ICS and LBCS may need to be documented.

ISSUE:

In preparation for RRFS integration tasks, an option to use "RRFS" model files as ICS and LBCS was added.

CHECKLIST

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

CONTRIBUTORS (optional):

@christinaholtNOAA

MichaelLueken commented 1 month ago

@natalie-perlin -

I'll move this work to On-Hold until a WE2E test has been added so that we can properly test this new functionality.

natalie-perlin commented 1 month ago

NB: @MichaelLueken - this PR requires the "RRFS" option to be allowed in UFS_UTILS. The current develop branch of ufs-community/UFS_UTILS does have the "RRFS" option enabled, but the version checked out by the SRW does not (a correction needs to be made to allow it).

How should we proceed with this requirement?

natalie-perlin commented 1 month ago

An updated UFS_UTILS tag that has this option implemented could be used.

MichaelLueken commented 1 month ago

@natalie-perlin -

I'll check if updating the version of UFS_UTILS will work in the SRW App. The commit in UFS_UTILS following what is currently in the SRW App's External.cfg file causes the weather model to fail (the weather model is expecting sheleg, while chgres_cube is generating sheleg_ice and sheleg_land, leading to the previously mentioned failure).

I'll go ahead and try updating the UFS_UTILS version to the latest version and see if it works. If it does, then we can move forward with this update. However, if it continues to fail, I will need to open an issue in the UFS_UTILS repository to let them know about the continued failures and see what can be done.

MichaelLueken commented 1 month ago

@natalie-perlin -

What version of UFS_UTILS contains the necessary fix so that we can exercise the use of RRFS ICs/LBCs in the SRW App? I can try to update to that version and see what issues appear.

MichaelLueken commented 1 month ago

It looks like UFS_UTILS PR #902 includes the necessary changes for chgres_cube to work with RRFS. I'll try a later version of the UFS_UTILS repository, then this one, to see if either will work.

natalie-perlin commented 1 month ago

Yes - thank you!! I was having trouble finding the exact time/version when this change was implemented. The changes required to allow the RRFS option had to be made in two locations in the UFS_UTILS repository, in ./sorc/chgres_cube.fd/program_setup.F90: line 57 and lines 321-322. It looks like the PR you mentioned addresses that: https://github.com/ufs-community/UFS_UTILS/pull/902/files#diff-6b6d24e7712144952ef83ca8f5e9d56e164fdcab1f7faab27812e91bfd483ba2

MichaelLueken commented 1 month ago

@natalie-perlin -

Using the version of UFS_UTILS associated with PR #902 is causing a failure in the fundamental tests:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE               9.31
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.22
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              15.17
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024060  DEAD                   5.71
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240603165  COMPLETE              23.23
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024060316510  COMPLETE              20.14
----------------------------------------------------------------------------------------------------
Total                                                              DEAD                  81.78

The grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR WE2E test failed in run_fcst_mem000 with the following error message:

FATAL from PE 0: NetCDF: Start+count exceeds dimension bound: netcdf_read_data_3d: file:INPUT/sfc_data.nc- variable:tiice

I'll try backing my way through the commits in the UFS_UTILS repository to see which entry is causing issues with tiice.

natalie-perlin commented 1 month ago

@MichaelLueken - thank you for testing!! Let me look into these errors - they look like a data problem. I might need to stage an additional directory in the EPIC space with data that I thought was not needed... will get back to you!

natalie-perlin commented 1 month ago

@MichaelLueken - What is the location of your test? I was not able to reproduce this error.

However, I'm also replacing the explicit format statement for RRFS in the config.yaml file with the format given in the code, and making some more changes for this PR, which are not yet in GitHub.

MichaelLueken commented 1 month ago

@natalie-perlin -

I had been working on Hera, so I had to prepare this work on another machine. On Gaea, using the 1dac855 hash from UFS_UTILS (PR #902), the fundamental WE2E test suite is failing with the same issue as seen on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              19.99
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              12.94
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              27.55
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024060  DEAD                   7.53
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240604105  COMPLETE              33.85
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024060410591  COMPLETE              46.89
----------------------------------------------------------------------------------------------------
Total                                                              DEAD                 148.75

The grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR WE2E test is failing in run_fcst_mem000 with the following error message:

FATAL from PE 3: NetCDF: Start+count exceeds dimension bound: netcdf_read_data_3d: file:INPUT/sfc_data.nc- variable:tiice

The test can be found on Gaea - /gpfs/f5/epic/scratch/Michael.Lueken/ufs-srweather-app/expt_dirs/grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR

I was also able to find that the changes associated with PR #873 are what is causing these issues.

natalie-perlin commented 1 month ago

Thank you, Michael, for testing. The RRFS data is yet to be staged on Gaea.

MichaelLueken commented 1 month ago

@natalie-perlin -

Have you tried running the fundamental WE2E test suite with the modifications you have made to use RRFS external model files to make ICs and LBCs? If you try running the fundamental test suite using the updated UFS_UTILS hash, you should encounter the failure that I have noted above.

I haven't tried running your rrfs_ics_lbcs branch or an experiment using RRFS external model files with an updated UFS_UTILS hash. All of my testing was off of my old feature/hash_update branch. The failure encountered has been in the fundamental test suite.

natalie-perlin commented 1 month ago

It would be good to add the RRFS file entry to data_locations.yaml.

Yes, I'm doing this as well as part of this PR - it is still a work in progress, and these changes are not yet in my GitHub repository. Some other issues appeared after I attempted to introduce many changes at once. So I'm stepping back to the point where it was fully working (including my own changes to UFS_UTILS), and adding the changes one by one.

natalie-perlin commented 1 month ago

@MichaelLueken - some changes have been pushed to the branch. However, the problem with the forecast phase still remains. It looks like the surface data file sfc_data.nc needs to be in a different format when the updated chgres_cube is used. I posted a comment in UFS_UTILS asking for suggestions: https://github.com/ufs-community/UFS_UTILS/issues/850#issuecomment-2154293614

natalie-perlin commented 1 month ago

A case study using RRFS ICS/LBCS has been successfully tested, and fundamental tests (not containing the new test) pass successfully on AWS as well. Logs attached.

log.run_WE2E_tests.txt WE2E_tests_20240613160048.yaml.txt

A test has been added, config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta.yaml, which includes plotting tasks as well. It has three ensemble members, and I'm not sure whether the plotting tasks can be run for individual members, or how to handle task dependencies for an ensemble. So the plotting tasks are not launched, as the workflow dependency needs some adjustment, unless we want to remove the plotting task.

The rocotostat output looks like the following and does not advance further:

 (srw_app) [Natalie.Perlin@NOAA-AWS:/lustre/SRW/expt_dirs/grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta]$ rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10
/apps/rocoto/1.3.3/lib/workflowmgr/launchserver.rb:40: warning: Insecure world writable dir /lustre in PATH, mode 040777
/apps/rocoto/1.3.3/lib/workflowmgr/launchserver.rb:40: warning: Insecure world writable dir /lustre in PATH, mode 040777
       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202406051700               make_grid                         403           SUCCEEDED                   0         1          14.0
202406051700               make_orog                         406           SUCCEEDED                   0         1          38.0
202406051700          make_sfc_climo                         407           SUCCEEDED                   0         1          85.0
202406051700           get_extrn_ics                         404           SUCCEEDED                   0         1           8.0
202406051700          get_extrn_lbcs                         405           SUCCEEDED                   0         1           8.0
202406051700         make_ics_mem001                         408           SUCCEEDED                   0         1         434.0
202406051700        make_lbcs_mem001                         409           SUCCEEDED                   0         1         471.0
202406051700         run_fcst_mem001                         414           SUCCEEDED                   0         1        1046.0
202406051700         make_ics_mem002                         410           SUCCEEDED                   0         1         434.0
202406051700        make_lbcs_mem002                         411           SUCCEEDED                   0         1         568.0
202406051700         run_fcst_mem002                         415           SUCCEEDED                   0         1        1050.0
202406051700         make_ics_mem003                         412           SUCCEEDED                   0         1         427.0
202406051700        make_lbcs_mem003                         413           SUCCEEDED                   0         1         568.0
202406051700         run_fcst_mem003                         416           SUCCEEDED                   0         1         481.0
202406051700    run_post_mem001_f000                         417           SUCCEEDED                   0         1          17.0
202406051700    run_post_mem001_f001                         418           SUCCEEDED                   0         1          17.0
202406051700    run_post_mem001_f002                         424           SUCCEEDED                   0         1          10.0
202406051700    run_post_mem001_f003                         425           SUCCEEDED                   0         1           9.0
202406051700    run_post_mem002_f000                         420           SUCCEEDED                   0         1          12.0
202406051700    run_post_mem002_f001                         426           SUCCEEDED                   0         1           9.0
202406051700    run_post_mem002_f002                         427           SUCCEEDED                   0         1          10.0
202406051700    run_post_mem002_f003                         428           SUCCEEDED                   0         1          12.0
202406051700    run_post_mem003_f000                         419           SUCCEEDED                   0         1          16.0
202406051700    run_post_mem003_f001                         421           SUCCEEDED                   0         1          13.0
202406051700    run_post_mem003_f002                         422           SUCCEEDED                   0         1          13.0
202406051700    run_post_mem003_f003                         423           SUCCEEDED                   0         1          17.0
202406051700            plot_allvars                           -                   -                   -         -             -

MichaelLueken commented 1 month ago

@natalie-perlin -

I was able to get the new grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta WE2E test to run by setting:

platform:
  EXTRN_MDL_DATA_STORES: aws

in the configuration file and removing:

USE_USER_STAGED_EXTRN_FILES: true

for both task_get_extrn_ics and task_get_extrn_lbcs.
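
As a sketch of the change described above, assuming the experiment configuration has already been loaded into a plain Python dict (the function name `switch_to_aws` is hypothetical and not part of the SRW App):

```python
import copy


def switch_to_aws(config: dict) -> dict:
    """Return a copy of config that pulls external model files from AWS.

    Sets platform.EXTRN_MDL_DATA_STORES to "aws" and drops
    USE_USER_STAGED_EXTRN_FILES from both get_extrn tasks.
    """
    cfg = copy.deepcopy(config)  # leave the caller's dict untouched
    cfg.setdefault("platform", {})["EXTRN_MDL_DATA_STORES"] = "aws"
    for task in ("task_get_extrn_ics", "task_get_extrn_lbcs"):
        cfg.get(task, {}).pop("USE_USER_STAGED_EXTRN_FILES", None)
    return cfg


if __name__ == "__main__":
    original = {
        "task_get_extrn_ics": {
            "EXTRN_MDL_NAME_ICS": "RRFS",
            "USE_USER_STAGED_EXTRN_FILES": True,
        },
        "task_get_extrn_lbcs": {
            "EXTRN_MDL_NAME_LBCS": "RRFS",
            "USE_USER_STAGED_EXTRN_FILES": True,
        },
    }
    print(switch_to_aws(original))
```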

I also see the same behavior as you with respect to the plot_allvars task:

       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202406051700               make_grid                    61723209           SUCCEEDED                   0         1          18.0
202406051700               make_orog                    61723628           SUCCEEDED                   0         1          27.0
202406051700          make_sfc_climo                    61723639           SUCCEEDED                   0         1          42.0
202406051700           get_extrn_ics                    61723210           SUCCEEDED                   0         1         768.0
202406051700          get_extrn_lbcs                    61723211           SUCCEEDED                   0         1        1669.0
202406051700         make_ics_mem001                    61723748           SUCCEEDED                   0         1          48.0
202406051700        make_lbcs_mem001                    61725500           SUCCEEDED                   0         1         113.0
202406051700         run_fcst_mem001                    61725735           SUCCEEDED                   0         1         554.0
202406051700         make_ics_mem002                    61723746           SUCCEEDED                   0         1          50.0
202406051700        make_lbcs_mem002                    61725502           SUCCEEDED                   0         1         101.0
202406051700         run_fcst_mem002                    61725738           SUCCEEDED                   0         1         549.0
202406051700         make_ics_mem003                    61723747           SUCCEEDED                   0         1          47.0
202406051700        make_lbcs_mem003                    61725501           SUCCEEDED                   0         1         108.0
202406051700         run_fcst_mem003                    61725736           SUCCEEDED                   0         1         555.0
202406051700    run_post_mem001_f000                    61726433           SUCCEEDED                   0         1          16.0
202406051700    run_post_mem001_f001                    61726595           SUCCEEDED                   0         1          15.0
202406051700    run_post_mem001_f002                    61726826           SUCCEEDED                   0         1          17.0
202406051700    run_post_mem001_f003                    61726824           SUCCEEDED                   0         1          16.0
202406051700    run_post_mem002_f000                    61726525           SUCCEEDED                   0         1          14.0
202406051700    run_post_mem002_f001                    61726607           SUCCEEDED                   0         1          21.0
202406051700    run_post_mem002_f002                    61726829           SUCCEEDED                   0         1          14.0
202406051700    run_post_mem002_f003                    61726830           SUCCEEDED                   0         1          16.0
202406051700    run_post_mem003_f000                    61726434           SUCCEEDED                   0         1          16.0
202406051700    run_post_mem003_f001                    61726594           SUCCEEDED                   0         1          13.0
202406051700    run_post_mem003_f002                    61726823           SUCCEEDED                   0         1          17.0
202406051700    run_post_mem003_f003                    61726825           SUCCEEDED                   0         1          16.0
202406051700            plot_allvars                           -                   -                   -         -             -

natalie-perlin commented 1 month ago

@MichaelLueken - yes, that works on the systems that have network access, but would not work for Hera, for example.

MichaelLueken commented 1 month ago

@natalie-perlin -

Looking in parm/wflow/plot.yaml, I think I see why the experiment is not kicking off the plot_allvars task. In order for the task to start, the run_post_mem000_f000 task must have completed:

  dependency:
    or_do_post: &post_files_exist
      and_run_post: # If post was meant to run, wait on the whole post metatask
        taskvalid:
          attrs:
            task: run_post_mem000_f000
        metataskdep:
          attrs:
            metatask: run_ens_post

I was able to make plot_allvars run by changing run_post_mem000_f000 to run_post_mem001_f000. However, the job fails because it was looking for /scratch2/NAGAPE/epic/Michael.Lueken/expt_dirs/grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta/2024060517/mem#mem#/postprd/srw.t17z.prslev.f000.rrfs_conuscompact_25km.grib2, rather than /scratch2/NAGAPE/epic/Michael.Lueken/expt_dirs/grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta/2024060517/mem001/postprd/srw.t17z.prslev.f000.rrfs_conuscompact_25km.grib2.

The current implementation of plot_allvars only works for deterministic runs (no ensembles). The parm/wflow/plot.yaml file will need to be updated to allow for the capability to plot ensemble forecasts.
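
The literal mem#mem# in the path above is an unexpanded Rocoto metatask placeholder: inside a metatask, #mem# is replaced by each value of the mem variable, which is presumably why it survives untouched in a task defined outside one. A minimal Python sketch of that substitution, for illustration only (the helper `expand_metatask_var` is hypothetical and merely mimics the behavior; it is not Rocoto code):

```python
from typing import List


def expand_metatask_var(template: str, var: str, values: List[str]) -> List[str]:
    """Replace every '#var#' in template with each value in turn."""
    token = f"#{var}#"
    return [template.replace(token, value) for value in values]


if __name__ == "__main__":
    path = "2024060517/mem#mem#/postprd/srw.t17z.prslev.f000.rrfs_conuscompact_25km.grib2"
    # One path per ensemble member, as a per-member plotting task would need.
    for expanded in expand_metatask_var(path, "mem", ["001", "002", "003"]):
        print(expanded)
```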

natalie-perlin commented 4 weeks ago

@MichaelLueken @christinaholtNOAA - The plotting configuration file has been updated in ./parm/wflow/plot.yaml, but the plotting tasks still do not show up in the rocotostat output. Any suggestions as to what is needed to make them visible?

This is what is set up in config.yaml for rocoto tasks:

rocoto:
  tasks:
    taskgroups: '{{ ["parm/wflow/prep.yaml", "parm/wflow/coldstart.yaml", "parm/wflow/post.yaml", "parm/wflow/plot.yaml"]|include }}'

But the rocotostat shows the following:

  (srw_app) [Natalie.Perlin@Hera:/scratch1/NCEPDEV/nems/Natalie.Perlin/expt_dirs/RRFS_test_CONUS_25km]$ rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10
       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202406051700               make_grid                    62004343           SUCCEEDED                   0         1          73.0
202406051700               make_orog                    62004848           SUCCEEDED                   0         1          51.0
202406051700          make_sfc_climo                    62005705           SUCCEEDED                   0         1          57.0
202406051700           get_extrn_ics                    62004344           SUCCEEDED                   0         1          15.0
202406051700          get_extrn_lbcs                    62004345           SUCCEEDED                   0         1          23.0
202406051700         make_ics_mem001                    62005915           SUCCEEDED                   0         1          61.0
202406051700        make_lbcs_mem001                    62005913           SUCCEEDED                   0         1         125.0
202406051700         run_fcst_mem001                    62006240           SUCCEEDED                   0         1         626.0
202406051700         make_ics_mem002                    62005914           SUCCEEDED                   0         1          61.0
202406051700        make_lbcs_mem002                    62005912           SUCCEEDED                   0         1         123.0
202406051700         run_fcst_mem002                    62006239           SUCCEEDED                   0         1         630.0
202406051700    run_post_mem001_f000                    62008026           SUCCEEDED                   0         1          32.0
202406051700    run_post_mem001_f001                    62008025           SUCCEEDED                   0         1          34.0
202406051700    run_post_mem001_f002                    62008027           SUCCEEDED                   0         1          39.0
202406051700    run_post_mem001_f003                    62008028           SUCCEEDED                   0         1          38.0
202406051700    run_post_mem002_f000                    62008029           SUCCEEDED                   0         1          41.0
202406051700    run_post_mem002_f001                    62008030           SUCCEEDED                   0         1          41.0
202406051700    run_post_mem002_f002                    62008031           SUCCEEDED                   0         1          37.0
202406051700    run_post_mem002_f003                    62008032           SUCCEEDED                   0         1          39.0

No plotting tasks show up.
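
A check like this can be automated by scanning the rocotostat text for task names. The sketch below assumes the table layout shown above (a 12-digit cycle followed by the task name); the helper names `task_names` and `has_plot_tasks` are hypothetical.

```python
from typing import List


def task_names(rocotostat_output: str) -> List[str]:
    """Return the TASK column from rocotostat table text."""
    names = []
    for line in rocotostat_output.splitlines():
        fields = line.split()
        # Data rows start with a 12-digit cycle (yyyymmddhhmm); header,
        # separator, and warning lines are skipped.
        if len(fields) >= 2 and len(fields[0]) == 12 and fields[0].isdigit():
            names.append(fields[1])
    return names


def has_plot_tasks(rocotostat_output: str) -> bool:
    """True if any task name starts with 'plot_'."""
    return any(name.startswith("plot_") for name in task_names(rocotostat_output))


if __name__ == "__main__":
    sample = (
        "       CYCLE                    TASK       JOBID     STATE\n"
        "==========================================================\n"
        "202406051700               make_grid    62004343 SUCCEEDED\n"
        "202406051700    run_post_mem001_f000    62008026 SUCCEEDED\n"
    )
    print(task_names(sample), has_plot_tasks(sample))
```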

EdwardSnyder-NOAA commented 3 weeks ago

@natalie-perlin - It appears there is a typo in your parm/wflow/plot.yaml file. Line 32 should be metatask_plot_allvars_mem#mem#_all_fhrs:. Fixing this will add the plot_allvars tasks to your RRFS experiment. See the rocotostat output below:

       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202406051700               make_grid    druby://10.184.10.61:36013          SUBMITTING                   -         0           0.0
202406051700               make_orog                           -                   -                   -         -             -
202406051700          make_sfc_climo                           -                   -                   -         -             -
202406051700           get_extrn_ics    druby://10.184.10.61:36013          SUBMITTING                   -         0           0.0
202406051700          get_extrn_lbcs    druby://10.184.10.61:36013          SUBMITTING                   -         0           0.0
202406051700         make_ics_mem001                           -                   -                   -         -             -
202406051700        make_lbcs_mem001                           -                   -                   -         -             -
202406051700         run_fcst_mem001                           -                   -                   -         -             -
202406051700         make_ics_mem002                           -                   -                   -         -             -
202406051700        make_lbcs_mem002                           -                   -                   -         -             -
202406051700         run_fcst_mem002                           -                   -                   -         -             -
202406051700    run_post_mem001_f000                           -                   -                   -         -             -
202406051700    run_post_mem001_f001                           -                   -                   -         -             -
202406051700    run_post_mem001_f002                           -                   -                   -         -             -
202406051700    run_post_mem001_f003                           -                   -                   -         -             -
202406051700    run_post_mem002_f000                           -                   -                   -         -             -
202406051700    run_post_mem002_f001                           -                   -                   -         -             -
202406051700    run_post_mem002_f002                           -                   -                   -         -             -
202406051700    run_post_mem002_f003                           -                   -                   -         -             -
202406051700    plot_allvars_mem001_f000                           -                   -                   -         -             -
202406051700    plot_allvars_mem001_f001                           -                   -                   -         -             -
202406051700    plot_allvars_mem001_f002                           -                   -                   -         -             -
202406051700    plot_allvars_mem001_f003                           -                   -                   -         -             -
202406051700    plot_allvars_mem002_f000                           -                   -                   -         -             -
202406051700    plot_allvars_mem002_f001                           -                   -                   -         -             -
202406051700    plot_allvars_mem002_f002                           -                   -                   -         -             -
202406051700    plot_allvars_mem002_f003                           -                   -                   -         -             -
natalie-perlin commented 3 weeks ago

A test with the changes that @EdwardSnyder-NOAA suggested finished successfully, and the fundamental tests (further below) also finished successfully:

(srw_app) [Natalie.Perlin@Hera:/scratch1/NCEPDEV/nems/Natalie.Perlin/expt_dirs/RRFS_test_CONUS_25km]$ rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10
       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202406051700               make_grid                    62168936           SUCCEEDED                   0         1          63.0
202406051700               make_orog                    62168982           SUCCEEDED                   0         1          52.0
202406051700          make_sfc_climo                    62169069           SUCCEEDED                   0         1          53.0
202406051700           get_extrn_ics                    62168937           SUCCEEDED                   0         1          16.0
202406051700          get_extrn_lbcs                    62168935           SUCCEEDED                   0         1          17.0
202406051700         make_ics_mem001                    62169108           SUCCEEDED                   0         1          69.0
202406051700        make_lbcs_mem001                    62169110           SUCCEEDED                   0         1         115.0
202406051700         run_fcst_mem001                    62169238           SUCCEEDED                   0         1         618.0
202406051700         make_ics_mem002                    62169111           SUCCEEDED                   0         1          55.0
202406051700        make_lbcs_mem002                    62169109           SUCCEEDED                   0         1         121.0
202406051700         run_fcst_mem002                    62169239           SUCCEEDED                   0         1         614.0
202406051700    run_post_mem001_f000                    62169650           SUCCEEDED                   0         1          33.0
202406051700    run_post_mem001_f001                    62169717           SUCCEEDED                   0         1          41.0
202406051700    run_post_mem001_f002                    62169718           SUCCEEDED                   0         1          41.0
202406051700    run_post_mem001_f003                    62169719           SUCCEEDED                   0         1          39.0
202406051700    run_post_mem002_f000                    62169653           SUCCEEDED                   0         1          34.0
202406051700    run_post_mem002_f001                    62169720           SUCCEEDED                   0         1          41.0
202406051700    run_post_mem002_f002                    62169721           SUCCEEDED                   0         1          34.0
202406051700    run_post_mem002_f003                    62169722           SUCCEEDED                   0         1          41.0
202406051700    plot_allvars_mem001_f000                    62169782           SUCCEEDED                   0         1         126.0
202406051700    plot_allvars_mem001_f001                    62169779           SUCCEEDED                   0         1         126.0
202406051700    plot_allvars_mem001_f002                    62169784           SUCCEEDED                   0         1         126.0
202406051700    plot_allvars_mem001_f003                    62169777           SUCCEEDED                   0         1         143.0
202406051700    plot_allvars_mem002_f000                    62169783           SUCCEEDED                   0         1         127.0
202406051700    plot_allvars_mem002_f001                    62169778           SUCCEEDED                   0         1         131.0
202406051700    plot_allvars_mem002_f002                    62169780           SUCCEEDED                   0         1         126.0
202406051700    plot_allvars_mem002_f003                    62169781           SUCCEEDED                   0         1         131.0
(srw_app) [Natalie.Perlin@Hera:/scratch1/NCEPDEV/nems/Natalie.Perlin/expt_dirs/RRFS_test_CONUS_25km]$
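For a table of this size, the "all tasks succeeded" check can also be scripted rather than read by eye. The following is a minimal sketch, not part of this PR or of Rocoto itself: the helper names `parse_rocotostat` and `unfinished_tasks` are hypothetical, and it assumes the seven-column text layout shown above (a 12-digit cycle, task name, job ID, state, exit status, tries, duration), with `-` in the state column for tasks that have not run yet.

```python
def parse_rocotostat(text):
    """Return (cycle, task, state) tuples from rocotostat table text.

    Assumes the seven-column layout shown in this thread; header and
    separator lines are skipped because they do not start with a
    12-digit cycle string (YYYYMMDDHHMM).
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 7 and len(fields[0]) == 12 and fields[0].isdigit():
            cycle, task, _jobid, state = fields[:4]
            rows.append((cycle, task, state))
    return rows


def unfinished_tasks(text):
    """List the names of tasks whose state is anything other than SUCCEEDED."""
    return [task for _cycle, task, state in parse_rocotostat(text)
            if state != "SUCCEEDED"]


# Two rows in the same format as the output above: one finished, one not yet run.
SAMPLE = """\
       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202406051700               make_grid                    62168936           SUCCEEDED                   0         1          63.0
202406051700         run_fcst_mem002                           -                   -                   -         -             -
"""

print(unfinished_tasks(SAMPLE))
```

An empty list means every task in the table reached SUCCEEDED; in practice the text would come from saving the `rocotostat` output to a file or capturing it from the command.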

Fundamental tests on Hera/intel:

Took 0:28:06.287742; will no longer monitor.
All 6 experiments finished
Calculating core-hour usage and printing final summary
----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE               9.03
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               6.02
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              28.63
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024062  COMPLETE              34.01
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240620144  COMPLETE              23.23
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024062014484  COMPLETE              20.13
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             121.05

Detailed summary written to /scratch1/NCEPDEV/nems/Natalie.Perlin/expt_dirs/WE2E_summary_20240620151657.txt
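As a cross-check on the summary above, the per-experiment core hours can be summed and compared against the reported total. This is a small sketch under stated assumptions: the function name `core_hours` is hypothetical, and it relies on each data row splitting into exactly three whitespace-separated fields (name, status, core hours), as in the table printed here.

```python
import math


def core_hours(summary_text):
    """Return ({experiment: core_hours}, reported_total) from summary text.

    Only rows of the form "<name> COMPLETE <hours>" are read; the header
    and dashed separator lines are ignored.
    """
    experiments = {}
    total = None
    for line in summary_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[1] != "COMPLETE":
            continue
        name, _status, hours = fields
        if name == "Total":
            total = float(hours)
        else:
            experiments[name] = float(hours)
    return experiments, total


# Rows copied from the summary above (experiment names are truncated there too).
SUMMARY = """\
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE               9.03
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               6.02
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              28.63
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024062  COMPLETE              34.01
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240620144  COMPLETE              23.23
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024062014484  COMPLETE              20.13
Total                                                              COMPLETE             121.05
"""

experiments, total = core_hours(SUMMARY)
# Compare with a tolerance because the values are rounded to two decimals.
consistent = math.isclose(sum(experiments.values()), total, abs_tol=0.005)
print(len(experiments), "experiments; total consistent:", consistent)
```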
natalie-perlin commented 3 weeks ago

@gspetro-NOAA @MichaelLueken - All the expected changes and documentation updates are ready for this PR.

natalie-perlin commented 3 weeks ago

@gspetro-NOAA - Please feel free to comment on the RRFS-related documentation changes for the SRW.

natalie-perlin commented 3 weeks ago

The current develop branch has been merged into the rrfs_ics_lbcs branch and successfully tested by running the grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta case.

natalie-perlin commented 3 weeks ago

@MichaelLueken - thank you for noticing the documentation changes! I'll compare them with the changes made and the documentation built on a local system, to verify that all the necessary changes are in place.

natalie-perlin commented 3 weeks ago

@MichaelLueken - verified that all the documentation changes are in place. Please let me know how to proceed, and whether anything else is needed.

MichaelLueken commented 3 weeks ago

@natalie-perlin -

Thanks for reapplying the documentation modifications again! The documentation updates look good to me. It caught me off guard when I noticed that they had been changed to the way they were before you originally addressed my concerns with the documentation, but everything is once again good to go.

natalie-perlin commented 3 weeks ago

> @natalie-perlin -
>
> Thanks for reapplying the documentation modifications again! The documentation updates look good to me. It caught me off guard when I noticed that they had been changed to the way they were before you originally addressed my concerns with the documentation, but everything is once again good to go.

My apologies - some merges did not go well right away when I attempted to implement changes, address comments, and run a new test before yesterday's demo. Because I used different platforms to test the changes (AWS, Hera), to build and change the documentation (local system), and to address comments and make commits (GitHub), some of the changes were likely not recorded properly. I'm glad that it's back to the expected state.

MichaelLueken commented 3 weeks ago

@natalie-perlin -

I think I have finally figured out the issue with chgres_cube v2 surface files (fractional grid) not being read in by the weather model. RAP and HRRR use RUC LSM, which requires setting tiice to 2 vertical levels. However, the number of ice levels is not being set to 2 for RAP and HRRR.

I'll need to do some work tomorrow to fully figure this out, but I should be able to update the UFS_UTILS hash to 1dac855, which will include the necessary changes to chgres_cube's program_setup.F90 and add the necessary kice entry to the model_configure file so that the correct number of vertical levels will be used. With this, you should be able to remove the modification that was made to devbuild.sh, which is one of the major issues keeping @christinaholtNOAA from approving this work.

Thank you, Michael, fingers crossed!!

natalie-perlin commented 2 weeks ago

This PR has grown to accumulate too much in general. Please limit PRs to one feature.

The title should definitely be changed since it is not only adding support for RRFS ICs/LBCS, but also introducing graphics for ensembles, and addressing issues with the clean script.

All the features must pass PR before you can merge any of them and that's going to be an extremely tall order given the roadblocks of these particular features.

@christinaholtNOAA @MichaelLueken @mkavulich Yes, I'm planning to take the additional features and improvements out of the RRFS-focused PR and combine them into a separate PR!

MichaelLueken commented 5 days ago

Moving this PR to On Hold status as we learn more about the RRFSv1 suspension and to work through reviewer comments.