ufs-community / UFS_UTILS

Utilities for the NCEP models.

Start+count exceeds dimension bound error message from ufs-weather-model while using fractional grid update in SRW App #961

Open MichaelLueken opened 3 weeks ago

MichaelLueken commented 3 weeks ago

While running the current UFS_UTILS develop HEAD in the SRW App (the same issue also appears after updating the UFS_UTILS hash to 7addff5), one of the fundamental Workflow End-to-End (WE2E) tests, grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR, is failing in the forecast task with the following error message:

NetCDF: Start+count exceeds dimension bound: netcdf_read_data_3d: file:INPUT/sfc_data.nc- variable:tiice
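To see what the file actually provides, the tiice variable can be inspected directly. A minimal diagnostic sketch, assuming the netCDF4 Python package is available and that INPUT/sfc_data.nc is the chgres_cube output referenced in the error (this is not part of the workflow itself):

```python
# Minimal diagnostic sketch: report the dimensions of tiice in the surface file
# named in the error message. The path and the use of netCDF4 are assumptions.
from netCDF4 import Dataset

with Dataset("INPUT/sfc_data.nc") as nc:
    tiice = nc.variables["tiice"]
    print("tiice dimensions:", tiice.dimensions)
    print("tiice shape     :", tiice.shape)
    # "Start+count exceeds dimension bound" indicates the model is requesting
    # more entries along one of these dimensions than the file contains.
```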

The version of the ufs-weather-model currently being used for this testing is 1c6b4d4, from May 16, 2024.

More information regarding the WE2E test that failed:

The CCPP physics suite used is FV3_HRRR, and the predefined grid is RRFS_CONUScompact_25km. Both the ICs and LBCs were derived from the HRRR. The test is a 24-hour forecast starting from the 2020081000 cycle.

Has anyone encountered this behavior before? What additional changes should be made to the workflow to correct this error?

MichaelLueken commented 3 weeks ago

Some additional details:

The comprehensive test suite was run and there were 21 failures in total, all with the same error noted above. The failed tests are:

custom_ESGgrid - FV3_HRRR
custom_ESGgrid_Great_Lakes_snow_8km - FV3_RAP
custom_ESGgrid_NewZealand_3km - FV3_HRRR
custom_ESGgrid_Peru_12km - FV3_RAP
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2mems - FV3_HRRR
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2mems - FV3_HRRR
get_from_HPSS_ics_HRRR_lbcs_RAP - FV3_HRRR
get_from_HPSS_ics_RAP_lbcs_RAP - FV3_HRRR
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR - FV3_HRRR
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP - FV3_RAP
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR - FV3_HRRR
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR - FV3_HRRR
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP - FV3_RAP
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP - FV3_RAP
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR - FV3_HRRR
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR - FV3_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR - FV3_HRRR
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR - FV3_HRRR
grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP - FV3_RAP
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR - FV3_HRRR
long_fcst - FV3_RAP

In the above list, FV3_RAP and FV3_HRRR indicate the CCPP physics suite used. Notably, all tests using the FV3_RAP and FV3_HRRR suites failed, while all tests using other physics suites passed.

Since the failure occurs while reading INPUT/sfc_data.nc, which is generated by the exregional_make_ics script, I searched that script for references to these two physics suites. The suite name is only used to choose between GSDphys_var_map.txt and GFSphys_var_map.txt as the varmap table.
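A rough restatement of that selection in Python, just for illustration (the real logic lives in the bash-based exregional_make_ics ex-script, and the exact suite-to-table mapping below is my assumption based on the description above):

```python
# Illustrative sketch only, not the actual SRW App code: pick the varmap table
# used when generating ICs for a given CCPP physics suite. Mapping FV3_RAP and
# FV3_HRRR to GSDphys_var_map.txt is an assumption; FV3_GFS_v16 is just an
# example of a non-RUC suite name.
def select_varmap_table(ccpp_phys_suite: str) -> str:
    if ccpp_phys_suite in ("FV3_RAP", "FV3_HRRR"):
        return "GSDphys_var_map.txt"
    return "GFSphys_var_map.txt"

print(select_varmap_table("FV3_HRRR"))     # -> GSDphys_var_map.txt
print(select_varmap_table("FV3_GFS_v16"))  # -> GFSphys_var_map.txt
```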

MichaelLueken commented 1 week ago

Thank you very much, @GeorgeGayno-NOAA, for the email correspondence and for checking the consistency of the files in ./orog and ./sfc_climo, confirming that points with some land have valid surface data.

It turns out that the issue is due to both the RAP and HRRR SDFs using the RUC LSM. Unfortunately, Model%kice is 9 for the RUC LSM, while tiice in the initial conditions has only two vertical layers, and this mismatch is what triggers the error reported above. It isn't clear to me how best to address this, since tiice has two vertical layers but Model%kice must be 9 for the RUC LSM. Would it be possible to add v1 sfc file generation to chgres_cube, so that v1 sfc data files can be used for the RAP and HRRR physics suites while v2 sfc data is used for the remaining, non-RUC-LSM-based physics suites?
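For completeness, a quick check illustrating the mismatch, assuming the netCDF4 Python package, the INPUT/sfc_data.nc path from the error, and the kice values discussed above (the layer ordering in the file is also an assumption):

```python
# Sketch: compare the number of ice layers in sfc_data.nc with the number the
# model expects. The kice values come from the discussion above; the dimension
# ordering (time, layers, y, x) for tiice is an assumption.
from netCDF4 import Dataset

EXPECTED_KICE = {"RUC LSM (RAP/HRRR)": 9, "other LSMs": 2}

with Dataset("INPUT/sfc_data.nc") as nc:
    n_ice_layers = nc.variables["tiice"].shape[1]

for lsm, kice in EXPECTED_KICE.items():
    status = "OK" if n_ice_layers >= kice else "too few layers in file"
    print(f"{lsm}: expects kice={kice}, file provides {n_ice_layers} -> {status}")
```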

Thank you very much for the assistance with this issue!

GeorgeGayno-NOAA commented 1 week ago

> Would it be possible to add v1 sfc file generation to chgres_cube, so that v1 sfc data files can be used for the RAP and HRRR physics suites while v2 sfc data is used for the remaining, non-RUC-LSM-based physics suites?

v1 of the surface coldstart file is being deprecated. At some point, only v2 files will be used.

MichaelLueken commented 1 week ago

> v1 of the surface coldstart file is being deprecated. At some point, only v2 files will be used.

Thank you, @GeorgeGayno-NOAA! I'll try reaching out to the FV3ATM team and see if they might have a strategy to deal with tiice for RAP and HRRR.