ufs-community / regional_workflow

THIS REPOSITORY IS NOW DEPRECATED; SEE UFS SRW APP FOR CURRENT CODE
https://github.com/ufs-community/ufs-srweather-app

make_lbcs task fails because python has been unloaded #556

Closed evankalina closed 1 year ago

evankalina commented 3 years ago

Description

The make_lbcs task fails shortly after chgres_cube finishes running on the first LBC file. The error is: "/usr/bin/env: python3: No such file or directory."

I believe the failure occurs because of this line in the exregional_make_lbcs.sh script.

The python module is being unloaded before chgres_cube runs, but this is happening in a for loop. During the next pass through the loop, the set_namelist.py script needs to run again, but the python module has not been reloaded.
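The failure mode described above can be reproduced in miniature. The following is a simulation only, not code from exregional_make_lbcs.sh: `unload_python`, `run_set_namelist`, and `run_lbc_loop` are hypothetical stand-ins for the module unload, the set_namelist.py call, and the loop over LBC files. The `no_unload` mode mirrors the workaround of commenting out the unload line.

```shell
#!/usr/bin/env bash
# Simulation only: a flag stands in for whether the python module is loaded.
python_loaded=true

unload_python() { python_loaded=false; }   # hypothetical stand-in for the module unload

run_set_namelist() {
  # Mimics set_namelist.py, which needs python3 to be available.
  if [ "$python_loaded" = true ]; then
    echo "set_namelist ok"
  else
    echo "/usr/bin/env: python3: No such file or directory"
    return 1
  fi
}

run_lbc_loop() {
  # $1 = "buggy" (unload inside the loop) or "no_unload" (unload line disabled).
  local mode=$1 i
  python_loaded=true
  for i in 1 2; do
    run_set_namelist || { echo "failed on LBC file $i"; return 1; }
    if [ "$mode" = buggy ]; then
      unload_python        # happens before chgres_cube, never undone
    fi
    echo "chgres_cube ran for LBC file $i"
  done
}

buggy_out=$(run_lbc_loop buggy) && buggy_rc=0 || buggy_rc=$?
fixed_out=$(run_lbc_loop no_unload) && fixed_rc=0 || fixed_rc=$?

echo "buggy exit status: $buggy_rc"
echo "no_unload exit status: $fixed_rc"
```

In the buggy mode the first pass succeeds and the second pass fails at the namelist step, which matches the order of events in the log: chgres_cube finishes on the first LBC file, then the python3 error appears.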

Steps to Reproduce

  1. Set FCST_LEN_HRS > LBC_SPEC_INTVL_HRS
  2. Run through the make_lbcs task
  3. The make_lbcs task should fail
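Step 1 matters because it forces more than one pass through the loop. A rough check, with example values that are not taken from this report and under the assumption that the task processes roughly one LBC file per interval:

```shell
#!/usr/bin/env bash
# Example values only; substitute the settings from your own experiment config.
FCST_LEN_HRS=12
LBC_SPEC_INTVL_HRS=6

# Assumption: about one loop pass per LBC interval within the forecast length.
num_passes=$(( FCST_LEN_HRS / LBC_SPEC_INTVL_HRS ))

if [ "$num_passes" -gt 1 ]; then
  verdict="loop runs $num_passes times; the failure can occur on pass 2"
else
  verdict="loop runs once; the failure is not triggered"
fi
echo "$verdict"
```

With a single pass, python is still loaded when set_namelist.py runs, so the problem stays hidden.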

Additional Context

I am using the Intel compiler to build the model and its components on Orion. I believe PR #526 introduced this bug.

Output

From the make_lbcs log:

...
 - DONE.
 - DONE.
 - DONE.
/usr/bin/env: python3: No such file or directory

ERROR:
  From script:  "exregional_make_lbcs.sh"
  Full path to script:  "/work/noaa/ufs-phys/ekalina/ufssrw_20210721/regional_workflow/scripts/exregional_make_lbcs.sh"
Call to python script set_namelist.py to set the variables in the namelist
file read in by the chgres_cube executable failed.  Parameters passed to
this script are:
  Name of output namelist file:
    nml_fn = "fort.41"
  Namelist settings specified on command line (these have highest precedence):
    settings =

'config': {
 'fix_dir_input_grid': /work/noaa/global/glopara/fix/fix_am,
 'fix_dir_target_grid': /work/noaa/ufs-phys/ekalina/ufssrw_20210721/expt_dirs/RRFS_CONUS_13km_GFS_v16/fix_lam,
 'mosaic_file_target_grid': /work/noaa/ufs-phys/ekalina/ufssrw_20210721/expt_dirs/RRFS_CONUS_13km_GFS_v16/fix_lam/C775_mosaic.halo4.nc,
 'orog_dir_target_grid': /work/noaa/ufs-phys/ekalina/ufssrw_20210721/expt_dirs/RRFS_CONUS_13km_GFS_v16/fix_lam,
 'orog_files_target_grid': C775_oro_data.tile7.halo4.nc,
 'vcoord_file_target_grid': /work/noaa/ufs-phys/ekalina/ufssrw_20210721/expt_dirs/RRFS_CONUS_13km_GFS_v16/fix_am/global_hyblev.l65.txt,
 'varmap_file': /work/noaa/ufs-phys/ekalina/ufssrw_20210721/src/UFS_UTILS/parm/varmap_tables/GFSphys_var_map.txt,
 'data_dir_input_grid': /work/noaa/ufs-phys/ekalina/ufssrw_20210721/expt_dirs/RRFS_CONUS_13km_GFS_v16/2019112500/FV3GFS/for_LBCS,
 'atm_files_input_grid': ,
 'grib2_file_input_grid': "gfs.t00z.pgrb2.0p25.f006",
 'cycle_mon': 11,
 'cycle_day': 25,
 'cycle_hour': 6,
 'convert_atm': True,
 'regional': 2,
 'halo_bndy': 4,
 'halo_blend': 10,
 'input_type': grib2,
 'external_model': GFS,
 'tracers_input': "",
 'tracers': "",
 'thomp_mp_climo_file': ,
}

Exiting with nonzero status.

gsketefian commented 3 years ago

@evankalina Thanks for pointing this out. We missed it because apparently it doesn't happen on Hera (I ran with this version of the code yesterday), and most of the tests for the PRs are on Hera or Cheyenne. @JulieSchramm @mkavulich @JeffBeck-NOAA We can just disable the "unload_python" line for now until Julie gets back on Monday (the unload is needed only for gnu compilers). What do you think?

JeffBeck-NOAA commented 3 years ago

@gsketefian, yes, let's do it.

evankalina commented 3 years ago

That sounds reasonable. I can confirm that commenting out the line does not seem to cause any adverse effects (at least not on Orion with an Intel build).

gsketefian commented 3 years ago

Re PR #526, I just remembered that NCO wants all module loads to happen outside the J-jobs and ex-scripts. At least that was the case a couple of years ago. That's why we created the load_modules_run_task.sh script. We should check with @JacobCarley-NOAA about whether this is still the case. If so, we may not be able to unload modules in the ex-scripts. @JeffBeck-NOAA @JulieSchramm

mkavulich commented 1 year ago

Solved by #559