ufs-community / UFS_UTILS

Utilities for the NCEP models.

Code and script to create required fix files for coupled model needs to be incorporated into ufs-utils. #143

Closed DeniseWorthen closed 2 years ago

DeniseWorthen commented 4 years ago

Updated Issue and Action Plan is documented here

Original issue description:

The following tools need to be integrated into ufs_utils:

1) CICE5_ICgen: initial condition generation for cice5/cice6

2) CICE5_gridgen: grid condition generation for cice5/cice6

3) WeightGen: weight generation for grid->grid transformations containing

a) gen_fixgrid and associated fortran code: a fortran routine to read the MOM6 supergrid and create a netcdf file which is the basis for the remaining transformations

b) generate_iceocn_weights.ncl: esmf weight generation for tripole:tripole or tripole:rectilinear or other grid:grid transformations

c) generate_icemesh.ncl: generation of the ice_mesh for use by the CMEPS mediator

d) generate_frac_land_weights.sh: generation of the frac_land weights (the ocean mask on the fv3 grid) using ESMF_RegridWeightGen

e) make_frac_land.ncl: creation of the 6-tiled land_frac files

For more details, see Jun's notes: https://docs.google.com/document/d/13ui4P_vE2ejVn1fsMnRdoHgjaUf8d5u_-_S4nOvcWOA/edit?usp=sharing

This work requires the use of ESMF 8.2.0. That upgrade will be done under this issue:

GeorgeGayno-NOAA commented 4 years ago

@DeniseWorthen The first step will be to add these codes to the ./sorc directory and build them. We build using CMake.

Also I recommend we add some regression or unit tests for these codes.

Are these codes run in OPS? If so, are there any 'ush' scripts that will need to be included?

DeniseWorthen commented 4 years ago

These codes are all associated with the generation of fixed or initial conditions for the ufs-s2s-model.

The products of the weight generation routines (i.e., the regridding weights) are also used in post, to regrid the tripole grid to rectilinear grids at different resolutions.

There are only two fortran routines included, one for the cice5/6 gridgen and one for the MOM6 gridgen. I do not know CMake.

The remaining contents are ncl scripts.

I need to understand how general to make these routines. For example, since they're my tools, they point to my directories. They should, if being used to create curated files, point to some baseline area for both input and output. What would a regression test entail? Currently it is a manual process to choose the resolution, create the needed grid files etc. Am I supposed to make this launch as some sort of automatic regression test end-to-end?

GeorgeGayno-NOAA commented 4 years ago

They need to be general so that anyone can use them easily. That is the purpose of UFS_UTILS. So pointing to personal directories is not allowed. Maybe we should set up a google meet to discuss a plan forward.

DeniseWorthen commented 4 years ago

That would be very useful. I've shared my calendar w/ you so let's find a time that works.

Thanks.

edwardhartnett commented 3 years ago

Is this issue still active?

DeniseWorthen commented 3 years ago

Yes, in the sense that the tools still live in my personal repos. I've been updating them all for more general use and have documented how someone would use them on the ufs-weather wiki page. But exactly how and in what form they should be added to ufs-utils has not really been resolved.

edwardhartnett commented 3 years ago

OK, to clarify how it would happen:

Let us know when you have determined which tools you would like to promote to UFS_UTILS.

DeniseWorthen commented 3 years ago

At this point, there is a single Fortran code that is used. The remainder of the tools are either NCL scripts or shell scripts.

The real stumbling block for me has been the automated aspect of it. Right now the tools require changes in multiple places to generate files for any individual resolution. The changes are for things like source and output directories or naming strings which indicate the resolution. There is also a cascade of "jobs" but the tools in general are not set up to loop over the various resolutions automatically when controlled by some top-level job script--the workflow aspect of it.
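
A minimal sketch of how the per-resolution edits described above could be collected in one place, so that a top-level driver loops over resolutions instead of requiring changes in several files. Everything here is hypothetical: the `GridConfig` fields, the directory layout, and the `mx025` tag are illustrative assumptions (only `mx100` for the 1deg case appears in this thread), and the real tools may organize this differently.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class GridConfig:
    """Everything that changes from one resolution to the next (hypothetical)."""
    label: str      # human-readable resolution, e.g. "1deg"
    res_tag: str    # naming string embedded in file names, e.g. "mx100"
    src_dir: Path   # assumed location of the input supergrid
    out_dir: Path   # assumed location for generated files


def build_configs(fix_root, out_root, resolutions):
    """Derive per-resolution settings from two root directories and a label -> tag map."""
    fix_root, out_root = Path(fix_root), Path(out_root)
    return [
        GridConfig(label, tag, fix_root / tag, out_root / tag)
        for label, tag in resolutions.items()
    ]


# Assumed mapping; only "mx100" is taken from the thread.
RESOLUTIONS = {"1deg": "mx100", "0.25deg": "mx025"}

if __name__ == "__main__":
    for cfg in build_configs("/path/to/fix", "/path/to/output", RESOLUTIONS):
        # A real driver would launch the grid and weight generation jobs here.
        print(f"{cfg.label}: read {cfg.src_dir}, write {cfg.out_dir}")
```

With this shape, adding a resolution means adding one entry to the mapping rather than editing source and output paths in each script.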

DeniseWorthen commented 3 years ago

As a first step, I have created a branch of the grid generation tool that uses a generated namelist and compiles and runs for the 1deg resolution. The branch is here: ufsutils. It isn't using CMake and I'm not sure what else might be required for Doxygen.

To run it, edit the OUTDIR_PATH in run_test.sh to a directory you have write permission for. It should generate the namelist (grid.nml), compile and run the code. The expected output is 3 files called tripole.mx100.nc, grid_cice_NEMS_mx100.nc, and kmtu_cice_NEMS_mx100.nc in the specified output directory.

The tripole.mx100.nc file is used to generate regrid weights in WeightGen that are used for IC interpolation, Post and generating the mapped ocean mask for the coupled file. The two cice files are used at run-time to specify the CICE6 grid and land mask for the coupled model.

The additional configurations can be run by adding additional information in the run_test.sh (I've commented out the generation of the 1/4 deg grid).
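
As a rough illustration of the "generated namelist" step, the sketch below renders a Fortran namelist group from a dictionary and lists the output files expected for a given resolution tag. The group name `grid_nml` and the variable names `res`, `dirout`, and `debug` are placeholders, not the names used in the actual grid.nml; only the three output file name patterns come from the description above.

```python
def render_namelist(group, settings):
    """Render a dict as a Fortran namelist group (strings quoted, logicals as .true./.false.)."""
    lines = [f"&{group}"]
    for key, value in settings.items():
        if isinstance(value, bool):      # check bool first: bool is a subclass of int
            text = ".true." if value else ".false."
        elif isinstance(value, str):
            text = f"'{value}'"
        else:
            text = str(value)
        lines.append(f"  {key} = {text}")
    lines.append("/")
    return "\n".join(lines) + "\n"


def expected_outputs(res_tag):
    """File names the grid generation step is described as producing."""
    return [
        f"tripole.{res_tag}.nc",
        f"grid_cice_NEMS_{res_tag}.nc",
        f"kmtu_cice_NEMS_{res_tag}.nc",
    ]


if __name__ == "__main__":
    # Hypothetical settings for the 1deg case.
    print(render_namelist("grid_nml", {"res": "100", "dirout": "/path/to/output", "debug": False}))
    print(expected_outputs("mx100"))
```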

Let me know what the next steps might be.

GeorgeGayno-NOAA commented 2 years ago

@DeniseWorthen What is this issue? Can we close it? Or rename it?

DeniseWorthen commented 2 years ago

@GeorgeGayno-NOAA This was the original issue I created that was meant to bring the required functionality to ufs-utils. As you can see, it is more than a year old. I can edit the title since at this point, what needs to be added is a single fortran code and associated script.

GeorgeGayno-NOAA commented 2 years ago

@DeniseWorthen I would link to Jun's notes (https://docs.google.com/document/d/13ui4P_vE2ejVn1fsMnRdoHgjaUf8d5u_-_S4nOvcWOA/edit?usp=sharing) in the description section at the top:

MinsukJi-NOAA commented 2 years ago

@GeorgeGayno-NOAA In order to successfully compile, ESMF 8.2.0 had to be used instead of ESMF 8.1.0: https://github.com/MinsukJi-NOAA/UFS_UTILS/blob/feature/cpld_gridgen2/modulefiles/build.hera.intel#L32. Do you think this will be an issue for other utilities?

DeniseWorthen commented 2 years ago

The newer ESMF is required because of a simple fix Gerhard made to access ESMF_RegridWeightGen via "use ESMF". Without the fix, a "use ESMF_RegridWeightGenMod" statement is required in the main routine.

GeorgeGayno-NOAA commented 2 years ago

Don't know. Our unit and consistency tests will tell us of any problems.

MinsukJi-NOAA commented 2 years ago

@GeorgeGayno-NOAA The following are the results of consistency tests on Hera with ESMF 8.2.0:

chgres_cube

[Minsuk.Ji@hfe09 chgres_cube]$ cat summary.log 
consistency.log01:<<< C96 FV3 RESTART TEST FAILED. >>>
consistency.log02:<<< C192 FV3 HISTORY TEST FAILED. >>>
consistency.log03:<<< C96 FV3 GAUSSIAN NEMSIO TEST FAILED. >>>
consistency.log04:<<< C96 GFS SIGIO TEST FAILED. >>>
consistency.log05:<<< C96 GFS GAUSSIAN NEMSIO TEST FAILED. >>>
consistency.log06:<<< C96 REGIONAL TEST PASSED. >>>
consistency.log07:<<< C96 FV3 GAUSSIAN NETCDF TEST FAILED. >>>
consistency.log08:<<< C192 GFS GRIB2 TEST FAILED. >>>
consistency.log09:<<< 25-KM CONUS GFS GRIB2 TEST PASSED. >>>
consistency.log10:<<< 3-km CONUS HRRR W/ GFS PHYSICS GRIB2 TEST FAILED. >>>
consistency.log11:<<< 3-km CONUS HRRR W/ GSD PHYSICS AND SFC FROM FILE GRIB2 TEST FAILED. >>>
consistency.log12:<<< 13-KM CONUS NAM GRIB2 TEST PASSED. >>>
consistency.log13:<<< 13-km CONUS RAP W/ GSD PHYSICS AND SFC FROM FILE GRIB2 TEST PASSED. >>>
consistency.log14:<<< 13-KM NA GFS NCEI GRIB2 TEST PASSED. >>>
consistency.log15:<<< C96 FV3 GAUSSIAN NETCDF2WAM TEST FAILED. >>>
consistency.log16:<<< 25-KM CONUS GFS PGRIB2+BGRIB2 TEST PASSED. >>>
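
A summary log in this format can be tallied with a short script. The sketch below is an assumption about how one might do that and is not part of UFS_UTILS; it relies only on the `<<< NAME TEST PASSED. >>>` / `<<< NAME TEST FAILED. >>>` pattern visible in the log lines.

```python
import re

# Matches "<<< SOME NAME TEST PASSED. >>>" or "... FAILED. >>>", with or without
# a leading "consistency.logNN:" prefix.
RESULT = re.compile(r"<<<\s*(?P<name>.+?)\s+TEST\s+(?P<status>PASSED|FAILED)\.\s*>>>")


def summarize(log_text):
    """Return (passed, failed) lists of test names found in log_text."""
    passed, failed = [], []
    for match in RESULT.finditer(log_text):
        target = passed if match["status"] == "PASSED" else failed
        target.append(match["name"])
    return passed, failed


if __name__ == "__main__":
    sample = (
        "consistency.log01:<<< C96 FV3 RESTART TEST FAILED. >>>\n"
        "consistency.log06:<<< C96 REGIONAL TEST PASSED. >>>\n"
        "<<< C96 UNIFORM TEST PASSED. >>>\n"
    )
    passed, failed = summarize(sample)
    print(f"{len(passed)} passed, {len(failed)} failed: {failed}")
```

Applied to the chgres_cube log above, this tallies 6 passed and 10 failed tests.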

grid_gen

cat summary.log
<<< C96 UNIFORM TEST PASSED. >>>
<<< C96 VIIRS VEGT TEST PASSED. >>>
<<< GFDL REGIONAL TEST PASSED. >>>
<<< ESG REGIONAL TEST PASSED. >>>
<<< REGIONAL GSL GWD TEST PASSED. >>>
  1. How do we go about creating new baselines? Is creating new baselines sufficient to switch to ESMF 8.2.0?
  2. How can the unit tests be run?

GeorgeGayno-NOAA commented 2 years ago

I can create the new baselines.

DeniseWorthen commented 2 years ago

George, my doxygen build branch fails the debug build action, saying that:

/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/sorc/cpld_gridgen.fd/topoedits.F90:159: error: argument 'fsrc' from the argument list of topoedits::apply_topoedits has multiple @param documentation sections 

This module contains two SRs, each of which has an input file name fsrc. How would I fix that?

GeorgeGayno-NOAA commented 2 years ago

@DeniseWorthen This situation occurs in many of our chgres_cube modules. So I don't know why you are having problems. I am unable to reproduce the problem on Hera or WCOSS-Dell, unfortunately. One thing I noticed - your doxygen statements for routine apply_topoedits are placed after the subroutine declaration statement. Move them before that statement and see if that helps.

DeniseWorthen commented 2 years ago

Thanks @GeorgeGayno-NOAA. I'll try that. I'll fix the same issue in other SRs too (first doxygen statements, then subroutine statement).

Another question---do I really need to document each SR w/ me as the author? I followed the chgres example but in that case maybe you do have other authors?

GeorgeGayno-NOAA commented 2 years ago

We want an author for each routine. If you don't know the author, use a point-of-contact as the author. If there are multiple authors, name all of them.

DeniseWorthen commented 2 years ago

Your fix for placing the doxygen block before the subroutine statement worked. I'm now able to build the docs and leave WARN_AS_ERROR set to YES. I'll bring the doxygenated code back to the main branch I'm working from and then start thinking about the unit test.

GeorgeGayno-NOAA commented 2 years ago

Great. That was just a guess.

DeniseWorthen commented 2 years ago

@GeorgeGayno-NOAA I've added a unit test for cpld_gridgen. It checks that the values are correctly aligned across the tripole seam.

(edit---spoke too soon; the github action failed. Sigh).

DeniseWorthen commented 2 years ago

I see this message in the workflow runs. What should this file contain?

AddressSanitizer: failed to read suppressions file '/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/cpld_gridgen/LSanSuppress.supp'

GeorgeGayno-NOAA commented 2 years ago

Our tests check for memory leaks. Sometimes these leaks are in one of the external libraries used by your test. The file contains a list of the leaky libraries. I see your utility uses ESMF. That is one of the leaky libraries. So you can create a LSanSuppress.supp file using this example - https://github.com/ufs-community/UFS_UTILS/blob/develop/tests/global_cycle/LSanSuppress.supp
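
For orientation, LeakSanitizer suppression files hold one `leak:<pattern>` entry per line, and the sanitizer is pointed at the file through the `LSAN_OPTIONS` environment variable. The sketch below writes such a file and builds a matching environment. The pattern `ESMF` is an assumption for illustration; the linked global_cycle example is the authoritative content, and the way UFS_UTILS wires the file into its test runs may differ from this.

```python
import os
import tempfile
from pathlib import Path


def write_lsan_suppressions(path, patterns):
    """Write one 'leak:<pattern>' line per pattern and return the file path."""
    path = Path(path)
    path.write_text("".join(f"leak:{pattern}\n" for pattern in patterns))
    return path


def lsan_env(supp_file, base_env=None):
    """Return a copy of the environment with LSAN_OPTIONS pointing at supp_file."""
    env = dict(os.environ if base_env is None else base_env)
    env["LSAN_OPTIONS"] = f"suppressions={supp_file}"
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "ESMF" is a hypothetical pattern standing in for the leaky library.
        supp = write_lsan_suppressions(Path(tmp) / "LSanSuppress.supp", ["ESMF"])
        print(supp.read_text(), end="")
        print(lsan_env(supp, base_env={})["LSAN_OPTIONS"])
```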

DeniseWorthen commented 2 years ago

My github workflow with the unit test is now passing. I've got a couple of ideas for other unit tests which I could work on while Minsuk is away.

DeniseWorthen commented 2 years ago

@GeorgeGayno-NOAA After I did some additional testing w/ my self-made MOM6 supergrid, I think I will need to host the actual MOM6 1-deg file to use for the unit-testing. Even if I use the make_hgrid tool to create the supergrid, the corner angles have small anomalies at the pole. The 1-deg MOM6 supergrid is 22M. A copy is located here:

/scratch1/NCEPDEV/nems/emc.nemspara/RT/NEMSfv3gfs/input-data-20211210/MOM6_FIX/100/ocean_hgrid.nc

Does it need to be moved to a specific location in order for my modified unit-test to be able to use it?

GeorgeGayno-NOAA commented 2 years ago

We are hosting all the unit test data here: https://ftp.emc.ncep.noaa.gov/static_files/public/UFS/ufs_utils/unit_tests/ And we can create a sub-directory for your unit test. What should we name this sub-directory?

DeniseWorthen commented 2 years ago

I think we can name the subdirectory cpld_gengrid. Thanks.

GeorgeGayno-NOAA commented 2 years ago

@KateFriedman-NOAA When you get a chance, can you create a new sub-directory on the ftp site called cpld_gengrid and host her ocean_hgrid.nc file there? Thanks.

KateFriedman-NOAA commented 2 years ago

@GeorgeGayno-NOAA @DeniseWorthen Done:

https://ftp.emc.ncep.noaa.gov/static_files/public/UFS/ufs_utils/unit_tests/cpld_gengrid/

https://ftp.emc.ncep.noaa.gov/static_files/public/UFS/ufs_utils/unit_tests/cpld_gengrid/ocean_hgrid.nc

DeniseWorthen commented 2 years ago

Thanks. I will work on getting the unit test to use this file.

DeniseWorthen commented 2 years ago

@GeorgeGayno-NOAA I made an update to use the netcdf file. I'm getting a github action failure when I pushed the branch back. The failure is:

[ 53%] Linking Fortran executable ftst_find_angq
/usr/bin/ld: /home/runner/netcdf/lib/libhdf5.a(H5PLint.o): undefined reference to symbol 'dlclose@@GLIBC_2.2.5'
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libdl.so: error adding symbols: DSO missing from command line

GeorgeGayno-NOAA commented 2 years ago

I will take a look.

GeorgeGayno-NOAA commented 2 years ago

@DeniseWorthen Try adding "NetCDF::NetCDF_Fortran" to target_link_libraries - https://github.com/DeniseWorthen/UFS_UTILS/blob/feature/addunit_ncfile/sorc/cpld_gridgen.fd/CMakeLists.txt

DeniseWorthen commented 2 years ago

@GeorgeGayno-NOAA That seemed to resolve the issue, thanks. It took me a while, but I figured out that I also needed to add the wget command to the workflows. When it still failed, I realized it was because the utility is cpld_gridgen but the files were staged under cpld_gengrid. Is it easy to rename the directory on the ftp site where we grab the data?
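
The directory mismatch described here is easy to hit when the staging path is typed by hand. One way to guard against it, sketched below, is to build the download URL from the utility name itself. The base URL is the one given earlier in this thread; the helper names `data_url` and `fetch` are made up for illustration, and the actual workflow uses wget rather than Python.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve

BASE_URL = "https://ftp.emc.ncep.noaa.gov/static_files/public/UFS/ufs_utils/unit_tests/"


def data_url(utility, filename):
    """Build the unit test data URL from the utility's directory name."""
    return urljoin(BASE_URL, f"{utility}/{filename}")


def fetch(utility, filename, dest_dir="."):
    """Download one test data file into dest_dir and return the local path."""
    dest = Path(dest_dir) / filename
    urlretrieve(data_url(utility, filename), dest)
    return dest


if __name__ == "__main__":
    # Printing only; call fetch("cpld_gridgen", "ocean_hgrid.nc") to download.
    print(data_url("cpld_gridgen", "ocean_hgrid.nc"))
```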

GeorgeGayno-NOAA commented 2 years ago

@KateFriedman-NOAA Can you rename:

https://ftp.emc.ncep.noaa.gov/static_files/public/UFS/ufs_utils/unit_tests/cpld_gengrid/

to

https://ftp.emc.ncep.noaa.gov/static_files/public/UFS/ufs_utils/unit_tests/cpld_gridgen/

KateFriedman-NOAA commented 2 years ago

Done! https://ftp.emc.ncep.noaa.gov/static_files/public/UFS/ufs_utils/unit_tests/cpld_gridgen/

DeniseWorthen commented 2 years ago

Thanks @KateFriedman-NOAA, the unit test now downloads the correct file and runs to completion.

GeorgeGayno-NOAA commented 2 years ago

@MinsukJi-NOAA I looked at your branch (904ce59). I was able to run the regression tests on Hera. But they are run on the interactive node. I am not sure that is allowed.

MinsukJi-NOAA commented 2 years ago

Given the simple and short nature of these tests, I decided to run them on the login node.

MinsukJi-NOAA commented 2 years ago

I will start looking into implementing batch job submissions.

MinsukJi-NOAA commented 2 years ago

@GeorgeGayno-NOAA Batch job has been implemented in 2d88836.

GeorgeGayno-NOAA commented 2 years ago

Thanks. It worked for me on Hera. Can you update for the other machines or do I need to do that?

MinsukJi-NOAA commented 2 years ago

@GeorgeGayno-NOAA I can update it myself for Jet and Orion, but not WCOSS 2. I will need the following, though:

  • baseline directories on Hera, Jet, and Orion
  • fixed file directories on Jet and Orion

GeorgeGayno-NOAA commented 2 years ago

I placed the baseline files on Hera under the role account directory: /scratch1/NCEPDEV/nems/role.ufsutils/ufs_utils/reg_tests/cpld_gridgen/baseline_data

Should I also place the 'fixed' files under the role account directory? For example: /scratch1/NCEPDEV/nems/role.ufsutils/ufs_utils/reg_tests/cpld_gridgen/fix_mom6

MinsukJi-NOAA commented 2 years ago

@GeorgeGayno-NOAA I updated the baseline directory on Hera to the role account directory you created. In order to be able to run the regression tests on Jet and Orion, fixed files need to be copied over first. I think it may not be a bad idea to create the fixed file directory on Hera, Jet, and Orion as you suggested. What do you think @DeniseWorthen @junwang-noaa?

junwang-noaa commented 2 years ago

It looks good to me.

GeorgeGayno-NOAA commented 2 years ago

Find the baseline and 'fix' files on Hera here:

And on Orion here:

I will move them to Jet when the maintenance is over.