ufs-community / UFS_UTILS

Utilities for the NCEP models.
Other
21 stars 108 forks source link

Add CMake build to UFS_UTILS #91

Closed GeorgeGayno-NOAA closed 4 years ago

GeorgeGayno-NOAA commented 4 years ago

v1 of the public release used CMake to build chgres_cube. Use CMake to build all programs in the repository. Remove the old build system.

GeorgeGayno-NOAA commented 4 years ago

Added the required cmake files from the public release branch to the 'feature/cmake' branch at cb887b6. The "build_chgres_cube.sh" was updated to invoke cmake from Dell. To get the config step to work, I had to updated the Modules/FindWGRIB2.cmake (under the git@github.com:NOAA-EMC/CMakeModules.git repo) as follows:

--- a/Modules/FindWGRIB2.cmake
+++ b/Modules/FindWGRIB2.cmake
@@ -5,7 +5,7 @@ endif()

 find_path (WGRIB2_INCLUDES
   wgrib2api.mod
-  HINTS ${WGRIB2_ROOT}/include)
+  HINTS ${WGRIB2_ROOT}/include ${WGRIB2_ROOT}/lib)

When I invoke the build script, the configure step works, but the build step fails:

-- Configuring done
-- Generating done
-- Build files have been written to: /gpfs/dell2/emc/modeling/noscrub/George.Gayno/ufs_utils.git/UFS_UTILS/build
+ make
Scanning dependencies of target chgres_cube.exe
[  8%] Building Fortran object sorc/chgres_cube.fd/CMakeFiles/chgres_cube.exe.dir/program_setup.f90.o
/gpfs/dell2/emc/modeling/noscrub/George.Gayno/ufs_utils.git/UFS_UTILS/sorc/chgres_cube.fd/program_setup.f90(343): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [ESMF]
  use esmf
------^

The path to the esmf module directory is not being set. Not sure why that is.

kgerheiser commented 4 years ago

CMake finds ESMF through the environment variable ESMFMKFILE. Is that set? It throws a warning when running cmake if it's not.

GeorgeGayno-NOAA commented 4 years ago

Update the branch following Kyle's suggestion (725dd1c). The compilation moved past the previous error but now fails at the link step. The path to the netcdf library is not being defined (but it is able to find the netcdf include directory):

[100%] Linking Fortran executable chgres_cube.exe
ld: cannot find -lnetcdff
ld: cannot find -lnetcdf
make[2]: *** [sorc/chgres_cube.fd/chgres_cube.exe] Error 1
make[1]: *** [sorc/chgres_cube.fd/CMakeFiles/chgres_cube.exe.dir/all] Error 2
make: *** [all] Error 2
kgerheiser commented 4 years ago

Is that because of this? https://github.com/NOAA-EMC/CMakeModules/issues/30

GeorgeGayno-NOAA commented 4 years ago

Is that because of this? NOAA-EMC/CMakeModules#30

I am almost certain that is the problem. But I can't figure out what to do.

kgerheiser commented 4 years ago

If you run make VERBOSE=1 you can see the exact flags that are used and see what the problem is. Like, is it not adding correct -L or what. And did the CMake invocation spit out any errors for FindNetCDF?

climbfuji commented 4 years ago

Have you tried adding this block to the top-level CMakeLists.txt in UFS_UTILS before find_package(NetCDF MODULE REQUIRED) is called?

# Add environment variable NETCDF to CMAKE_PREFIX_PATH
# for PkgConfig, and set cmake variable accordingly
if(NOT NETCDF)
  if(NOT DEFINED ENV{NETCDF})
    message(FATAL_ERROR "Environment variable NETCDF not set")
  else()
    list(APPEND CMAKE_PREFIX_PATH $ENV{NETCDF})
    set(NETCDF $ENV{NETCDF})
  endif()
  if(DEFINED ENV{NETCDF_FORTRAN})
    list(APPEND CMAKE_PREFIX_PATH $ENV{NETCDF_FORTRAN})
  endif()
endif()
kgerheiser commented 4 years ago

He has that here:

https://github.com/GeorgeGayno-NOAA/UFS_UTILS/blob/725dd1cf8455f02938b2677b06a0a0cb33a85fb6/CMakeLists.txt#L34

GeorgeGayno-NOAA commented 4 years ago

Have you tried adding this block to the top-level CMakeLists.txt in UFS_UTILS before find_package(NetCDF MODULE REQUIRED) is called?

Add environment variable NETCDF to CMAKE_PREFIX_PATH

for PkgConfig, and set cmake variable accordingly

if(NOT NETCDF) if(NOT DEFINED ENV{NETCDF}) message(FATAL_ERROR "Environment variable NETCDF not set") else() list(APPEND CMAKE_PREFIX_PATH $ENV{NETCDF}) set(NETCDF $ENV{NETCDF}) endif() if(DEFINED ENV{NETCDF_FORTRAN}) list(APPEND CMAKE_PREFIX_PATH $ENV{NETCDF_FORTRAN}) endif() endif()

I thought the above logic was to replace the find_package. I added that back and it now compiles. Thanks.

climbfuji commented 4 years ago

Hooray! Sorry, me instructions should have been clearer.

On Mar 17, 2020, at 12:11 PM, GeorgeGayno-NOAA notifications@github.com wrote:

Have you tried adding this block to the top-level CMakeLists.txt in UFS_UTILS before find_package(NetCDF MODULE REQUIRED) is called?

Add environment variable NETCDF to CMAKE_PREFIX_PATH

for PkgConfig, and set cmake variable accordingly

if(NOT NETCDF) if(NOT DEFINED ENV{NETCDF}) message(FATAL_ERROR "Environment variable NETCDF not set") else() list(APPEND CMAKE_PREFIX_PATH $ENV{NETCDF}) set(NETCDF $ENV{NETCDF}) endif() if(DEFINED ENV{NETCDF_FORTRAN}) list(APPEND CMAKE_PREFIX_PATH $ENV{NETCDF_FORTRAN}) endif() endif()

I thought the above logic was to replace the find_package. I added that back and it now compiles. Thanks.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/UFS_UTILS/issues/91#issuecomment-600221083, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5C2RNN46XCJEQQPS2MYRDRH64OFANCNFSM4LMSJMYQ.

GeorgeGayno-NOAA commented 4 years ago

Using the branch at 24a7829, I can compile chgres_cube on Dell and Hera. Now I am having problems on Jet. It fails on the link step because the "-openmp" flag is obsolete when using intel v18. I don't have this problem on Dell and Hera, the correct flag "-qopenmp" is used. So how is the link.txt file created? And why does it contain the correct flag on Dell and Hera, but not Jet?

climbfuji commented 4 years ago

I was looking at your branch. Firstly, the compiler flags are not set in the same way as they are in the release v1 branch (UFS_UTILS/sorc/chgres_cube.fd/CMakeLists.txt).

The way @aerorahul did the rework, the magic happens in UFS_UTILS/CMakeLists.txt. If the option OPENMP is ON (default is OFF), then the following call figures out the correct flags (or at least it should):

option(OPENMP "use OpenMP threading" OFF)
...
if(OPENMP)
  find_package(OpenMP REQUIRED COMPONENTS Fortran)
endif()

Then, in UFS_UTILS/sorc/chgres_cube.fd/CMakeLists.txt:

release v1 branch

...
if(CMAKE_Fortran_COMPILER_ID MATCHES "^(Intel)$")
  set(CMAKE_Fortran_FLAGS "-g -traceback")
  set(CMAKE_Fortran_FLAGS_RELEASE "-O3")
  set(CMAKE_Fortran_FLAGS_DEBUG
      "-O0 -check -check noarg_temp_created -check nopointer -warn -warn noerrors -fp-stack-check -fstack-protector-all -fpe0 -debug -ftrapuv"
  )
elseif(CMAKE_Fortran_COMPILER_ID MATCHES "^(GNU|Clang|AppleClang)$")
  set(CMAKE_Fortran_FLAGS "-g -fbacktrace -ffree-form -ffree-line-length-0")
  set(CMAKE_Fortran_FLAGS_RELEASE "-O3")
  set(CMAKE_Fortran_FLAGS_DEBUG
      "-O0 -ggdb -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -ffpe-trap=invalid,zero,overflow -fbounds-check"
  )
endif()
...
if(OpenMP_Fortran_FOUND)
  target_link_libraries(${exe_name} OpenMP::OpenMP_Fortran)
endif()

your branch

if(CMAKE_Fortran_COMPILER_ID MATCHES "^(Intel)$")
  set(CMAKE_Fortran_FLAGS "-g -traceback")
  set(CMAKE_Fortran_FLAGS_RELEASE "-O3 -fp-model precise -r8 -i4 -qopenmp -convert big_endian -assume byterecl")
  set(CMAKE_Fortran_FLAGS_DEBUG
      "-O0 -check -check noarg_temp_created -check nopointer -warn -warn noerrors -fp-stack-check -fstack-protector-all -fpe0 -debug -ftrapuv"
  )
elseif(CMAKE_Fortran_COMPILER_ID MATCHES "^(GNU|Clang|AppleClang)$")
  set(CMAKE_Fortran_FLAGS "-g -fbacktrace -ffree-form -ffree-line-length-0")
  set(CMAKE_Fortran_FLAGS_RELEASE "-O3")
  set(CMAKE_Fortran_FLAGS_DEBUG
      "-O0 -ggdb -fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -ffpe-trap=invalid,zero,overflow -fbounds-check"
  )
endif()
...
if(OpenMP_Fortran_FOUND)
  target_link_libraries(${exe_name} OpenMP::OpenMP_Fortran)
endif()

Thus, the first thing I would do is to remove the hard-coded -qopenmp because this is against the logic to control and set the OpenMP flags using -DOPENMP=ON and CMake's own tools. Then I would try to compile it without turning OPENMP on, i.e. just omit the argument -DOPENMP=ON (because the default is OFF) or use -DOPENMP=OFF explicitly.

If this works, try it with -DOPENMP=ON. If need be, you can write the flags that CMake figured out to stdout using message(INFO "${...}"). See https://cmake.org/cmake/help/v3.15/module/FindOpenMP.html for the variables that the find_package(OpenMP) call is supposed to set.

GeorgeGayno-NOAA commented 4 years ago

Per @climbfuji recommendation, updated the branch to remove the hard-coded openmp flag, and to turn on cmake's OPENMP check (54428ec). On Jet, the check works and adds the correct openmp flag during compilation, but not at link time.

-- Found OpenMP_Fortran: -qopenmp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.0") found components: Fortran

I updated ./sorc/chgres_cube.fd/CMakeLists.txt to print this message. I assume that is where the link step occurs:

--- a/sorc/chgres_cube.fd/CMakeLists.txt
+++ b/sorc/chgres_cube.fd/CMakeLists.txt
@@ -42,6 +42,7 @@ target_link_libraries(
   ${WGRIB2_LIBRARIES}
   ${NETCDF_LIBRARIES})
 if(OpenMP_Fortran_FOUND)
+  message("found openmp " ${OpenMP_Fortran_FLAGS})
   target_link_libraries(${exe_name} OpenMP::OpenMP_Fortran)
 endif()

and the logic is being tripped with the correct flag:

-- Found sp: /lfs3/projects/hfv3gfs/nwprod/NCEPLIBS/lib/sp_v2.0.2/libsp_v2.0.2_d.a
found openmp -qopenmp
-- Configuring done

but it is still using -openmp at link time.

climbfuji commented 4 years ago

@aerorahul do you have any insight on what might be going on?

aerorahul commented 4 years ago

@GeorgeGayno-NOAA Can you point me to your branch and setup where you are trying to build this. I get it is on Jet. I'll try to log on to Jet and check.

aerorahul commented 4 years ago

@climbfuji @GeorgeGayno-NOAA The -openmp flag is coming from L40 in esmf.mk. Some one did not compile esmf properly, or did, but ignored the fact that openmp is not valid on jet.

aerorahul commented 4 years ago

also, you don't need -openmp in ESMFF90LINKOPTS. That flag should be added to the application that links with OpenMP. ESMF provides a library only.

climbfuji commented 4 years ago

Which ESMF version is it that you are using on Jet? I "usually" build ESMF with OPENMP=OFF.

GeorgeGayno-NOAA commented 4 years ago

also, you don't need -openmp in ESMFF90LINKOPTS. That flag should be added to the application that links with OpenMP. ESMF provides a library only.

I have never used ESMFF90LINKOPS when compiling (maybe I should have?). chgres_cube works perfectly fine without it.

GeorgeGayno-NOAA commented 4 years ago

Which ESMF version is it that you are using on Jet? I "usually" build ESMF with OPENMP=OFF.

module use /mnt/lfs3/projects/hfv3gfs/gwv/ljtjet/lib/modulefiles module load esmflocal/ESMF_8_0_0_beta_snapshot_21

aerorahul commented 4 years ago

also, you don't need -openmp in ESMFF90LINKOPTS. That flag should be added to the application that links with OpenMP. ESMF provides a library only.

I have never used ESMFF90LINKOPS when compiling (maybe I should have?). chgres_cube works perfectly fine without it.

@GeorgeGayno-NOAA L40 in chgres_cube.fd/CMakeLists.txt links with esmf. That invokes a linker to ESMF_INTERFACE_LINK_LIBRARIES. This is an imported target from FindESMF.cmake. See lines 100-103 in FindESMF.cmake

The L103 property is set in L96 of FindESMF.cmake which includes the ESMF_F90LINKOPTS. The latter is obtained from esmf.mk

aerorahul commented 4 years ago

@climbfuji @GeorgeGayno-NOAA The way I see it, there are 3 options.

  1. Rebuild ESMF build by correctly building it without OpenMP or
  2. Rebuild ESMF with appropriate OpenMP flags for Jet or
  3. Manually go into esmf.mk and remove the references to -openmp, which are wrong. This is cheating because we don't know exactly who built this and how. Do we know who maintains the esmf module?
    module use /mnt/lfs3/projects/hfv3gfs/gwv/ljtjet/lib/modulefiles
    module load esmflocal/ESMF_8_0_0_beta_snapshot_21
climbfuji commented 4 years ago

Do you want to try using ESMF from the UFS public release?

On jet:

module purge module load intel/18.0.5.274 module load impi/2018.4.274 module load netcdf/4.7.0 module use -a /lfs3/projects/hfv3gfs/GMTB/modulefiles/generic module load cmake/3.16.4

module use -a /lfs3/projects/hfv3gfs/GMTB/modulefiles/intel-18.0.5.274/impi-2018.4.274 module load NCEPlibs/1.0.0

this sets ESMFMKFILE and other variables, see the contents of /lfs3/projects/hfv3gfs/GMTB/modulefiles/intel-18.0.5.274/impi-2018.4.274/NCEPlibs/1.0.0

if this is not what you want, try tp skip the line "module load NCEPlibs/1.0.0" and just set ESMFMKFILE to match what the module would have done

This ESMF version also has OpenMP enabled (seems to be the default these days, annoying), but I checked the esmf.mk file and it has the correct -qopenmp flags everywhere.

8.0.0bs21 would be pretty old anyway, the NCEPLIBS version is 8.0.0

Dom

On Mar 19, 2020, at 7:23 AM, Rahul Mahajan notifications@github.com wrote:

also, you don't need -openmp in ESMFF90LINKOPTS. That flag should be added to the application that links with OpenMP. ESMF provides a library only.

I have never used ESMFF90LINKOPS when compiling (maybe I should have?). chgres_cube works perfectly fine without it.

@GeorgeGayno-NOAA https://github.com/GeorgeGayno-NOAA L40 in chgres_cube.fd/CMakeLists.txt https://github.com/GeorgeGayno-NOAA/UFS_UTILS/blob/feature/cmake/sorc/chgres_cube.fd/CMakeLists.txt links with esmf. That invokes a linker to ESMF_INTERFACE_LINK_LIBRARIES. This is an imported target from FindESMF.cmake. See lines 100-103 in FindESMF.cmake https://github.com/NOAA-EMC/CMakeModules/blob/b605db0df4f8fb1110aaba989dbffad5312752b9/Modules/FindESMF.cmake The L103 property is set in L96 of FindESMF.cmake which includes the ESMF_F90LINKOPTS. The latter is obtained from esmf.mk

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/UFS_UTILS/issues/91#issuecomment-601175226, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5C2RNU7YSMAJP4XEZOVYTRIIMGBANCNFSM4LMSJMYQ.

climbfuji commented 4 years ago

None of us. DTC also has a version 8.0.0 separate from the UFS public release, which we use for regression testing for EMC's develop branches of the UFS:

module purge

module load intel/18.0.5.274 module load impi/2018.4.274 module load hdf5/1.10.4 module load netcdf/4.6.1

module use -a /lfs3/projects/hfv3gfs/GMTB/modulefiles/intel-18.0.5.274

NCEP libraries (download and build manually for the time being; https://github.com/NCAR/NCEPlibs)

module load NCEPlibs/9.9.9

use pre-compiled EMSF library for above compiler / MPI combination

module load esmf/8.0.0

On Mar 19, 2020, at 7:28 AM, Rahul Mahajan notifications@github.com wrote:

@climbfuji https://github.com/climbfuji @GeorgeGayno-NOAA https://github.com/GeorgeGayno-NOAA The way I see it, there are 3 options.

Rebuild ESMF build by correctly building it without OpenMP or Rebuild ESMF with appropriate OpenMP flags for Jet or Manually go into esmf.mk and remove the references to -openmp, which are wrong. This is cheating because we don't know exactly who built this and how. Do we know who maintains the esmf module? module use /mnt/lfs3/projects/hfv3gfs/gwv/ljtjet/lib/modulefiles module load esmflocal/ESMF_8_0_0_beta_snapshot_21 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/UFS_UTILS/issues/91#issuecomment-601177631, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5C2RKQ65HRBQSNLEEZA6DRIIMWXANCNFSM4LMSJMYQ.

GeorgeGayno-NOAA commented 4 years ago

None of us. DTC also has a version 8.0.0 separate from the UFS public release, which we use for regression testing for EMC's develop branches of the UFS:

module purge

module load intel/18.0.5.274 module load impi/2018.4.274 module load hdf5/1.10.4 module load netcdf/4.6.1

module use -a /lfs3/projects/hfv3gfs/GMTB/modulefiles/intel-18.0.5.274

NCEP libraries (download and build manually for the time being; https://github.com/NCAR/NCEPlibs)

module load NCEPlibs/9.9.9

use pre-compiled EMSF library for above compiler / MPI combination

module load esmf/8.0.0 …

I tried your esmf v8. chgres compiles and all regression tests pass on Jet. Thanks. Will your version of esmf be available for the foreseeable future?

climbfuji commented 4 years ago

I can commit that it will be available until we have another standalone ESMF build (that doesn't rely on the NCEPLIBS-external build system) available on Jet. Whether this will be provided by DTC or EMC is something we haven't ironed out yet. It will depend on which party is deemed responsible for maintaining the UFS portability on jet. In the last months, there was an idea that DTC would be doing this.

GeorgeGayno-NOAA commented 4 years ago

Status of the branch at 0bf4a92c: It compiles on Jet, Hera and Dell. The regression tests run and pass on Jet. On Hera and Dell, the tests fail with this odd error:

 - OPEN TARGET GRID MOSAIC FILE:
 /scratch1/NCEPDEV/da/George.Gayno/noscrub/reg_tests/chgres_cube/fix/C96/C96_mosaic.nc

 FATAL ERROR: opening grid mosaic file: NetCDF: Unknown file format

If I change the order of the esmf and netcdf libraries at link time:

--- a/sorc/chgres_cube.fd/CMakeLists.txt
+++ b/sorc/chgres_cube.fd/CMakeLists.txt
@@ -37,10 +37,10 @@ target_link_libraries(
   bacio_4
   sp_d
   w3nco_d
+  ${NETCDF_LIBRARIES}
   esmf
   MPI::MPI_Fortran
-  ${WGRIB2_LIBRARIES}
-  ${NETCDF_LIBRARIES})
+  ${WGRIB2_LIBRARIES})
 if(OpenMP_Fortran_FOUND)

the regression tests pass on Hera and Dell. The tests also run successfully on Jet. Will consult with the cmake experts before making this change.

climbfuji commented 4 years ago

I know what is going on. The wgrib2 installation/module you are using has netCDF support built in. The default for wgrib2 is to build its own netCDF libraries based on the old 3.x version. It can't read NetCDF4 files. We worked around this by turning off NETCDF support in wgrib2 for the UFS release.

What we need to do in the future is to work with the wgrib2 developer to have an option to provide NetCDF and its dependencies and skip building it. Otherwise you'll always get conflicts.

You can use the UFS release versions of wgrib2 if you like. I can send instructions.

aerorahul commented 4 years ago

@climbfuji I was just going to say something similar. I would have asked to see the link-time log. It appears that the library interface is found, but when it actually tries to open it, an error is thrown.

The target_link_libraries have a transitive property, in the sense that when you include something like esmf, all the libraries that were linked with esmf are passed along. So you could even omit ${NETCDF_LIBRARIES} at link time. We include it because that doesn't hurt and it gives the developer an idea of what they are using.

climbfuji commented 4 years ago

The immediate issue is that it depends on which libnetcdf appears first or last in the link list - whether it takes the netcdf3 or 4 version. But the underlying issue is that wgrib2 builds its own library, which it shouldn't. At least one should be able to say, "hey, wgrib2, I want you to enable netcdf support and please use those libraries" - this is how all other software works.

GeorgeGayno-NOAA commented 4 years ago

Compiled my own copies of wgrib2 on Hera and Dell without netcdf. It was easy to do. Just set "USE_NETCDF3=0" and "USE_NETCDF4=0".

The branch at 9299192 now compiles and runs all regression tests successfully on Hera and Dell.

GeorgeGayno-NOAA commented 4 years ago

Begin updates for building on Cray. The most recent version of cmake on Cray is v3.6, which is too old. So with Rahul's help, I compiled my own copy of cmake v3.16.5. The machine-setup.sh file was updated to point to it. The module build file was also updated. All updates were baselined at 35a989b.

Unfortunately, there were some glitches. The find of the netcdf library ( FindNetCDF.cmake ) and MPI ( find_package(MPI) ) failed. When I tried Rahul's updated version of FindNetCDF.cmake, it worked. Can we create a branch with his version for further testing?

GeorgeGayno-NOAA commented 4 years ago

Added Kyle's suggestion (https://github.com/GeorgeGayno-NOAA/UFS_UTILS/issues/8) at f97a688.

GeorgeGayno-NOAA commented 4 years ago

All reviewer comments have been addressed. See https://github.com/NOAA-EMC/UFS_UTILS/pull/101 for more details.

Next, compile and then rerun all regression tests on Jet, Hera, Cray and Dell.

GeorgeGayno-NOAA commented 4 years ago

First, ensure the latest version of the branch (608e52e) compiles on all officially supported machines. This was confirmed on Orion, Hera, Jet, WCOSS-Cray and WCOSS-Dell for the Intel compiler (both Release" and "Debug" build types). On Hera, it was successfully compiled using the GNU compiler (both Release" and "Debug" build types). Additionally, Dusan was able to build it on his Linux workstation.

Next, the regression tests will be run.

GeorgeGayno-NOAA commented 4 years ago

The grid generation driver scripts (under ./driver_scripts) were modified to remove individual module loads. These (608e52e) were rerun on Jet, Hera, WCOSS-Cray/Dell and confirmed to be working.

The GDAS initialization scripts (under ./util/gdas_init) were also modified to remove module loads. They were rerun on Hera, WCOSS-Cray/Dell (there is no Jet version) and confirmed to be working

GeorgeGayno-NOAA commented 4 years ago

The nemsio utilities (nemsio_read, nemsio_get, nemsio_chgdate, mkgfsnemsioctl) do not have regression tests, but I tested them using a canned case:

All tests worked correctly using 608e52e.

GeorgeGayno-NOAA commented 4 years ago

The global_chgres program is nearly obsolete, but I ran a quick test using these canned cases:

All tests worked correctly using the branch at 608e52e.

GeorgeGayno-NOAA commented 4 years ago

The chgres_cube regression test suite (./reg_tests/chgres_cube) and ice_blend test (./reg_tests/ice_blend) were run on Jet, Hera, Venus and Surge using 608e52e. All tests passed.

GeorgeGayno-NOAA commented 4 years ago

The global_cycle regression test suite (./reg_tests/global_cycle) was run on Jet, Hera, Venus and Surge. The test passed on Jet, Hera and Venus. It failed on Surge. (used 608e52e)

The test compares the output tiled netcdf files to a baseline set using the nccmp utility. Two of the surface files (tile3 and tile6) had 'floating point' differences. Example:

Variable Group Count          Sum      AbsSum          Min          Max       Range         Mean      StdDev
tsea     /         8 -3.41061e-13 4.54747e-13 -5.68434e-14  5.68434e-14 1.13687e-13 -4.26326e-14 4.01944e-14
tisfc    /         3  -1.7053e-13  1.7053e-13 -5.68434e-14 -5.68434e-14           0 -5.68434e-14           0
tref     /        17  -1.7053e-13 9.66338e-13 -5.68434e-14  5.68434e-14 1.13687e-13 -1.00312e-14 5.76733e-14
tfinc    /         3  -1.7053e-13  1.7053e-13 -5.68434e-14 -5.68434e-14           0 -5.68434e-14           0
stc      /        12 -6.82121e-13 6.82121e-13 -5.68434e-14 -5.68434e-14           0 -5.68434e-14           0

These differences are insignificant. global_cycle is working correctly. The threshold used by nccmp should be increased.

GeorgeGayno-NOAA commented 4 years ago

The snow2mdl regression test (./reg_tests/snow2mdl) was run on Jet, Hera, Venus and Surge (using 608e52e). The test runs snow2mdl to create a T1534 snow cover and snow depth analysis. The output is compared to a baseline file. The test passed on Jet, Hera and Venus. It failed on Surge.

On Surge, the output from the branch was compared to the baseline file in Grads. The snow cover records were identical. Six snow depth points near -73/Greenwich differed by 0.0001 meters. Differences are likely due to using compiling the branch with "O3" vs "O0" for 'develop' But I can not explain why this difference happened only on the Cray. Anyway, the output, while different, is correct.

GeorgeGayno-NOAA commented 4 years ago

The last regression test is for the grid generation codes (./reg_tests/grid_gen) (used 608e52e). The test creates a C96 global uniform grid and C96 regional grid. The 'grid', 'oro' and surface climo files are compared to a baseline set of data using the nccmp command. On Jet, Hera, Surge and Vensu, the global test passed, but the regional test failed.

The 'grid' files had differences in the grid box area record. But these are small compared to their magnitude (~10**7). Here is the difference (Venus) for the C96_grid.tile7.halo4.NC file:

Variable Group Count         Sum    AbsSum         Min       Max     Range        Mean    StdDev
area     /         9 -0.00901271 0.0991399 -0.00901273 0.0180254 0.0270382 -0.00100141 0.0122954

The differences on other machines were similar. So this is not a problem.

The other differences were noted in the filtered orography record. Here is the difference (Venus) for the C96_oro_data.tile7.halo4.nc file:

Variable  Group Count     Sum  AbsSum      Min     Max   Range      Mean   StdDev
orog_filt /       410 28.7892 138.521 -3.92627 10.1989 14.1251 0.0702175 0.985272

These differences are large enough to be concerning. Examining the differences in ncview, they were confined to the few rows along the lateral boundaries. The interior had no differences. Wanting to know what could be happening, I started to add some print statements. I then discovered an array dimension problem in filter_topo.F90. The orography was incorrectly dimensioned with a rank of 4 instead 3 in routine FV3_zs_filter . When I made the correction:

real, intent(IN):: stretch_fac
     logical, intent(IN) :: nested, regional
-    real, intent(inout):: phis(isd:ied,jsd,jed,ntiles)
+    real, intent(inout):: phis(isd:ied,jsd:jed,ntiles)
     real:: cd2
     integer mdim, n_del2, n_del4

I got this difference from the baseline file:

Variable  Group Count     Sum  AbsSum     Min     Max   Range     Mean  StdDev
orog_filt /       532 96.1873 280.998 -5.0249 25.0226 30.0475 0.180803 1.98246

That is very different. So I suspect there is a bug in the filter_topo code. And the test failure is unrelated to the CMake build. I plan to merge, but I will open an issue to look into any problems with the filter_topo program

GeorgeGayno-NOAA commented 4 years ago

I am seeing similar behavior with filter_topo in develop (270f9dcf). If I fix the rank of phis and print out the value of phis(1,1), I get this difference in file C96_oro_data.tile7.halo4.nc:

Variable  Group Count     Sum  AbsSum      Min     Max   Range     Mean  StdDev
orog_filt /       396 103.033 318.697 -25.4385 26.2191 51.6576 0.260184 2.97099

If I add another print statement for phis(1,2), I get this difference:

Variable  Group Count       Sum AbsSum      Min     Max   Range        Mean   StdDev
orog_filt /       370 -0.375389 68.313 -2.69452 3.75537 6.44989 -0.00101457 0.484862

So there is likely some kind of memory leak going on with the 'regional' option.

So, the CMake build is not the culprit. Will merge the feature/cmake branch to develop.

GeorgeGayno-NOAA commented 4 years ago

Merged to 'develop' at 3ad7d83. Closing issue.