ufs-community / ufs-mrweather-app

UFS Medium-Range Weather Application

Installation fails on Cheyenne with GNU+MPT combination #7

Closed uturuncoglu closed 4 years ago

uturuncoglu commented 4 years ago

I am trying to install NCEPLIBS on Cheyenne with the following module combination, but

1) ncarenv/1.3 2) gnu/9.1.0 3) mpt/2.19
4) netcdf-mpi/4.7.1
5) pnetcdf/1.11.1
6) ncarcompilers/0.5.0
7) esmf-8.0.0-ncdfio-mpt-O
8) cmake/3.14.4

it fails with the following error:

$ make VERBOSE=1
[  0%] Built target netcdf-fortran
[  5%] Built target NCEPLIBS-landsfcutil
[  5%] Built target hdf5
[  9%] Built target NCEPLIBS-g2
[ 10%] Performing build step for 'NCEPLIBS-nemsio'
[ 16%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o
/glade/u/apps/ch/opt/ncarcompilers/0.5.0/gnu/9.1.0/gfortran  -I/glade/u/apps/ch/opt/mpt/2.19/include -I/glade/u/apps/ch/opt/mpt/2.19/include/../lib -I/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build/include  -O2 -fconvert=big-endian -ffree-form -fbacktrace  -O3 -DNDEBUG -O3 -Jinclude   -O2 -fconvert=big-endian -ffree-form -fbacktrace  -c /glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/NCEPLIBS-nemsio/src/nemsio_module_mpi.f90 -o CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o
f951: Fatal Error: Reading module ‘mpi’ at line 1 column 2: Unexpected EOF
compilation terminated.
CMakeFiles/nemsio_v2.2.3.dir/build.make:75: recipe for target 'CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o' failed
make[5]: *** [CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o] Error 1
make[5]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build'
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/nemsio_v2.2.3.dir/all' failed
make[4]: *** [CMakeFiles/nemsio_v2.2.3.dir/all] Error 2
make[4]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build'
Makefile:129: recipe for target 'all' failed
make[3]: *** [all] Error 2
make[3]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build'
CMakeFiles/NCEPLIBS-nemsio.dir/build.make:111: recipe for target 'NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-stamp/NCEPLIBS-nemsio-build' failed
make[2]: *** [NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-stamp/NCEPLIBS-nemsio-build] Error 2
make[2]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all'
CMakeFiles/Makefile2:666: recipe for target 'CMakeFiles/NCEPLIBS-nemsio.dir/all' failed
make[1]: *** [CMakeFiles/NCEPLIBS-nemsio.dir/all] Error 2
make[1]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all'
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

The commands used to install the libraries are as follows:

module purge
module load ncarenv/1.3
module load gnu/9.1.0
module load mpt/2.19
module load netcdf-mpi/4.7.1
module load pnetcdf/1.11.1
module load ncarcompilers/0.5.0
module load cmake
module use /glade/work/turuncu/PROGS/modulefiles/esmfpkgs/gnu/9.1.0
module load esmf-8.0.0-ncdfio-mpt-O
export ESMF_LIB=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/gnu/9.1.0/lib/libO/Linux.gfortran.64.mpt.default
export ESMF_INC=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/gnu/9.1.0/mod/modO/Linux.gfortran.64.mpt.default

git clone https://github.com/NOAA-EMC/NCEPLIBS.git NCEP_LIBS_ALL
cd NCEP_LIBS_ALL
git checkout origin/full-stack
git submodule init
git submodule sync
git submodule update --recursive
git submodule foreach git submodule init
git submodule foreach git submodule sync
git submodule foreach git submodule update

mkdir build-all
cd build-all
cmake -DMPITYPE=mpt -DCMAKE_INSTALL_PREFIX=$PWD/install ..
make -j 20

uturuncoglu commented 4 years ago

@climbfuji When I try to build NCEPLIBS with an external ESMF installation on Cheyenne, I get the following error:

CMake Warning at CMakeLists.txt:48 (find_package):
  By not providing "FindESMF.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "ESMF", but
  CMake did not find one.

  Could not find a package configuration file provided by "ESMF" with any of
  the following names:

    ESMFConfig.cmake
    esmf-config.cmake

  Add the installation prefix of "ESMF" to CMAKE_PREFIX_PATH or set
  "ESMF_DIR" to a directory containing one of the above files.  If "ESMF"
  provides a separate development package or SDK, be sure it has been
  installed.

In this case, there is no *.cmake file in the ESMF installation directory.

/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2

I defined the following environment variables, but they do not help:

ESMF_LIB=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default
ESMF_LIBDIR=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default
ESMF_INC=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/mod/modO/Linux.intel.64.mpt.default
ESMF_RUNTIME_PROFILE=ON
ESMF_RUNTIME_PROFILE_OUTPUT=SUMMARY
ESMFMKFILE=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/esmf.mk

climbfuji commented 4 years ago

@uturuncoglu this is not an error but a misleading warning imo. The compilation and installation should proceed just fine. The only thing you need to set is ESMFMKFILE. And the installation needs to be compatible with the standard install on the NOAA platforms, i.e. instead of having lib/libO/Linux.intel.64.mpt.default, mod/modO/Linux.intel.64.mpt.default and bin/binO/Linux.intel.64.mpt.default, you just have bin, mod and lib sitting next to each other. This can be achieved by setting

export ESMF_INSTALL_BINDIR=bin
export ESMF_INSTALL_LIBDIR=lib
export ESMF_INSTALL_MODDIR=mod

when you compile ESMF.

uturuncoglu commented 4 years ago

@climbfuji With this configuration, I am getting an error in the chgres installation. It seems that it cannot find the correct ESMF installation or its module files.

[ 18%] Building Fortran object sorc/chgres_cube.fd/CMakeFiles/chgres_cube.exe.dir/model_grid.F90.o
/glade/work/turuncu/UFS/NCEPLIBS_ALL.jan10/UFS_UTILS/sorc/chgres_cube.fd/model_grid.F90(58): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [ESMF]
 use esmf
-----^
/glade/work/turuncu/UFS/NCEPLIBS_ALL.jan10/UFS_UTILS/sorc/chgres_cube.fd/model_grid.F90(59): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [ESMF_LOGPUBLICMOD]
 use ESMF_LogPublicMod

That could be related to my earlier concern about the mod directory. If I look at CMakeFiles/UFS_UTILS.dir/build.make, ESMF_LIB is fine but ESMF_INC is wrong. The file points to /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/../mod, but this directory does not exist in my installation. So I think it expects to find the mod/ directory at the same level as lib/, which is not always the case. I think that if ESMF_INC is set, the build system needs to pick it up, as it does for ESMF_LIB. If there is no ESMF_INC environment variable, then it could assume some directory. What do you think? If you want to try, let me know and I can send the instructions for my failed case.
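
The mismatch described above can be checked before starting a build. The following is a minimal sketch, assuming only that the build derives the module directory as `<directory of esmf.mk>/../mod`, as the generated build.make suggests. The function name `esmf_mod_dir_guess` is hypothetical and not part of NCEPLIBS or ESMF.

```shell
#!/bin/sh
# Hypothetical helper: report whether the module directory that the build
# appears to assume (<directory of esmf.mk>/../mod) actually exists.
esmf_mod_dir_guess() {
    mkfile="$1"
    libdir=$(dirname "$mkfile")
    guess="$libdir/../mod"
    if [ -d "$guess" ]; then
        echo "found: $guess"
        return 0
    else
        echo "missing: $guess"
        return 1
    fi
}

# Usage (ESMFMKFILE is the variable already set in the environment above):
#   esmf_mod_dir_guess "$ESMFMKFILE"
```

With a nested default ESMF layout such as lib/libO/Linux.intel.64.mpt.default, the guessed path resolves to lib/libO/mod, so the check would be expected to report it as missing.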

uturuncoglu commented 4 years ago

I could also try the following options:

export ESMF_INSTALL_BINDIR=bin
export ESMF_INSTALL_LIBDIR=lib
export ESMF_INSTALL_MODDIR=mod

but this is not the way we generally do it, and it requires reinstalling all the ESMF modules from scratch.

---------------------------------------------------------------------------------- /glade/work/turuncu/PROGS/modulefiles -----------------------------------------------------------------------------------
   esmfpkgs/gnu/9.1.0/esmf-8.0.0-ncdfio-mpt-g           esmfpkgs/intel/18.0.5/esmf-8.0.0-ncdfio-mpt-O    (D)    esmfpkgs/intel/19.0.2/esmf-8.1.0b05-ncdfio-mpiuni-g
   esmfpkgs/gnu/9.1.0/esmf-8.0.0-ncdfio-mpt-O           esmfpkgs/intel/19.0.2/esmf-8.0.0-ncdfio-mpiuni-g        esmfpkgs/intel/19.0.2/esmf-8.1.0b05-ncdfio-mpiuni-O
   esmfpkgs/gnu/9.1.0/esmf-8.1.0b05-ncdfio-mpt-g        esmfpkgs/intel/19.0.2/esmf-8.0.0-ncdfio-mpiuni-O        esmfpkgs/intel/19.0.2/esmf-8.1.0b05-ncdfio-mpt-g
   esmfpkgs/gnu/9.1.0/esmf-8.1.0b05-ncdfio-mpt-O (D)    esmfpkgs/intel/19.0.2/esmf-8.0.0-ncdfio-mpt-g           esmfpkgs/intel/19.0.2/esmf-8.1.0b05-ncdfio-mpt-O    (D)
   esmfpkgs/intel/18.0.5/esmf-8.0.0-ncdfio-mpt-g        esmfpkgs/intel/19.0.2/esmf-8.0.0-ncdfio-mpt-O

climbfuji commented 4 years ago

@uturuncoglu we could try to build in some mechanism that searches for mod directories underneath the ESMF top-level installation directory, or make use of the other cmake variables that are set by ESMF's own FindESMF.cmake module. Do you want me to try that?

It is unfortunate that the default installation paths for ESMF have all these nested directories with compiler and library name in it instead of following the linux standard installation tree.

climbfuji commented 4 years ago

Add-on: the only variable that is and needs to be set is ESMFMKFILE. This file is picked up and searched by ESMF's FindESMF.cmake macro. No other variable should be set or searched, because ESMFMKFILE is the standard recommended by the ESMF team and is also what is used by the UFS weather model.

uturuncoglu commented 4 years ago

If I look at the ESMFMKFILE, the ESMF_F90COMPILEPATHS variable shows the correct path to the ESMF modules:

/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/mod/modO/Linux.intel.64.mpt.default

Is it possible to query this variable and set ESMF_INC? That would probably fix the issue and would be a more generic solution, since it would not require a special ESMF installation configured through environment variables.
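
As an illustration of the idea, here is a minimal sketch of reading the module path out of esmf.mk from the shell. It assumes that the entry has the form `ESMF_F90COMPILEPATHS=-I<dir> ...` on a single line and that the first `-I` directory is the module directory; both are assumptions to verify against the actual file. The function name `esmf_inc_from_mkfile` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first -I directory listed in the
# ESMF_F90COMPILEPATHS entry of an esmf.mk file.
esmf_inc_from_mkfile() {
    mkfile="$1"
    # Keep only the value of ESMF_F90COMPILEPATHS, split it into tokens,
    # strip the leading -I from each include flag, and take the first one.
    sed -n 's/^ESMF_F90COMPILEPATHS[[:space:]]*=[[:space:]]*//p' "$mkfile" |
        tr ' ' '\n' |
        sed -n 's/^-I//p' |
        head -n 1
}

# Usage:
#   ESMF_INC=$(esmf_inc_from_mkfile "$ESMFMKFILE")
#   export ESMF_INC
```

This sketch does not handle paths that contain spaces, and a CMake-based build would more naturally do the same parsing inside its own find module.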

uturuncoglu commented 4 years ago

It might also be possible to pass the ESMF_F90COMPILEPATHS variable as ESMF_INC, but I am not sure.

climbfuji commented 4 years ago

Let me try - I will use /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libg/Linux.intel.64.mpt.default/esmf.mk because your lib0 directory is empty.

uturuncoglu commented 4 years ago

Thanks. Let me know if you need help. BTW, I can see the files in

/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/

esmf.mk  libesmf.a  libesmf_fullylinked.so  libesmf.so  libesmftrace_preload.so  libesmftrace_static.a  preload.sh

uturuncoglu commented 4 years ago

I am able to install NCEPLIBS on Stampede2. I have not tested it with the model yet, but I'll do it today and let you know.

climbfuji commented 4 years ago

@uturuncoglu The fix in https://github.com/NOAA-EMC/NCEPLIBS/pull/19 should work for you on Cheyenne/Stampede/.... Please test. Thanks!

arunchawla-NOAA commented 4 years ago

@uturuncoglu and @climbfuji, once it is confirmed that this is working, can we close this ticket?

uturuncoglu commented 4 years ago

It is tested and works on Stampede. I have not installed it on Cheyenne yet.

uturuncoglu commented 4 years ago

@climbfuji I have a problem with GNU on Cheyenne with the latest version of the library (hash for superbuild is 2458bc2),

-- Build files have been written to: /glade/work/turuncu/UFS/NCEP_LIBS_ALL.jan22/build_all_gnu/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build
[ 43%] Performing build step for 'NCEPLIBS-nemsio'
Scanning dependencies of target nemsio_v2.2.3
[ 16%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_openclose.f90.o
[ 33%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_read.f90.o
[ 50%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_write.f90.o
[ 66%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module.f90.o
[ 83%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o
f951: Fatal Error: Reading module ‘mpi’ at line 1 column 2: Unexpected EOF
compilation terminated.
CMakeFiles/nemsio_v2.2.3.dir/build.make:75: recipe for target 'CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o' failed
make[5]: *** [CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/nemsio_v2.2.3.dir/all' failed
make[4]: *** [CMakeFiles/nemsio_v2.2.3.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make[3]: *** [all] Error 2
CMakeFiles/NCEPLIBS-nemsio.dir/build.make:111: recipe for target 'NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-stamp/NCEPLIBS-nemsio-build' failed
make[2]: *** [NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-stamp/NCEPLIBS-nemsio-build] Error 2
CMakeFiles/Makefile2:667: recipe for target 'CMakeFiles/NCEPLIBS-nemsio.dir/all' failed
make[1]: *** [CMakeFiles/NCEPLIBS-nemsio.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

This is my environment,

Currently Loaded Modules:
  1) ncarenv/1.3   2) cmake/3.14.4   3) esmf-8.0.0-ncdfio-mpt-O   4) gnu/9.1.0   5) openblas/0.3.6   6) mpt/2.19   7) netcdf/4.7.3   8) ncarcompilers/0.5.0

uturuncoglu commented 4 years ago

@climbfuji I think I solved it. I just need to use the following:

CC=mpicc FC=mpif90 CXX=mpicxx cmake -DMPITYPE=mpt -DCMAKE_INSTALL_PREFIX=$PWD/install ..

climbfuji commented 4 years ago

Yes, I think this was the solution/conclusion we came up with when the initial build failed on Cheyenne. It should be part of the documentation (do we have a place for machine-specific instructions?).

uturuncoglu commented 4 years ago

You mean, in the application or NCEPLIBS documentation?

climbfuji commented 4 years ago

I think NCEPLIBS - there should be general instructions followed by some machine-specific instructions. We will work on that.

kgerheiser commented 4 years ago

What is the problem? Compiling using the regular compiler and not mpicc/mpif90? nemsio includes MPI_Fortran_INCLUDE_DIRS, so it seems strange that it doesn't find the module.
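
One way to narrow this down is to check whether the mpi.mod on the include path was written by gfortran at all. The sketch below rests on an assumption that the thread does not confirm: that the "Unexpected EOF" comes from gfortran reading a module file produced by a different compiler. It relies on recent gfortran releases (4.9 and later) storing .mod files gzip-compressed. The function name `looks_like_gfortran_mod` and the variable `MPI_INC` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the gzip magic bytes
# (1f 8b), which is how recent gfortran releases store .mod files.
looks_like_gfortran_mod() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Usage (MPI_INC is an assumed variable naming the MPI include directory):
#   looks_like_gfortran_mod "$MPI_INC/mpi.mod" || echo "mpi.mod is not a gfortran module"
```

If the check fails for the mpi.mod under the MPT include directory, that would be consistent with the plain gfortran compile failing while the mpif90 wrapper succeeds, but it is only a diagnostic hint, not a confirmed explanation.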

uturuncoglu commented 4 years ago

Yes, that would be great. Thanks for your help!

climbfuji commented 4 years ago

On Cheyenne, CISL uses its own in-house MPI wrappers around MPT and other MPI implementations. I am not exactly sure why they are doing this (maybe because their MPT installation requires them to fix issues similar to what Ufuk was reporting), but there is definitely something weird with the system.

uturuncoglu commented 4 years ago

@climbfuji BTW, are you testing NCEPLIBS on Mac? If so, could you share how you install the libraries? We would also like to test it on our side.

climbfuji commented 4 years ago

Yes, I am testing them. I need to ask you to hold off for a few days - I am still ironing out differences and best ways, looking at issues with the post-processor and the different macOS versions (Mojave versus Catalina). If you are looking for work, we also need to have folks working on various Linux distributions: Redhat/CentOS (7 and 8), Ubuntu (which versions?), possibly others.

uturuncoglu commented 4 years ago

We just need to understand the configuration on Mac and Linux so that we can define those platforms in CIME as far as possible. If you have documentation about those installations, that would be great for us.

rsdunlapiv commented 4 years ago

Closing and opening a new ticket for the Mac/Linux configuration.