ufs-community / ufs-mrweather-app

UFS Medium-Range Weather Application

CIME model build error on Ubuntu 18 #212

Closed climbfuji closed 4 years ago

climbfuji commented 4 years ago

On my Amazon Ubuntu 18 instance, I can build the release/public-v1 branch of the ufs-weather-model successfully using build.sh, so the compilers and libraries themselves appear to be set up correctly. However, the model build fails when done through CIME with:

/usr/bin/ld: /usr/local/ufs-release-v1.1.0/lib/libesmf.a(ESMCI_VMKernel.o): undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'

CIME seems to add something to the build that shouldn't be there.

uturuncoglu commented 4 years ago

@climbfuji It could be related to the installation of NCEPLIBS, but I am not sure. What do you think, @jedwards4b?

climbfuji commented 4 years ago

Hmm ... but why does it work with the ufs-weather-model standalone build using build.sh?

uturuncoglu commented 4 years ago

I am not sure, but we have no control over NCEPLIBS. Maybe some external library is missing on the CIME side but present in the standalone model build. I have no Ubuntu system to test this on, but you might want to compare the build logs and the actual compile commands used by the two builds.
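
One way to make that comparison less error-prone is to reduce each link command to its linker-relevant tokens and diff the two sets. The following is a minimal Python sketch under stated assumptions: the two command lines are shortened, purely illustrative strings and are not taken from the actual build logs, and the helper names `link_flags` and `missing_flags` are hypothetical.

```python
import shlex


def link_flags(command):
    """Return the set of linker-relevant tokens (-l, -L, -Wl, -pthread) in a command line."""
    tokens = shlex.split(command)
    return {t for t in tokens if t.startswith(("-l", "-L", "-Wl,")) or t == "-pthread"}


def missing_flags(reference, candidate):
    """Flags present in the reference link command but absent from the candidate."""
    return sorted(link_flags(reference) - link_flags(candidate))


# Illustrative, shortened link lines (assumptions, not the real logs).
build_a_cmd = "mpif90 main.o -o NEMS.exe -L/opt/lib -lesmf -pthread -lrt -lstdc++ -ldl"
build_b_cmd = "mpif90 main.o -o NEMS.exe -L/opt/lib -lesmf -lrt -lstdc++ -ldl"

print(missing_flags(build_a_cmd, build_b_cmd))  # ['-pthread']
```

Pasting the real link lines from the two build logs into the two strings would list any flag that only one of the builds passes to the linker. Note that this compares sets of flags only; it does not detect differences in ordering.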

climbfuji commented 4 years ago

Ok, will keep looking. Thanks.

climbfuji commented 4 years ago

Update. If I take the linker command that CIME creates and manually append -lpthread, I can compile the executable. I tried to add it to cime/config/ufs/machines/config_compilers.xml in the form of

<compiler MACH="linux" COMPILER="gnu">
  <LDFLAGS>
    <!-- Dom BLABLA -->
    <append> -lpthread </append>
  </LDFLAGS>
</compiler>

but that does not work, because it doesn't append -lpthread, but prepends it:

/usr/local/ufs-release-v1.1.0/bin/mpif90    -lpthread   -fconvert=big-endian -ffree-line-length-none  -fcray-pointer -fno-range-check -fbacktrace  -O -fdefault-real-8 -fdefault-double-8  -O2 -fPIC CMakeFiles/NEMS.exe.dir/NEMS/src/MAIN_NEMS.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/module_NEMS_UTILS.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/module_MEDIATOR_methods.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/module_MEDIATOR.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/module_MEDIATOR_SpaceWeather.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/module_EARTH_INTERNAL_STATE.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/module_EARTH_GRID_COMP.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/module_NEMS_INTERNAL_STATE.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/module_NEMS_GRID_COMP.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/module_NEMS_Rusage.F90.o CMakeFiles/NEMS.exe.dir/NEMS/src/nems_c_rusage.c.o CMakeFiles/NEMS.exe.dir/NEMS/src/ENS_Cpl/ENS_CplComp_ESMFMod_STUB.F90.o  -o NEMS.exe  FV3/libfv3cap.a FV3/libfv3core.a FV3/io/libio.a FV3/gfsphysics/libgfsphysics.a FV3/ccpp/driver/libccppdriver.a FV3/ccpp/physics/libccppphys.a FV3/ccpp/framework/src/libccpp.a FV3/cpl/libfv3cpl.a FV3/stochastic_physics/libstochastic_physics.a FMS/libfms.a /usr/local/ufs-release-v1.1.0/lib/libnemsio_v2.3.0.a /usr/local/ufs-release-v1.1.0/lib/libbacio_v2.2.0_4.a /usr/local/ufs-release-v1.1.0/lib/libsp_v2.1.0_d.a /usr/local/ufs-release-v1.1.0/lib/libw3emc_v2.5.0_d.a /usr/local/ufs-release-v1.1.0/lib/libw3nco_v2.1.0_d.a -Wl,-rpath,/usr/local/ufs-release-v1.1.0/lib -L/usr/local/ufs-release-v1.1.0/lib -lesmf  -lrt -lstdc++ -ldl /usr/local/ufs-release-v1.1.0/lib/libnetcdff.a /usr/local/ufs-release-v1.1.0/lib/libnetcdf.a /usr/local/ufs-release-v1.1.0/lib/libhdf5_hl.a /usr/local/ufs-release-v1.1.0/lib/libhdf5.a /usr/local/ufs-release-v1.1.0/lib/libz.a -L/usr/local/ufs-release-v1.1.0/lib -lnetcdff -lnetcdf FV3/libfv3core.a FMS/libfms.a FV3/ipd/libipd.a FV3/gfsphysics/libgfsphysics.a -ldl
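
Why the position of -lpthread matters can be illustrated with a deliberately simplified model of how a GNU-style linker consumes its inputs from left to right. This is a sketch, not a description of ld's real implementation: it assumes that an archive (or a shared library linked under --as-needed, which is believed to be the default on Ubuntu toolchains) is only used if it satisfies a symbol that is already undefined when the linker reaches it. The symbol name `esmf_init` is a placeholder.

```python
def unresolved_symbols(inputs, as_needed=True):
    """Simplified left-to-right model of linker symbol resolution.

    Each input is (kind, defines, needs) with kind in
    {"object", "archive", "shared"}. Returns the symbols that are
    still undefined after the last input has been processed.
    """
    defined, undefined = set(), set()
    for kind, defines, needs in inputs:
        if kind == "object" or (kind == "shared" and not as_needed):
            take = True
        else:
            # Archives (and shared libraries under --as-needed) are used only
            # if they provide a symbol that is undefined at this point.
            take = bool(set(defines) & undefined)
        if not take:
            continue
        defined |= set(defines)
        undefined |= set(needs)
        undefined -= defined
    return undefined


# Placeholder inputs: an object file, the static ESMF archive, and libpthread.
main_o = ("object", {"main"}, {"esmf_init"})
libesmf_a = ("archive", {"esmf_init"}, {"pthread_create"})
libpthread = ("shared", {"pthread_create"}, set())

print(unresolved_symbols([libpthread, main_o, libesmf_a]))  # {'pthread_create'}
print(unresolved_symbols([main_o, libesmf_a, libpthread]))  # set()
```

In this model, a library that appears before the archive that needs it is skipped, because nothing has asked for its symbols yet. That matches the observation above: appending -lpthread manually works, while the prepended flag has no effect.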

jedwards4b commented 4 years ago

In the FV3 interface file configure_cime.cmake I see: set (CMAKE_EXE_LINKER_FLAGS "${LDFLAGS} ${SLIBS}")
Try reversing the order: set (CMAKE_EXE_LINKER_FLAGS "${SLIBS} ${LDFLAGS}")

climbfuji commented 4 years ago

Thanks @jedwards4b, will try.

climbfuji commented 4 years ago

I made that change but -lpthread still shows up before all the other options. Here is what I changed:

diff --git a/cime_config/configure_cime.cmake b/cime_config/configure_cime.cmake
index 8eec1bb..fff33ee 100644
--- a/cime_config/configure_cime.cmake
+++ b/cime_config/configure_cime.cmake
@@ -47,9 +47,11 @@ endif()
 message("4: CMAKE_Fortran_FLAGS ${CMAKE_Fortran_FLAGS}")

 set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D__IFC ${CFLAGS}")
-set (CMAKE_EXE_LINKER_FLAGS "${LDFLAGS} ${SLIBS}")
+#set (CMAKE_EXE_LINKER_FLAGS "${LDFLAGS} ${SLIBS}")
+set (CMAKE_EXE_LINKER_FLAGS "${SLIBS} ${LDFLAGS}")
 # print build options
 message("5: CMAKE_C_FLAGS ${CMAKE_C_FLAGS}")
+message("5: CMAKE_EXE_LINKER_FLAGS ${CMAKE_EXE_LINKER_FLAGS}")

 if(DEBUG)
     message("DEBUG  is      ENABLED")

and the log messages are:

1: CMAKE_Fortran_FLAGS
2: CMAKE_Fortran_FLAGS  -fconvert=big-endian -ffree-line-length-none  -fcray-pointer -fno-range-check -fbacktrace  -O
3: CMAKE_Fortran_FLAGS  -fconvert=big-endian -ffree-line-length-none  -fcray-pointer -fno-range-check -fbacktrace  -O -fdefault-real-8 -fdefault-double-8
4: CMAKE_Fortran_FLAGS  -fconvert=big-endian -ffree-line-length-none  -fcray-pointer -fno-range-check -fbacktrace  -O -fdefault-real-8 -fdefault-double-8
5: CMAKE_C_FLAGS  -D__IFC -std=gnu99  -O
5: CMAKE_EXE_LINKER_FLAGS    -lpthread

I think I have some data points that allow me to keep investigating.
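
The log output suggests two things: SLIBS appears to be empty, so swapping it with LDFLAGS cannot change anything, and whatever ends up in CMAKE_EXE_LINKER_FLAGS is placed ahead of the object files. The Python sketch below expands what is assumed to be CMake's default rule for linking a Fortran executable; the rule string is an assumption (it is consistent with the linker command shown earlier in this thread, but should be checked against the installed CMake version), and the values are shortened placeholders.

```python
# Assumed default CMake rule for linking a Fortran executable.
FORTRAN_LINK_RULE = (
    "<CMAKE_Fortran_COMPILER> <CMAKE_Fortran_LINK_FLAGS> <LINK_FLAGS> "
    "<FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>"
)


def join_flags(*parts):
    """Join flag strings and collapse the whitespace left by empty parts."""
    return " ".join(" ".join(parts).split())


def expand(rule, values):
    """Substitute each <PLACEHOLDER> in the rule with its value."""
    for key, value in values.items():
        rule = rule.replace(f"<{key}>", value)
    return join_flags(rule)


ldflags, slibs = "-lpthread", ""  # SLIBS looks empty in the log above

# With an empty SLIBS, both orderings give the same linker flags.
assert join_flags(ldflags, slibs) == join_flags(slibs, ldflags) == "-lpthread"

command = expand(FORTRAN_LINK_RULE, {
    "CMAKE_Fortran_COMPILER": "mpif90",
    "CMAKE_Fortran_LINK_FLAGS": "",
    "LINK_FLAGS": join_flags(ldflags, slibs),  # where CMAKE_EXE_LINKER_FLAGS lands
    "FLAGS": "-O2 -fPIC",
    "OBJECTS": "MAIN_NEMS.F90.o",
    "TARGET": "NEMS.exe",
    "LINK_LIBRARIES": "FV3/libfv3cap.a -lesmf -lnetcdff -lnetcdf",
})
print(command)
```

Under this assumption, a library has to be added to the list of link libraries (the part after the target name) rather than to the linker flags if it is meant to appear after the archives that depend on it.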

climbfuji commented 4 years ago

I got it to work with a nasty hack, and besides being nasty I am not sure it works on other platforms.

diff --git a/cime_config/configure_cime.cmake b/cime_config/configure_cime.cmake
index 8eec1bb..0f0b8fd 100644
--- a/cime_config/configure_cime.cmake
+++ b/cime_config/configure_cime.cmake
@@ -50,6 +50,7 @@ set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D__IFC ${CFLAGS}")
 set (CMAKE_EXE_LINKER_FLAGS "${LDFLAGS} ${SLIBS}")
 # print build options
 message("5: CMAKE_C_FLAGS ${CMAKE_C_FLAGS}")
+message("5: CMAKE_EXE_LINKER_FLAGS ${CMAKE_EXE_LINKER_FLAGS}")

 if(DEBUG)
     message("DEBUG  is      ENABLED")
@@ -91,6 +92,6 @@ set(ESMF_LIBS "${ESMF_F90ESMFLINKRPATHS} ${ESMF_F90ESMFLINKPATHS} ${ESMF_F90ESMF

 set(NETCDF_INC_DIR $ENV{NETCDF}/include)
 set(NETCDF_LIBDIR $ENV{NETCDF}/lib)
-set(NETCDF_LIBS -L$ENV{NETCDF}/lib -lnetcdff -lnetcdf)
+set(NETCDF_LIBS -L$ENV{NETCDF}/lib -lnetcdff -lnetcdf -lpthread)

 message("")

I'll look for a better way to do this only for linux.

jedwards4b commented 4 years ago

@climbfuji you said that the non-CIME build is doing the right thing? Why is that? I don't see pthread mentioned at all in https://github.com/ufs-community/ufs-weather-model/blob/develop/conf/configure.fv3.linux.gnu.

climbfuji commented 4 years ago

You would have to look in https://github.com/ufs-community/ufs-weather-model/blob/release/public-v1/cmake/configure_linux.gnu.cmake (note release/public-v1 instead of develop, and note the cmake config instead of the gnumake config). But you are right, even in those files pthread is not linked explicitly.

I think I know the correct solution / fix, but I don't know yet why the problem doesn't show up when I use build.sh without the fix. I figured that it's the linking of the static ESMF library that requires the pthread library to be linked when compiling through CIME.

I then checked esmf.mk on the linux machine and found that the -pthread argument is in ESMF_F90LINKOPTS, but not in any of the three variables that the release/public-v1 version of cmake/FindESMF.cmake uses:

                                      ESMF_F90COMPILEPATHS
                                      ESMF_F90ESMFLINKRPATHS
                                      ESMF_F90ESMFLINKLIBS
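
The same check can be scripted so that it is easy to repeat on other machines. The Python sketch below scans the text of an esmf.mk file for the variables that carry a given flag. The sample content is a made-up excerpt with placeholder paths, not the actual file from this machine, and the helper names are hypothetical.

```python
import re


def parse_esmf_mk(text):
    """Collect ESMF_* NAME=value assignments from the text of an esmf.mk file."""
    variables = {}
    for line in text.splitlines():
        match = re.match(r"^(ESMF_\w+)\s*=\s*(.*)$", line)
        if match:
            variables[match.group(1)] = match.group(2).strip()
    return variables


def variables_with_flag(variables, flag):
    """Names of all variables whose value contains the flag as a separate token."""
    return sorted(name for name, value in variables.items() if flag in value.split())


# Made-up excerpt for illustration only (paths and flags are assumptions).
SAMPLE_ESMF_MK = """\
# illustrative excerpt
ESMF_F90COMPILEPATHS=-I/opt/esmf/mod -I/opt/esmf/include
ESMF_F90LINKOPTS= -pthread -fopenmp
ESMF_F90ESMFLINKRPATHS=-Wl,-rpath,/opt/esmf/lib
ESMF_F90ESMFLINKLIBS=-lesmf -lrt -lstdc++ -ldl
"""

USED_BY_FIND_MODULE = ["ESMF_F90COMPILEPATHS", "ESMF_F90ESMFLINKRPATHS", "ESMF_F90ESMFLINKLIBS"]

variables = parse_esmf_mk(SAMPLE_ESMF_MK)
carriers = variables_with_flag(variables, "-pthread")
print(carriers)  # ['ESMF_F90LINKOPTS']
print([name for name in USED_BY_FIND_MODULE if name in carriers])  # []
```

To run it against a real installation, read the file that the ESMFMKFILE environment variable points to and pass its contents to `parse_esmf_mk`.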

The new cmake/FindESMF.cmake in develop, however, includes ESMF_F90LINKOPTS. With the following change in src/model/FV3/cime/cime_config/configure_cime.cmake I can get this to work:

diff --git a/cime_config/configure_cime.cmake b/cime_config/configure_cime.cmake
index 8eec1bb..b5d0b69 100644
--- a/cime_config/configure_cime.cmake
+++ b/cime_config/configure_cime.cmake
@@ -87,7 +87,7 @@ set(POST_INC $ENV{POST_INC})
 set(NCEP_LIBS $ENV{POST_LIB} $ENV{NEMSIO_LIB} $ENV{G2_LIB4} $ENV{G2TMPL_LIB} $ENV{BACIO_LIB4} $ENV{SP_LIBd} $ENV{W3EMC_LIBd} $ENV{W3NCO_LIBd} $ENV{CRTM_LIB} $ENV{PNG_LIB} $ENV{JASPER_LIB} $ENV{Z_LIB})

 set(ESMF_MOD ${ESMF_F90COMPILEPATHS})
-set(ESMF_LIBS "${ESMF_F90ESMFLINKRPATHS} ${ESMF_F90ESMFLINKPATHS} ${ESMF_F90ESMFLINKLIBS}")
+set(ESMF_LIBS "${ESMF_F90ESMFLINKRPATHS} ${ESMF_F90ESMFLINKPATHS} ${ESMF_F90ESMFLINKLIBS} ${ESMF_F90LINKOPTS}")

 set(NETCDF_INC_DIR $ENV{NETCDF}/include)
 set(NETCDF_LIBDIR $ENV{NETCDF}/lib)

Making this change will require us to go back and test it on every machine again (at least compiling once through CIME and running the app end-to-end for a quick C96 test case, not running the regression tests or a big C768 test case).

If we want to be 100% correct, we would also want to make the following change in ufs-weather-model release/public-v1, which would require another round of regression tests, retagging, ...

diff --git a/cmake/FindESMF.cmake b/cmake/FindESMF.cmake
index 175a394..05dd2dc 100644
--- a/cmake/FindESMF.cmake
+++ b/cmake/FindESMF.cmake
@@ -25,12 +25,14 @@ string(REPLACE " " ";" ESMF_F90COMPILEPATHS ${ESMF_F90COMPILEPATHS})
 if(ESMF_VERSION_MAJOR AND
    ESMF_F90COMPILEPATHS AND
    ESMF_F90ESMFLINKRPATHS AND
-   ESMF_F90ESMFLINKLIBS)
+   ESMF_F90ESMFLINKLIBS AND
+   ESMF_F90LINKOPTS)
   message(" Found ESMF:")
   message("ESMF_VERSION_MAJOR:     ${ESMF_VERSION_MAJOR}")
   message("ESMF_F90COMPILEPATHS:   ${ESMF_F90COMPILEPATHS}")
   message("ESMF_F90ESMFLINKRPATHS: ${ESMF_F90ESMFLINKRPATHS}")
   message("ESMF_F90ESMFLINKLIBS:   ${ESMF_F90ESMFLINKLIBS}")
+  message("ESMF_F90LINKOPTS:       ${ESMF_F90LINKOPTS}")
 else()
   message("One of the ESMF_ variables is not defined")
 endif()
@@ -43,5 +45,6 @@ find_package_handle_standard_args(ESMF
                                       ESMF_F90COMPILEPATHS
                                       ESMF_F90ESMFLINKRPATHS
                                       ESMF_F90ESMFLINKLIBS
+                                      ESMF_F90LINKOPTS
                                     VERSION_VAR
                                       ESMF_VERSION_STRING)

I am not sure it is necessary. I'd say if the change in the fv3gfs cime interface works on all platforms, then we should be ok.

@uturuncoglu @ligiabernardet FYI

climbfuji commented 4 years ago

I created a (draft) PR for this, https://github.com/ESCOMP/FV3GFS_interface/pull/15. Will test it on a few platforms to make sure it doesn't break anything (it shouldn't!).

uturuncoglu commented 4 years ago

@climbfuji is this also solved by the recent PR in the interface?

climbfuji commented 4 years ago

Closed via https://github.com/ESCOMP/FV3GFS_interface/pull/16