ufs-community / ufs-weather-model

UFS Weather Model

ESMF requirement for external land component #1542

Closed uturuncoglu closed 4 months ago

uturuncoglu commented 1 year ago

Description

We are transitioning from FMS to ESMF to handle multi-tile file access (read/write) in the new external land component (NOAHMP). ESMF tag v8.5.0b10 contains all of the development for multi-tile file I/O through PIO.

Solution

Install v8.5.0b10 on supported platforms and update UFS to use this version.

Alternatives

N/A

Related to

N/A

uturuncoglu commented 1 year ago

@junwang-noaa I created this issue to track the installation of the new ESMF tag, which is required for the external land component and for the next PR related to it. I am not expecting the next external land component PR soon, but it would be nice to start thinking about it since installing a new ESMF tag could take time. I think that will be handled by the EPIC team, but I am not sure. Let me know what you think.

junwang-noaa commented 1 year ago

@uturuncoglu Thanks for creating the issue. We are currently getting the ESMF 8.4.0 release version installed and used in the UFS WM, since the operational code freeze is coming soon and only release versions are accepted in operations. We can ask the EPIC team to install a test version, ESMF v8.5.0b10, on an R&D platform for this external land component work.

uturuncoglu commented 1 year ago

@junwang-noaa Thanks. I think that once the operational code freeze has passed, the UFS model could start using beta snapshots again. Right?

junwang-noaa commented 1 year ago

I think so.

uturuncoglu commented 1 year ago

@junwang-noaa Is there any update on this? What about the operational code freeze; is it done? Once this is available, I am planning to replace the I/O layer in the land component.

junwang-noaa commented 1 year ago

No, not yet. We are waiting for HR1 testing before we create a tag. @jkbk2004 can your team install the ESMF v8.5.0b10 library on Hera? Thanks.

uturuncoglu commented 1 year ago

@junwang-noaa Thanks for the update. I think NCAR's Cheyenne will be better since I have no access to Hera.

jkbk2004 commented 1 year ago

@uturuncoglu We can coordinate through EPIC on Cheyenne.

jkbk2004 commented 1 year ago

I will try to install 8.5.0b10 on Cheyenne over the weekend.

uturuncoglu commented 1 year ago

@jkbk2004 is there any progress on this? thanks.

jkbk2004 commented 1 year ago

@uturuncoglu Give 8.5.0b10 a try. It is installed at /glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/modulefiles/stack. Last week was a busy one because of program increment planning. Let me know how it goes.

uturuncoglu commented 1 year ago

@jkbk2004 Thanks for your help. I tried to compile the model with the new version of ESMF and I am getting the following error at the link step:

/usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: /glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/esmf/8.5.0b10/lib/libpioc.a(pioc.c.o): in function `PIOc_iosystem_is_active':
/glade/work/jongkim/stacks/hash/hpc-stack-6eb6/pkg/v8.5.0b10/src/Infrastructure/IO/PIO/ParallelIO/src/clib/pioc.c:97: multiple definition of `PIOc_iosystem_is_active'; /glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/pio/2.5.7/lib/libpioc.a(pioc.c.o):/glade/work/jongkim/stacks/hash/hpc-stack-6eb6/pkg/pio-2.5.7/src/clib/pioc.c:97: first defined here

I think ESMF 8.5.0b10 is using its own internal PIO library, and this conflicts with the external installation, possibly because of the version difference. Is it possible to install ESMF so that it points to the external PIO and does not cause a conflict? It seems that ESMF is using 2.5.10. You could set the following variables for it:

export ESMF_PIO="external"
export ESMF_PIO_LIBPATH=$PIO_LIBDIR
export ESMF_PIO_INCLUDE=$PIO_INCDIR

I wonder whether UFS has been tested with an ESMF version later than 8.3.0b09 before.
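One way to confirm the suspected clash before rebuilding is to check which archives define the symbol named in the linker error. The following is only a sketch: it assumes GNU `nm` is available, the helper name `defines_symbol` is hypothetical, and `ESMF_LIBDIR` in the usage comment is an assumed variable, not something set by the stack. The helper reads `nm` output from standard input, so it can be tried without the real libraries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the `nm` output on stdin contains a
# *defined* code symbol with the given name.
# nm prints "<address> <type> <name>" for defined symbols; type T or t
# marks a symbol defined in the text (code) section. Undefined symbols
# have no address, so their lines have only two fields.
defines_symbol() {
  awk -v sym="$1" '
    NF == 3 && ($2 == "T" || $2 == "t") && $3 == sym { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example usage (directory variables are placeholders):
#   for lib in "$ESMF_LIBDIR/libpioc.a" "$PIO_LIBDIR/libpioc.a"; do
#     nm "$lib" 2>/dev/null | defines_symbol PIOc_iosystem_is_active \
#       && echo "defined in $lib"
#   done
```

If both archives report the symbol, the link line is pulling in two copies of PIO, which is consistent with the "multiple definition" message above.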

uturuncoglu commented 1 year ago

@jkbk2004 Hi, I just want to check the current status of this installation. Thanks.

jkbk2004 commented 1 year ago

@uturuncoglu I will take a look. I will get back to you tomorrow.

uturuncoglu commented 1 year ago

@jkbk2004 Thank you. It is not super urgent, but it would be nice to have it soon since I am planning to add restructured I/O code to Noah-MP that leverages ESMF multi-tile support.

jkbk2004 commented 1 year ago

I tried ... but it sounds like an issue:

make chkdir_apps
make[5]: Entering directory '/glade/work/jongkim/stacks/hash/hpc-stack-6eb6/pkg/v8.5.0b10/src/apps/ESMF_PrintInfo'
make[5]: Leaving directory '/glade/work/jongkim/stacks/hash/hpc-stack-6eb6/pkg/v8.5.0b10/src/apps/ESMF_PrintInfo'
mpif90 -m64 -mcmodel=small -pthread -threads -cxxlib -Wl,--no-as-needed -qopenmp -L/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/hdf5/1.10.6/lib -L/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/zlib/1.2.11/lib -L/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/esmf/8.5.0b10/lib -L/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/netcdf/4.7.4/lib -L/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/pio/2.5.7/lib -Wl,-rpath,/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/esmf/8.5.0b10/lib -Wl,-rpath,/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/netcdf/4.7.4/lib -Wl,-rpath,/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/pio/2.5.7/lib -o /glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/esmf/8.5.0b10/bin/ESMF_PrintInfo /glade/work/jongkim/stacks/hash/hpc-stack-6eb6/pkg/v8.5.0b10/obj/objO/Linux.intel.64.mpt.default/src/apps/ESMF_PrintInfo/ESMF_PrintInfo.o -lesmf -lmpi++ -lrt -ldl -lnetcdff -lnetcdf -lhdf5_hl -lhdf5 -lz -ldl -lm -lpioc
/usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: /glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/esmf/8.5.0b10/lib/libesmf.so: undefined reference to `PIOc_InitDecomp_ReadOnly'
/glade/work/jongkim/stacks/hash/hpc-stack-6eb6/pkg/v8.5.0b10/build/common.mk:2583: recipe for target '/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/esmf/8.5.0b10/bin/ESMF_PrintInfo' failed
make[4]: *** [/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/intel-2022.1/mpt-2.25/esmf/8.5.0b10/bin/ESMF_PrintInfo] Error 1

jkbk2004 commented 1 year ago

I was using pio/2.5.7, which was already installed at /glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/modulefiles/stack.

jkbk2004 commented 1 year ago

Yeah, we need pio-2.5.8, which has PIOc_InitDecomp_ReadOnly.
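A minimum-version requirement like this can be checked in a build script before ESMF is configured. The sketch below is illustrative only: `version_ge` is a hypothetical helper, `PIO_VERSION` is an assumed variable name, and the comparison relies on `sort -V` from GNU coreutils. Version sort matters here because a plain string comparison would rank 2.5.10 below 2.5.8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# `sort -V` orders version strings numerically per component, so the
# smaller of the two ends up on the first line.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum taken from this thread: pio-2.5.8 provides PIOc_InitDecomp_ReadOnly.
required_pio="2.5.8"
installed_pio="${PIO_VERSION:-2.5.7}"   # PIO_VERSION is an assumed variable name

if version_ge "$installed_pio" "$required_pio"; then
  echo "pio $installed_pio is new enough"
else
  echo "pio $installed_pio is older than $required_pio"
fi
```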

jkbk2004 commented 1 year ago

let me try again with pio-2.5.8

jkbk2004 commented 1 year ago

@uturuncoglu It went through with pio-2.5.8. Give it one more try with the module path in https://github.com/ufs-community/ufs-weather-model/blob/develop/modulefiles/ufs_cheyenne.intel.lua

uturuncoglu commented 1 year ago

@jkbk2004 Thanks for your help. I was able to compile the model with pio 2.5.8 and esmf 8.5.0b10. I'll try to update my fork with the new I/O layer that uses ESMF multi-tile support to see what happens. I'll update you soon.

uturuncoglu commented 1 year ago

@jkbk2004 I confirm that it is working without any issue. BTW, do we also have a GNU version on Cheyenne? It would be nice to test the new I/O implementation under GNU too, to catch any possible issues.

DeniseWorthen commented 1 year ago

@uturuncoglu In CMEPS, there is a routine in med.F90 called med_grid_write, which is limited right now to tileCount=1. Will the new I/O features allow tileCount>1 in this routine?

jkbk2004 commented 1 year ago

@uturuncoglu Sure! I will install them for GNU as well. I will keep you posted; maybe sometime this afternoon.

uturuncoglu commented 1 year ago

@DeniseWorthen I think we could try to remove that restriction with the recent update on the ESMF side. I am currently working on restructuring the I/O layer in the Noah-MP component model. Once I have finalized that, I could try to test it in CMEPS.

jkbk2004 commented 1 year ago

@uturuncoglu We migrated the Cheyenne hpc-stack locations yesterday. The old ones are still available. I want to follow up again with the new locations. @natalie-perlin can you install esmf-8.5.0b10 on Cheyenne? It needs pio-2.5.8 (see the conversation above). Please give it priority. The installation itself goes through quickly.

uturuncoglu commented 1 year ago

@jkbk2004 @natalie-perlin You mean the module locations have changed? BTW, the latest tag is v8.5.0b14, which also has a couple of fixes related to I/O, but I think it requires pio-2.5.10. Anyway, we could also stick with esmf-8.5.0b10 and pio-2.5.8 for both Intel and GNU.

jkbk2004 commented 1 year ago

@uturuncoglu Yeah, we made the location changes on the weather model develop branch yesterday, but you can stay with the old one. Let me install esmf-8.5.0b10 and pio-2.5.8 for GNU in the old location now. I will let you know in an hour or so.

jkbk2004 commented 1 year ago

@uturuncoglu Give GNU a try at /glade/work/epicufsrt/GMTB/tools/gnu/10.1.0/hpc-stack-v1.2.0/modulefiles/stack. I installed esmf-8.5.0b10 there.

uturuncoglu commented 1 year ago

@jkbk2004 okay. thanks for your help.

natalie-perlin commented 1 year ago

@jkbk2004 @uturuncoglu - Currently installing pio-2.5.8 and esmf-8.5.0b10 in the standard locations (updated yesterday) on Cheyenne, for intel/2022.1 and gnu/10.1.

natalie-perlin commented 1 year ago

@jkbk2004 @uturuncoglu - Done for Cheyenne: installed pio-2.5.8 and esmf-8.5.0b10 in /glade/work/epicufsrt/contrib/hpc-stack/gnu10.1.0/ and /glade/work/epicufsrt/contrib/hpc-stack/intel2022.1/

uturuncoglu commented 1 year ago

@natalie-perlin Thank you very much. I'll try gnu later today.

uturuncoglu commented 1 year ago

@natalie-perlin @jkbk2004 It turns out that there is a memory corruption bug in esmf-8.5.0b10, so it would be nice to have esmf-8.5.0b17 with pio-2.5.10 on Cheyenne. Would it be possible to install it? Thanks.

BTW, I am also getting an error from the previous GNU installation, as follows:

Lmod is automatically replacing "intel/19.1.1" with "gnu/10.1.0".

Lmod has detected the following error: The following module(s) are unknown:
"hpc-mpt/2.22"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "hpc-mpt/2.22"

Also make sure that all modulefiles written in TCL start with the string
#%Module

Executing this command requires loading "hpc-mpt/2.22" which failed while
processing the following module(s):

    Module fullname   Module Filename
    ---------------   ---------------
    ufs_cheyenne.gnu  /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_cheyenne.gnu.lua

Executing this command requires loading "pio/2.5.8" which failed while
processing the following module(s):

    Module fullname   Module Filename
    ---------------   ---------------
    ufs_common        /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_common.lua
    ufs_cheyenne.gnu  /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_cheyenne.gnu.lua

natalie-perlin commented 1 year ago

@uturuncoglu - sure, will take care of esmf-8.5.0b17 with pio-2.5.10.

I have an idea why there could be gnu/10.1.0 complaints. When do you get these error reports?

natalie-perlin commented 1 year ago

The default on Cheyenne is the intel/19.x.x compiler. After the Lmod initialization, all the default modules are loaded and then replaced by those needed for a particular stack.

uturuncoglu commented 1 year ago

The GNU-related error appears when I try to build UFS.

uturuncoglu commented 1 year ago

@natalie-perlin BTW, thanks for your help.

natalie-perlin commented 1 year ago

@uturuncoglu - I was able to compile the UFS-WM on Cheyenne with gnu/10.1.0 with no issues. These were my steps:

git clone https://github.com/ufs-community/ufs-weather-model.git ufs-wm-dev-gnu10.1
cd ufs-wm-dev-gnu10.1
git submodule update --init --recursive
module use modulefiles
module load ufs_cheyenne.gnu
export CMAKE_FLAGS="-DAPP=ATM -DCCPP_SUITES=FV3_GFS_v16"
export BUILD_VERBOSE=1
./build.sh 2>&1 | tee build.log1

You can view the log file on Cheyenne at /glade/scratch/nperlin/UFS-WM/ufs-wm-dev-gnu10.1/build.log1

The hpc-stack at /glade/work/epicufsrt/contrib/hpc-stack/gnu10.1.0/modulefiles/stack is built with mpt/2.25. The error in your snippet refers to mpt/2.22. Could it come from an earlier build? When did you receive this error?

natalie-perlin commented 1 year ago

@uturuncoglu - Could it be that modulefiles/ufs_cheyenne.gnu.lua has not been updated? In that modulefile, one line needs an update for the location of the stack, and another needs to load the hpc-mpt/2.25 module instead of hpc-mpt/2.22.
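A quick way to see whether a local checkout still carries the old settings is to list the relevant lines of the modulefile. This is a sketch under assumptions: `show_stack_settings` is a hypothetical helper, and the search patterns are taken from strings that appear in this thread (`hpc-mpt`, `hpc_mpt_ver`, `modulefiles/stack`), so the actual modulefile may phrase these lines differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, with line numbers, every line of a modulefile
# that mentions the MPT module or a stack location, so stale values stand out.
# Returns non-zero when nothing matches (grep's normal behaviour).
show_stack_settings() {
  grep -n -E 'hpc-mpt|hpc_mpt_ver|modulefiles/stack' "$1"
}

# Example usage, run from the top of a ufs-weather-model checkout:
#   show_stack_settings modulefiles/ufs_cheyenne.gnu.lua
```

Comparing that output with the same file on the current develop branch should show whether the stack path or the MPT version is out of date.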

uturuncoglu commented 1 year ago

@natalie-perlin That could be the reason. My UFS version is not the most recent, but I could sync again and test.

uturuncoglu commented 1 year ago

@natalie-perlin I moved to mpt/2.25 for GNU and now I am getting the following error:

Lmod is automatically replacing "intel/19.1.1" with "gnu/10.1.0".

Lmod has detected the following error: Cannot load module
"mapl/2.22.0-esmf-8.3.0b09". At least one of these module(s) must be loaded:
   esmf/8.3.0b09 esmf/8.3.0b09-debug

While processing the following module(s):
    Module fullname            Module Filename
    ---------------            ---------------
    mapl/2.22.0-esmf-8.3.0b09  /glade/work/epicufsrt/contrib/hpc-stack/gnu10.1.0/modulefiles/mpi/gnu/10.1.0/mpt/2.25/mapl/2.22.0-esmf-8.3.0b09.lua
    ufs_common                 /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_common.lua
    ufs_cheyenne.gnu           /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_cheyenne.gnu.lua

I think mapl also depends on the ESMF version used.

uturuncoglu commented 1 year ago

@natalie-perlin I am not sure how Intel is working with

mapl_ver=os.getenv("mapl_ver") or "2.22.0-esmf-8.3.0b09"
load(pathJoin("mapl", mapl_ver))

entry in the modulefiles/ufs_common.lua.
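The Lua line above takes the MAPL version from the `mapl_ver` environment variable when it is set and falls back to the hard-coded default otherwise. The shell sketch below mirrors that fallback to show which module name would be requested; `resolve_mapl_module` is a hypothetical helper and only illustrates the lookup, it does not load anything. Whether the resulting module actually exists in a given stack is a separate question.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the fallback in modulefiles/ufs_common.lua:
#   mapl_ver = os.getenv("mapl_ver") or "2.22.0-esmf-8.3.0b09"
# ${var-default} applies the default only when the variable is unset, which
# matches Lua, where an empty string still counts as a true value.
resolve_mapl_module() {
  echo "mapl/${mapl_ver-2.22.0-esmf-8.3.0b09}"
}

# Prints the default module name unless mapl_ver is already set.
resolve_mapl_module

# Override inside a subshell so the caller's environment is left untouched.
( mapl_ver="2.22-esmf-8.5.0b17"; resolve_mapl_module )
```

Under this reading, exporting `mapl_ver` before `module load` would be one way to request a MAPL build that matches a newer ESMF without editing the modulefile.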

natalie-perlin commented 1 year ago

@uturuncoglu - Looking into the installation of esmf-8.5.0b17 and pio-2.5.10. Do they require higher hdf5 and netcdf versions?

natalie-perlin commented 1 year ago

@uturuncoglu pio/2.5.10 + esmf/8.5.0b17 + mapl/2.22-esmf-8.5.0b17 are ready on Cheyenne for the gnu/10.1.0 stack, in /glade/work/epicufsrt/contrib/hpc-stack/gnu10.1.0

We have hdf5/1.14.0 and netcdf-c/4.9.1 + netcdf-fortran/4.6.0 installed successfully in other locations/Hera. Let me know if you need esmf and mapl built with these higher hdf5 + netcdf versions. (Keep in mind that I would then need to clear the current installations of esmf/8.5.0b17 + mapl/2.22-esmf-8.5.0b17, which are built with hdf5/1.10.6 and netcdf/4.7.4.)

uturuncoglu commented 1 year ago

@natalie-perlin Sorry for the late response. I was sick the whole week and I am slowly starting to work again. I'll test the GNU esmf-8.5.0b17. I don't have newer versions of those libraries. If I could also have an Intel version, that would be great and sufficient for me. Thanks again for your kind help.

natalie-perlin commented 1 year ago

@uturuncoglu - Installed for hpc-stack with intel/2022.1, on Cheyenne: pio/2.5.10 + esmf/8.5.0b17 + mapl/2.22-esmf-8.5.0b17

in /glade/work/epicufsrt/contrib/hpc-stack/intel2022.1/

Load with:

module use /glade/work/epicufsrt/contrib/hpc-stack/intel2022.1/modulefiles/stack
module load hpc/1.2.0

uturuncoglu commented 1 year ago

@natalie-perlin Thanks for your help. I am getting the following error from the mapl module:

Lmod has detected the following error:  The following module(s) are unknown: "mapl/2.22-esmf-8.5.0b17"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "mapl/2.22-esmf-8.5.0b17"

Also make sure that all modulefiles written in TCL start with the string #%Module

Executing this command requires loading "mapl/2.22-esmf-8.5.0b17" which failed while processing the following module(s):

    Module fullname     Module Filename
    ---------------     ---------------
    ufs_common          /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_common.lua
    ufs_cheyenne.intel  /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_cheyenne.intel.lua

uturuncoglu commented 1 year ago

@natalie-perlin The same happens for GNU:

Lmod has detected the following error:  The following module(s) are unknown: "hpc-mpt/2.22" "mapl/2.22-esmf-8.5.0b17"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "hpc-mpt/2.22" "mapl/2.22-esmf-8.5.0b17"

Also make sure that all modulefiles written in TCL start with the string #%Module

Executing this command requires loading "hpc-mpt/2.22" which failed while processing the following module(s):

    Module fullname   Module Filename
    ---------------   ---------------
    ufs_cheyenne.gnu  /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_cheyenne.gnu.lua

Executing this command requires loading "mapl/2.22-esmf-8.5.0b17" which failed while processing the following module(s):

    Module fullname   Module Filename
    ---------------   ---------------
    ufs_common        /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_common.lua
    ufs_cheyenne.gnu  /glade/work/turuncu/NOAHMP/ufs-weather-model_dev/modulefiles/ufs_cheyenne.gnu.lua

Any idea? Thanks.

jkbk2004 commented 1 year ago

@natalie-perlin We are using hpc_mpt_ver=os.getenv("hpc_mpt_ver") or "2.25". Why does the error complain about mpt 2.22?