ufs-community / ufs-weather-model

UFS Weather Model

UFS-WM testing w/ spack-stack #1651

Closed ulmononian closed 6 months ago

ulmononian commented 1 year ago

Description

As the transition from hpc-stack to spack-stack is ongoing (e.g., #1448, #1621, Acorn spack testing, spack-stack #454, spack-stack #478), a new spack-based Unified Environment (UE) has been developed to help facilitate the switch. This environment contains a "unified" set of compilers+MPI (Intel & GNU), libraries/packages, and modules to support the UFS-WM and various related apps (e.g., global-workflow, SRW, JEDI Skylab, and GSI).

The preliminary (beta) installation was performed by @climbfuji here on Orion: /work2/noaa/da/role-da/spack-stack-feature-r2d2-mysql/envs/unified-4.0.0-rc1/install and can be loaded via:

module use /work2/noaa/da/role-da/spack-stack-feature-r2d2-mysql/envs/unified-4.0.0-rc1/install/modulefiles/Core
module av
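After the `module use`, a typical session loads a compiler/MPI pair from the stack before loading application modules. A minimal sketch follows; the `stack-*` module names are assumptions based on common spack-stack conventions, so confirm them against the actual `module av` output:

```shell
# Expose the Unified Environment's Core modulefiles
module use /work2/noaa/da/role-da/spack-stack-feature-r2d2-mysql/envs/unified-4.0.0-rc1/install/modulefiles/Core

# Load a compiler + MPI pair from the stack
# (module names/versions are assumptions; verify with `module av`)
module load stack-intel
module load stack-intel-oneapi-mpi

# List what the stack now provides
module av
```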

An initial round of UFS-WM testing (as well as global-workflow, SRW, SkyLab, and GSI testing) using the UE has been completed on Orion (@mark-a-potts successfully completed the full rt.sh suite with a new baseline). For a recent sample compile/run of cpld_control_p8, see /work/noaa/stmp/cbook/stmp/cbook/FV3_RT/rt_198650. Some additional UFS-WM RTs have been performed with the UE on Parallel Works - AWS (e.g., cpld_control_c48); however, this testing is ongoing in collaboration with @yichengt90 / @clouden90.
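For reference, a full-suite run with new baselines (as described above) and a single-test run can be sketched as follows from a ufs-weather-model clone. rt.sh options vary between versions, so treat the flags below as assumptions and check `./rt.sh -h`:

```shell
# rt.sh lives in the tests/ directory of the ufs-weather-model clone
cd tests

# Run a single regression test (e.g., cpld_control_p8) against existing
# baselines; -k keeps the run directory for inspection
# (flag availability is an assumption; verify with ./rt.sh -h)
./rt.sh -k -n cpld_control_p8

# Run the full suite and create new baselines, as is likely needed
# after switching from hpc-stack to spack-stack
./rt.sh -c -l rt.conf
```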

Solution

Upon release of spack-stack@1.3.0, the Unified Environment will be installed in official NOAA-EPIC & JCSDA locations on these spack-stack pre-configured sites. Given that, testing of the UFS-WM with the spack-stack UE will need to be expanded significantly. Ideally, the full set of RTs should be run on each machine; new baselines will more than likely be required.

Module files will need to be updated concomitantly with this testing (e.g.: https://github.com/ulmononian/ufs-weather-model/blob/test_spack/modulefiles/ufs_orion.intel.lua). For running on the cloud, various modifications also need to be made to the RT scripts and configuration files (i.e.: #1650; see https://github.com/ulmononian/ufs-weather-model/tree/feature/noaacloud_rt).

Further, ESMF library naming and linking needs to be addressed (see #1498), but this is currently handled in spack via NOAA-EMC/spack #238. Note that the recently merged PR #1645 removed the static parallelio requirement, which is pertinent here because the spack-stack UE uses shared parallelio (with an exception for operational machines).

This issue can be used to track some of the testing (successes & failures!) and hopefully facilitate some discussion about the transition.

Related to

May help address #1147 and #1448; pertains to #1621.

Butterfly test results look good: cpld_control_p8. Comparison of 500mb temperature impact between this PR and develop branch is here: butterfly

Originally posted by @jkbk2004 in https://github.com/ufs-community/ufs-weather-model/issues/1707#issuecomment-1598935174

jkbk2004 commented 1 year ago

@ulmononian Do you have any update regarding the spack-stack Docker container? The library update for hdf-1.14.0/netcdf-4.9.1/esmf-8.4.1/mapl-2.35.2 is an ongoing priority. Following the update, EPIC needs to maintain the container used for the Jenkins CI pipeline in real time. Please let me know if we need a quick tag-up for this.

ulmononian commented 1 year ago

@ulmononian Do you have any update regarding the spack-stack Docker container? The library update for hdf-1.14.0/netcdf-4.9.1/esmf-8.4.1/mapl-2.35.2 is an ongoing priority. Following the update, EPIC needs to maintain the container used for the Jenkins CI pipeline in real time. Please let me know if we need a quick tag-up for this.

the ufs-wm container based on the spack-stack unified environment package versions/variants will be delivered shortly after the release of spack-stack@1.3.0. i anticipate that they will be available by late next week or early the following week; once ready, i will let you know. one caveat to the ufs-wm spack-stack container is that we will not provide debug versions of MAPL or ESMF within the containers, as the current paradigm allows only one package version within the container.

for the interim, if you are interested, please have a look at the JEDI Skylab containers, which utilize spack-stack. they are available with (i) clang/mpich, and (ii) gnu/openmpi. note that Skylab has a different version of crtm than used by the ufs-wm, so these cannot be used for building/running the wm.

jkbk2004 commented 1 year ago

What about spack-stack itself? Is it going to have debug versions of MAPL or ESMF?

climbfuji commented 1 year ago

Yes, it's got both as part of the unified environment on the HPCs: consistent builds of ESMF debug with MAPL debug, and of ESMF release with MAPL release.

On Mar 10, 2023, at 8:24 AM, JONG KIM @.***> wrote:

What about spack-stack itself? Is it going to have debug versions of MAPL or ESMF?


ulmononian commented 1 year ago

What about spack-stack itself? Is it going to have debug versions of MAPL or ESMF?

if you look at the beta version of the unified environment i shared in the issue description, you will see mapl/2.22.0-debug-esmf-8.3.0b09-debug and esmf/8.3.0b09-debug are both available.

DusanJovic-NOAA commented 1 year ago

We should stop building debug versions of esmf (and mapl).

climbfuji commented 1 year ago

Yes please! And even more importantly, get rid of the annoying I_MPI debug library requirement.

On Mar 10, 2023, at 8:55 AM, Dusan Jovic @.***> wrote:

We should stop building debug versions of esmf (and mapl).


jkbk2004 commented 1 year ago

@rhaesung @ulmononian @yichengt90 can we make a quick build test with land DA and noah-mp as well?

ulmononian commented 1 year ago

@rhaesung @ulmononian @yichengt90 can we make a quick build test with land DA and noah-mp as well?

do you mean ensure the land DA / noah-mp system builds & runs using the unified environment? or to add a specific land DA env into the unified environment (as is done for global workflow, srw, ufs-wm, etc.)?

jkbk2004 commented 1 year ago

@rhaesung @ulmononian @yichengt90 can we make a quick build test with land DA and noah-mp as well?

do you mean ensure the land DA / noah-mp system builds & runs using the unified environment? or to add a specific land DA env into the unified environment (as is done for global workflow, srw, ufs-wm, etc.)?

Land DA/noah-mp build cases need features of both the jedi and ufs-wm environments. For now, just a build test with the land DA release branch, but I hope to follow up with the noah-mp component build as well. Let me know if we need a quick tag-up.

ulmononian commented 1 year ago

@rhaesung @ulmononian @yichengt90 can we make a quick build test with land DA and noah-mp as well?

do you mean ensure the land DA / noah-mp system builds & runs using the unified environment? or to add a specific land DA env into the unified environment (as is done for global workflow, srw, ufs-wm, etc.)?

Land DA/noah-mp build cases need features of both the jedi and ufs-wm environments. For now, just a build test with the land DA release branch, but I hope to follow up with the noah-mp component build as well. Let me know if we need a quick tag-up.

the unified environment contains all necessary modules for building any of the jedi bundles. for instance, in the case of land DA, which currently uses the fv3-bundle, one can simply "load" the unified environment (i.e., module use <path/to/ue/core>, load the appropriate compiler/MPI, and load the pertinent modules) and build (in this case with ecbuild, which is also included in the unified environment). for example, i built the fv3-bundle using only modules from the beta unified environment on orion here: /work2/noaa/epic-ps/cbook/fv3-bundle.
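That load-then-build sequence might look like the following on Orion. The `module use` path is taken from the issue description; the loaded module names (in particular `jedi-fv3-env`) and the source-tree layout are illustrative assumptions:

```shell
# Point the module system at the Unified Environment (path from the issue description)
module use /work2/noaa/da/role-da/spack-stack-feature-r2d2-mysql/envs/unified-4.0.0-rc1/install/modulefiles/Core

# Load compiler/MPI and the bundle's dependencies
# (module names are assumptions; confirm with `module av`)
module load stack-intel stack-intel-oneapi-mpi
module load jedi-fv3-env

# Configure and build the fv3-bundle with ecbuild (provided by the UE);
# assumes the bundle has been cloned to ../fv3-bundle
mkdir -p build && cd build
ecbuild ../fv3-bundle
make -j4
```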

the land DA system (cloned/built from https://github.com/NOAA-EPIC/land-offline_workflow/tree/release/public-v1.0.0) was run for the 2016 case using this UE-built fv3-bundle and a modified landda_orion.intel.lua modulefile (pointing to the unified environment stack & modules; see https://github.com/ulmononian/land-offline_workflow/blob/release/public-v1.0.0/modulefiles/landda_orion.intel.lua) here: [src] /work2/noaa/epic-ps/cbook/landDA/ue_test/spack_fork; [workdir] /work2/noaa/epic-ps/cbook/landDA/ue_test/workdir; [expts] /work2/noaa/epic-ps/cbook/landDA/ue_test/landda_expts.

jkbk2004 commented 1 year ago

@ulmononian there is a re-syncing issue on land da side (https://github.com/NOAA-PSL/land-offline_workflow/issues/29). Is it possible to install a similar version of this spack stack on hera?

jkbk2004 commented 1 year ago

@ulmononian there is a re-syncing issue on land da side (NOAA-PSL/land-offline_workflow#29). Is it possible to install a similar version of this spack stack on hera?

@rhaesung FYI

ulmononian commented 1 year ago

@ulmononian there is a re-syncing issue on land da side (NOAA-PSL/land-offline_workflow#29). Is it possible to install a similar version of this spack stack on hera?

a beta installation of the UE on hera is underway. i will share the path and updated landda_hera.intel.lua file when it is ready for use.

ulmononian commented 1 year ago

@jkbk2004 @rhaesung i installed a beta UE on hera here: /scratch1/NCEPDEV/stmp4/Cameron.Book/sw/spack-stack-1.2.0/envs/unified-env. the land DA system was built against this stack here: /scratch1/NCEPDEV/stmp4/Cameron.Book/landDA_work/land-offline_workflow/build. 2020 and 2016 land DA experiments were run successfully w/ this stack here: /scratch1/NCEPDEV/stmp4/Cameron.Book/landDA_work (see the 2020 and 2016 landda_expts and workdirs therein). for consistency, the fv3-bundle was rebuilt with the UE stack here: /scratch1/NCEPDEV/stmp4/Cameron.Book/landDA_work/fv3-bundle.

an updated modulefile for land DA on hera can be found here https://github.com/ulmononian/land-offline_workflow/blob/release/public-v1.0.0/modulefiles/landda_hera.intel.lua.

DeniseWorthen commented 7 months ago

Can we close this issue?

DeniseWorthen commented 6 months ago

Why is this issue still open?

ulmononian commented 6 months ago

this can be closed!