ufs-community / ufs-mrweather-app

UFS Medium-Range Weather Application

List of supported platforms for the release #67

Closed climbfuji closed 4 years ago

climbfuji commented 4 years ago

Here is a link to a spreadsheet that lists the supported platforms/compilers and who has access to these:

https://docs.google.com/spreadsheets/d/122uasMhD8aF_s6jUpy7t_Io5JSNGYOmBddtT6kh0dSQ/edit#gid=0

Please use this spreadsheet to indicate the readiness for testing and who is testing on which platform (maybe also indicate success/failures - this has not been set up yet in the spreadsheet). We can also use this GitHub issue to report successes/failures.

rsdunlapiv commented 4 years ago

In discussions with @ligiabernardet on the documentation, we had come to the conclusion that there can be confusion over what is meant by a "supported" platform. With respect to CIME, we are thinking of this in two categories:

There is also the idea of:

Clearing up the terminology will be important. So if a user from a university says that they have a Linux cluster with Intel 19, Intel MPI, and the dependent libraries, then that would be a supported platform in the sense that if there is an issue with installation we would expect to help them. It would not, however, be a tested platform, since no one from the release team has worked on that machine.

A Microsoft Windows desktop is not a supported or tested platform.

ligiabernardet commented 4 years ago

Can someone please add a column to this spreadsheet and indicate which platforms are preconfigured?

climbfuji commented 4 years ago

I will add the column but the entries for many will be TBD - will depend on how far we get in the next weeks.

ligiabernardet commented 4 years ago

Given the definition of preconfigured platform "preconfigured means that CIME has been set up already with machine-specific files, and so the app should work out-of-the-box with no porting steps required", I do not understand why the spreadsheet mentions certain OS as "in progress" wrt preconfiguring. How can MacOS Catalina ever be preconfigured? A user's Mac laptop will not be preconfigured upon purchase; the user will have to configure it. Does the spreadsheet need to be modified so that only specific machines with actual names (e.g., Hera, Cheyenne etc.) can be preconfigured?

rsdunlapiv commented 4 years ago

I agree - a MacOS laptop is never preconfigured - you always have to go through the process of installing NCEPlibs and setting up CIME on that laptop. (Unless the configuration was so standard that CIME would always just work out of the box on MacOS - but that seems very unlikely.)

climbfuji commented 4 years ago

Yes, we need two different categories - preconfigured and supported.

mvertens commented 4 years ago

I think supported means a higher level of porting than preconfigured. Maybe "generic preconfiguration" and "platform preconfiguration" would be a possible distinction - where generic preconfiguration means the templates are there and the user has to modify them accordingly. @Jim Edwards jedwards@ucar.edu - what do you think?

jedwards4b commented 4 years ago

I think that we can create a CIME port that will work on macOS or Linux. The user would need to create inputdata and output directories and set environment variables for them, along with the set of environment variables listed in NCEPLIBS PR #30 (https://github.com/NOAA-EMC/NCEPLIBS/pull/30). However, as of this post I am still not able to build NCEPLIBS on a Mac.

climbfuji commented 4 years ago

I think you should be able to use the two repos I sent you earlier, the macos rpath update was made (thanks, Kyle). If you don't want to use the NCEPLIBS-external, then you need to install the dependencies by yourself. But I assume (and it would be good if someone tested it) that the problem with NetCDF from NCEPLIBS-external has something to do with my machine setup. Let me know if you need further instructions. Thanks!

jedwards4b commented 4 years ago

@climbfuji I am not yet able to use the two repos you indicated and am having problems well before getting to the netcdf build. The latest problem seems to be in the hdf5 build:

/Users/jedwards/src/NCEPLIBS-external/build/hdf5/src/hdf5-build/CMakeFiles/CheckIncludeFiles/C_HAVE_QUADMATH.c:2:10: fatal error: 'quadmath.h' file not found
#include <quadmath.h>
         ^~~~~~~~~~~~
1 error generated.

climbfuji commented 4 years ago

This is macOS, right? So, the question is which way you chose to install your prerequisites: compiler, MPI library. Your error could be related to using homebrew's mpi library, which uses the default Apple gcc (which is clang with much less functionality than LLVM's clang). Continues below the list ...

  1. Use homebrew to install gcc@9, then set environment variables CC=gcc-9, FC=gfortran-9, CXX=g++-9.
  2. Download and compile openmpi-4.0.2 or mpich-3.3.1 manually with those compilers, install to a place outside of homebrew to not mess it up. Add the mpi bin directory to PATH and the MPI lib directory to LD_LIBRARY_PATH.
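For anyone following along, the two steps above can be sketched as a small environment setup. This is only a sketch under stated assumptions: `MPI_ROOT` and its default location are hypothetical names chosen for illustration, and it assumes `brew install gcc@9` has already been run and that MPI was built by hand into that prefix.

```shell
# Step 1: point the build at Homebrew's gcc@9 toolchain
# (assumes `brew install gcc@9` has already been run).
export CC=gcc-9
export FC=gfortran-9
export CXX=g++-9

# Step 2: hypothetical prefix for a manually built MPI, kept outside Homebrew.
# MPI_ROOT is an illustrative name, not something defined in the thread.
MPI_ROOT="${MPI_ROOT:-$HOME/opt/openmpi-4.0.2}"

# Put the MPI wrappers first on PATH and expose the MPI libraries,
# appending any existing LD_LIBRARY_PATH only if it is non-empty.
export PATH="$MPI_ROOT/bin:$PATH"
export LD_LIBRARY_PATH="$MPI_ROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Sourcing these lines from a file before configuring keeps the compiler and MPI choice in one place, which makes it easier to compare setups between machines.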

Another note I found in my install instructions for Mojave:

# Fix missing header files in /usr/include for macOS Mojave - doesn't exist on Catalina?
open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg

jedwards4b commented 4 years ago

Yesterday I was using a second Mac, not the one I first attempted to install on, and I tried to follow your instructions to the letter. I humbly submit: if I can't do it, it's not ready for public consumption.

climbfuji commented 4 years ago

Please check if you did the last part, which was not included in my instructions (somehow missed that, because my new/test system uses Catalina and not Mojave).

# Fix missing header files in /usr/include for macOS Mojave - doesn't exist on Catalina?
open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg

jedwards4b commented 4 years ago

So where are these instructions documented - I've been piecing it together from the email chain so far. Where are the analogous instructions for Linux?

climbfuji commented 4 years ago

They are not documented yet because we haven't finalized the instructions. You can either choose to wait until we have tested this out entirely, or you will have to accept that you are a testing buddy who will run into trouble in order to help point out flaws in the process (and I very much appreciate having people to find issues in the process). Knowing that we have to add this fix for the missing header files on Mojave systems is one example of that. I am sorry, but I just can't work faster.

There are some notes in the README.md of NCEPLIBS-external, https://github.com/NOAA-EMC/NCEPLIBS-external/tree/master or (this is where I will make the updates today) https://github.com/climbfuji/NCEPLIBS-external/tree/esmf_make_remove_curl_add_wgrib2. It would be super helpful if you could try the "fix missing header files" step and see if this solves the problem you ran into, and note anything you find that is not working in the issues on my fork (while working with my branches).

If you prefer to test the libraries on a better supported platform, please use a generic Linux box with the GNU compilers (or Intel compilers) in the meantime. Thank you for your patience and help!

jedwards4b commented 4 years ago

@climbfuji I don't expect a finished set of instructions - I'm just suggesting a shared doc that lists the steps and that we can modify as we go along. I think we need to get all of the instructions in one place.

climbfuji commented 4 years ago

I think we can work directly on the place where this should go, I started putting the instructions for Catalina there:

https://github.com/NOAA-EMC/NCEPLIBS-external/wiki

Formatting can be improved, but this should get us started.

jedwards4b commented 4 years ago

I can't edit or make comments - why not just use the google doc until we have what we need?

climbfuji commented 4 years ago

Now you have write permissions.

rsdunlapiv commented 4 years ago

@climbfuji and @jedwards4b what is the status of the MacOS build, instructions, and CIME testing?

jedwards4b commented 4 years ago

I am now able to run the chgres_cube on the mac but I am getting a dynamic library load error from the model that I haven't been able to figure out:

[1] dyld: Library not loaded: @rpath/libnetcdff.7.dylib
[1]   Referenced from: /Users/jedwards/projects/scratch/SMS_Lh3.C96.GFSv15p2.homebrew_gnu.20200207_135212_sv4w0f/bld/ufs.exe
[1]   Reason: image not found

That @rpath should be /usr/local/ufs-release-v1/lib. What is really confusing is that chgres shows this same problem with the netcdf libraries but runs anyway.

climbfuji commented 4 years ago

Does chgres really show

[1] dyld: Library not loaded: @rpath/libnetcdff.7.dylib
[1]   Referenced from: ... /chgres_cube.exe
[1]   Reason: image not found

?

jedwards4b commented 4 years ago

No, but chgres does show @rpath/libnetcdf with otool -L.

climbfuji commented 4 years ago

Ok, but that is what I said two days ago or so. Do otool -l (lowercase L) and search case-insensitive for RPATH and you will find that rpath is set correctly for chgres_cube.exe, but not for the ufs model application. I guess. Please correct me if I am wrong!
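A minimal sketch of that comparison, for readers who want to repeat it. The two helper names are hypothetical and are not part of any tool mentioned in the thread. They assume the usual `otool` text layout, where `otool -L` lists one install name per line and `otool -l` prints a `path` line under each `cmd LC_RPATH` entry, and they assume paths without spaces.

```shell
# Print the install names a binary resolves through @rpath.
# Reads `otool -L <binary>` output on stdin.
rpath_dependencies() {
  awk '$1 ~ /^@rpath\// { print $1 }'
}

# Print the directories recorded in LC_RPATH load commands.
# Reads `otool -l <binary>` output on stdin.
rpath_entries() {
  awk '$1 == "cmd" { in_rpath = ($2 == "LC_RPATH") }
       in_rpath && $1 == "path" { print $2 }'
}

# Typical use on macOS (binary names are placeholders):
#   otool -L ./ufs.exe | rpath_dependencies
#   otool -l ./ufs.exe | rpath_entries
```

If the first command lists `@rpath/...` libraries while the second prints nothing for the same binary, the executable was linked without any rpath entry, which would be consistent with the "image not found" error reported above.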

jedwards4b commented 4 years ago

So I should look in the model link step for the problem? I'll try again on Monday. Why is this only a problem for netcdf libraries?

climbfuji commented 4 years ago

My assumption is the following: The libraries in NCEPLIBS-external are built as external libraries, i.e. their own CMakeLists.txt controls whether macOS rpath functionality is used or not.

The basic question is whether you are doing anything differently from the ufs-weather-model build.sh approach. Aren't you calling cmake with the same top-level CMakeLists.txt from ufs-weather-model? I have done this over and over in the last few days, but I will try it with the current version of NCEPLIBS-external and NCEPLIBS just to make sure that it still works.

I will also need to take time off this weekend, not sure if I can get anything done before Monday.

climbfuji commented 4 years ago

Here is the modulefile for Cheyenne with GNU:

module load ncarenv/1.3
module load gnu/8.3.0
module load ncarcompilers/0.5.0
module load netcdf/4.7.3
module load mpt/2.19
module load cmake/3.14.4

export CC=mpicc
export FC=mpif90
export CXX=mpicxx

module use -a /glade/p/ral/jntp/GMTB/tools/modulefiles/gnu-8.3.0/mpt-2.19
module load NCEPlibs/1.0.0alpha01

climbfuji commented 4 years ago

Here is the modulefile for Cheyenne with Intel

module load ncarenv/1.3
module load intel/18.0.5
module load ncarcompilers/0.5.0
module load netcdf/4.7.3
module load mpt/2.19
module load cmake/3.14.4

export CC=mpicc
export FC=mpif90
export CXX=mpicxx

module use -a /glade/p/ral/jntp/GMTB/tools/modulefiles/intel-18.0.5/mpt-2.19
module load NCEPlibs/1.0.0alpha01

climbfuji commented 4 years ago

Here is the modulefile for Hera with Intel

module load intel/18.0.5.274
module load impi/2018.0.4
module load netcdf/4.7.0
module use -a /scratch1/BMC/gmtb/software/modulefiles/generic
module load cmake/3.16.3

export CC=icc
export CXX=icpc
export FC=ifort

module use -a /scratch1/BMC/gmtb/software/modulefiles/intel-18.0.5.274/impi-2018.0.4
module load NCEPlibs/1.0.0alpha01

climbfuji commented 4 years ago

@jedwards4b FYI I could reproduce the problem on macOS and I am working on it now.

kgerheiser commented 4 years ago

Ran into this too. Not sure why @rpath isn't set when linking to NetCDF.

climbfuji commented 4 years ago

It's got to do with the top-level CMakeLists.txt for the ufs-weather-model. I am close to having a clean solution for it. The reason why it comes up for netCDF only is that netcdf-c's own CMakeLists.txt in NCEPLIBS-external starts this entire business.

climbfuji commented 4 years ago

I have not been able to do it the cmake way. I tried the following in both the ufs-weather-model top-level CMakeLists.txt and in cmake/configure_macosx.gnu.cmake (that's where it should be):

cmake_policy(SET CMP0042 NEW)
# Set RPATH for macOS
set(CMAKE_INSTALL_RPATH "${NETCDF_LIBDIR}")
# Configure RPATH for macOS
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH true)

This works: add the following line to cmake/configure_macosx.gnu.cmake after NETCDF_LIBDIR is defined.

set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,-rpath ${NETCDF_LIBDIR}")

However, I don't like this solution at all.
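As a side note for readers, the same linker flag could in principle be supplied at configure time rather than by editing the cmake file. The sketch below is an assumption, not something tested in this thread: `rpath_flag` is a hypothetical helper, the default directory is the one quoted earlier in the discussion, and it uses the comma form `-Wl,-rpath,<dir>` (the usual GCC spelling) rather than the space-separated form shown above. Whether a command-line value survives the project's own configure files is not confirmed here.

```shell
# Directory holding the netCDF dylibs; the default is the path quoted
# earlier in this thread and may differ on other machines.
NETCDF_LIBDIR="${NETCDF_LIBDIR:-/usr/local/ufs-release-v1/lib}"

# Hypothetical helper: build the linker flag that records an rpath
# entry for the given directory in the linked executable.
rpath_flag() {
  printf '%s%s' '-Wl,-rpath,' "$1"
}

# Example invocation (not run here; the source path is a placeholder):
#   cmake -DCMAKE_EXE_LINKER_FLAGS="$(rpath_flag "$NETCDF_LIBDIR")" /path/to/ufs-weather-model
```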

jedwards4b commented 4 years ago

@climbfuji I can confirm that that solves the problem for the cime build. I am now able to run on MacOS.

climbfuji commented 4 years ago

Hmm. Maybe @kgerheiser has a better solution. This is beyond my primitive cmake skills.

kgerheiser commented 4 years ago

I have an idea. I think it might be because of how the model links NetCDF.

Doing this doesn't help:

cmake_policy(SET CMP0042 NEW)
# Set RPATH for macOS
set(CMAKE_INSTALL_RPATH "${NETCDF_LIBDIR}")
# Configure RPATH for macOS
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH true)

That just makes sure that when running make install it appropriately changes the rpath to the install location, but UFS has no install step and even in the build directory the rpath isn't correct.

climbfuji commented 4 years ago

Ah - so maybe using

SET(CMAKE_BUILD_WITH_INSTALL_RPATH TRUE)

could help?

jedwards4b commented 4 years ago

@climbfuji where did you add that call? I just want to confirm that it also works with CIME.

climbfuji commented 4 years ago

So, SET(CMAKE_BUILD_WITH_INSTALL_RPATH TRUE) didn't work after all. Only the -Wl,-rpath works. I am still looking for a better solution.

jedwards4b commented 4 years ago

In that case I think we just stick with -Wl,-rpath and move on.

climbfuji commented 4 years ago

Yes, I also put it on the backburner.

climbfuji commented 4 years ago

This is addressed in https://github.com/ufs-community/ufs-weather-model/pull/54 (not yet merged).

ceceliadid commented 4 years ago

I put the description of pre-configured and supported platforms in one place on the ufs top-level wiki, here: https://github.com/ufs-community/ufs/wiki/Pre-Configured-and-Supported-Platforms and have pointed all the wikis to this place instead of repeating the info. If there are updates or the definitions need improving I would suggest doing it here.

climbfuji commented 4 years ago

Thanks, we need to update this list at some point to match what we discussed yesterday ...

ceceliadid commented 4 years ago

I couldn't make that call so I don't know what the changes were - will need help with that.

climbfuji commented 4 years ago

See here: https://docs.google.com/spreadsheets/d/122uasMhD8aF_s6jUpy7t_Io5JSNGYOmBddtT6kh0dSQ/edit#gid=0 - note the definition of the tiers in the yellow boxes at the bottom. Note also that a system can only be a preconfigured platform if the libraries are installed in a shared space, not a user directory. This disqualifies Stampede unless the sysadmins are willing to make this move.

ceceliadid commented 4 years ago

I see the point for the general community. Since we have a Stampede allocation specifically for GST tests for the release, can we install the required libs in that user directory and consider it preconfigured for GST folks?

rsdunlapiv commented 4 years ago

@ceceliadid for the description of pre-configured and supported, it's not just that NCEPLibs and NCEPLibs-external are either pre-installed or expected to work. The model itself is expected to compile and work and the workflow is expected to work as well.

arunchawla-NOAA commented 4 years ago

@ceceliadid @rsdunlapiv @climbfuji based on our discussions these were the 4 levels of testing

Tier 1: Preconfigured -- Libraries installed in a central place, model builds, and CIME works end to end

Tier 2: Supported -- Libraries build, model builds, and CIME works end to end, just libraries not stored in a central space

Tier 3: Libraries build, model builds, testing done only with the simple case

Tier 4: Libraries and model build. No further testing

Hope this helps.
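One way to read the four levels is as a simple decision ladder. The following is only an illustrative sketch: `classify_tier`, its argument order, and the `none` outcome for a platform where nothing builds are all assumptions added here, not definitions agreed in this thread.

```shell
# Map the readiness facts for one platform/compiler pair onto the four levels.
# Arguments (each "yes" or "no"); the order is illustrative only:
#   $1 libraries are installed in a central (shared) location
#   $2 libraries and model build
#   $3 CIME workflow runs end to end
#   $4 the simple case has been run
classify_tier() {
  if [ "$2" != "yes" ]; then
    echo "none"      # nothing builds yet, so no tier applies (assumed outcome)
  elif [ "$3" = "yes" ] && [ "$1" = "yes" ]; then
    echo "Tier 1"    # preconfigured: shared install, full workflow
  elif [ "$3" = "yes" ]; then
    echo "Tier 2"    # supported: full workflow, libraries not in a shared space
  elif [ "$4" = "yes" ]; then
    echo "Tier 3"    # builds, tested with the simple case only
  else
    echo "Tier 4"    # builds, no further testing
  fi
}

# Example: a machine with user-directory libraries and a working CIME workflow.
classify_tier no yes yes yes
```

Under this reading, a machine like Stampede with libraries in a user directory would land in Tier 2 at best, which matches the shared-space requirement stated earlier in the thread.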

ceceliadid commented 4 years ago

@arunchawla-NOAA @rsdunlapiv @climbfuji Thanks. There is a consistency issue here that Ligia has raised before. The criteria you are using for the Tier 1/Tier 2... ladder are different from the criteria in the current documents.

Tier 1/Tier 2 etc. is already defined by a regression test policy that has documents at the UFS, app, and model level - see for example at the UFS level: https://github.com/ufs-community/ufs/wiki/Regression-Test-Policy-for-UFS-Platforms-and-Compilers At the UFS level there will be different models and workflows, so CIME working vs testing done just with the WM should not be part of the definition of the tiers.

We do have documents that were started that interpret these definitions for the mrw app and the weather model, but those definitions are still in terms of regression testing policies, see: https://github.com/ufs-community/ufs-mrweather-app/wiki/Regression-Test-Policy-for-MR-Weather-App-Platforms-and-Compilers https://github.com/ufs-community/ufs-weather-model/wiki/Regression-Test-Policy-for-Weather-Model-Platforms-and-Compilers

I think the first main decision that needs to be made is how you want to talk about the tiers in the regression test policy vs the tiers you've defined, especially for the MRW app and the weather model. Do you want them to always be the same? So, for example for the MRW app, would your tier 2 always mean that regression tests are conducted after the commits AND that libs build, CIME works to the end, etc? Not sure it makes sense to always connect those.

It looks like you are also introducing a 4th tier here that the original definition of the regression test tiers did not have. Do you want to map that onto the regression test policy somehow?

If you don't want to combine the definitions, you could drop the tier 1, tier 2, etc. labels and just call the levels of the ladder pre-configured, supported, etc. If you decide to do that, an immediate question would be whether you want a set of definitions for these levels that is UFS-wide as well as specific to the MRW app.