trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Create a SEMSDevEnv.cmake file to automatically use loaded SEMS dev env #158

Closed bartlettroscoe closed 7 years ago

bartlettroscoe commented 8 years ago

Next Action Status:

Working and providing value as part of new checkin-test-sems.sh script and CI build (see #482)

CC list: @rppawlo, @bathmatt, @jgfouca, @jwillenbring, @gdsjaar, @trilinos/framework

Blocking: #410, #370

Description:

This story will create a SEMSDevEnv.cmake file such that, once it is included (with -DTrilinos_CONFIGURE_OPTIONS_FILES=/SEMSDevEnv.cmake), TriBITS will automatically pick up the right compilers and TPL locations.

In addition, it would be desirable for the Trilinos configure to automatically pick up the loaded SEMS env (as is done on the ATTB machines using the ATTB_ENV env var; see #172).

In addition to just providing the SEMSDevEnv.cmake module, this story will also scope out what might be useful for a standard Trilinos dev env. However, a new story will be created to refine what a new expanded Trilinos Primary Tested build of TPLs and Packages will look like.

List of TPLs and other requirements needed for a standard Trilinos CI build:

The following list is the current consensus for the standard Trilinos CI build (this list is updated as consensus changes).

Tasks:

  1. Get opinions on what TPLs should be included in the SEMSDevEnv.cmake module (targeting a standard Trilinos pre-push CI build) [Done]
  2. Write SEMSDevEnv.cmake to automatically set up compilers and TPLs for a given loaded SEMS env (loaded using module load <avail>) by reading env vars starting with SEMS_ (see below). [Done]
  3. Create a load_sems_dev_env.sh script that can be called as source load_sems_dev_env.sh [<compiler-and-version>] [<openmpi-and-version>], for example source load_sems_dev_env.sh gcc/4.9.2 openmpi/1.10.1 (where <compiler-and-version> and <openmpi-and-version> are given defaults if not provided; see below). [Done]
  4. Test on full shared-lib MPI and serial builds of Trilinos with all Secondary Tested packages enabled that can be given this set of TPLs (but forgot to enable the Scotch and ParMETIS TPLs; see below). [Done]
  5. Automatically load SEMSDevEnv.cmake if it is detected that the SEMS dev env is loaded. (see below) [Done]
  6. Enable Scotch and ParMETIS TPLs in shared lib build and test with packages that support them ... Zoltan and Zoltan2 tests fail as expected (see below and #475 and #476) [Done]
  7. Test and get working all-static builds (i.e. build static libs and use static TPL libs) ... see commit c334dd6 [Done]
  8. Investigate issues with Scotch and ParMETIS ... SEMS provides inconsistent 32-bit Scotch and 64-bit ParMETIS (see below) [Done]
  9. Try to enable SuperLU 4.3 (see below) [Done]
  10. Finish documentation on Trilinos GitHub Wiki (see below) [Done]
  11. Have documentation and implementation reviewed (see a, b, c, d, e, and f) ... Will fix any problems if they come up. [Done]
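Tasks 2 and 5 describe detecting a loaded SEMS env and reading the env vars that start with SEMS_. The real module is CMake (SEMSDevEnv.cmake); what follows is only a minimal POSIX shell sketch of the same idea. It assumes, hypothetically, that each loaded TPL module exports a SEMS_<NAME>_ROOT variable, and that TriBITS accepts TPL_ENABLE_<TPL>, <TPL>_INCLUDE_DIRS and <TPL>_LIBRARY_DIRS. The function names and the SEMS-to-TPL name map are illustrative and not taken from SEMSDevEnv.cmake.

```shell
#!/bin/sh
# Hypothetical sketch: the SEMS_<NAME>_ROOT naming and the name map below
# are assumptions for illustration, not copied from SEMSDevEnv.cmake.

# Succeeds if any SEMS_* variable is present in the environment.
sems_env_loaded() {
  env | grep -q '^SEMS_'
}

# Map an upper-case SEMS package name to a Trilinos TPL name (assumed subset).
sems_to_tpl_name() {
  case "$1" in
    BOOST)    echo "Boost" ;;
    HDF5)     echo "HDF5" ;;
    NETCDF)   echo "Netcdf" ;;
    PARMETIS) echo "ParMETIS" ;;
    SCOTCH)   echo "Scotch" ;;
    SUPERLU)  echo "SuperLU" ;;
    ZLIB)     echo "Zlib" ;;
    *)        echo "" ;;  # unknown package: the caller skips it
  esac
}

# Print TriBITS-style configure options, one per line, for every
# SEMS_<NAME>_ROOT variable whose <NAME> maps to a known TPL.
sems_tpl_options() {
  env | sed -n 's/^SEMS_\([A-Z0-9]*\)_ROOT=\(.*\)$/\1 \2/p' | sort |
  while read -r name root; do
    tpl=$(sems_to_tpl_name "$name")
    [ -n "$tpl" ] || continue
    printf '%s\n' "-DTPL_ENABLE_${tpl}=ON"
    printf '%s\n' "-D${tpl}_INCLUDE_DIRS=${root}/include"
    printf '%s\n' "-D${tpl}_LIBRARY_DIRS=${root}/lib"
  done
}
```

For example, cmake $(sems_tpl_options) <path-to-Trilinos> would pass these options through, provided no install path contains spaces (the unquoted substitution relies on word splitting).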
jwillenbring commented 8 years ago

@dmvigi should be CC'ed on this too.

bathmatt commented 8 years ago

@bartlettroscoe Do you have a prelim file we can test on shiller?

bartlettroscoe commented 8 years ago

@bathmatt, what is a "prelim file"?

nmhamster commented 8 years ago

Preliminary file he can test I think.

bartlettroscoe commented 8 years ago

From: Bartlett, Roscoe A Sent: Monday, April 04, 2016 7:11 PM To: Trilinos Framework Subject: Access to machine with SEMS Dev Env mounted?

Hello Trilinos Framework,

Is there some machine that has the SEMS Dev Env mounted, that is not being constantly hammered, where I could get an account? I have some changes that I need to test that might impact a lot of Secondary Tested Trilinos packages, such as:

 https://github.com/trilinos/Trilinos/pull/265
https://github.com/TriBITSPub/TriBITS/issues/56 

I can’t effectively test all of this and push unless I have most of the important TPLs built for packages like STK and SEACAS.

I would not hammer this machine. I could use very few processes. I would only use it for the final configure/build/test/push (unless something breaks before the push then I will need to fix it).

This is also a chance for me to put together a standard Trilinos configure for SEMS:

https://github.com/trilinos/Trilinos/issues/158

This could lead to a standard checkin-test-sems.sh script for a standard CI build for Trilinos.

Cheers,

-Ross

jgfouca commented 8 years ago

@bartlettroscoe , it's quite easy to mount the sems TPLs, so any COE RHEL machine on the SON or SRN should work.

bartlettroscoe commented 8 years ago

it's quite easy to mount the sems TPLs, so any COE RHEL machine on the SON or SRN should work.

Don't you need sudo to mount an NFS drive? I don't have sudo on any SNL machine. The only machines I have accounts on are the CEE server ceesrv02 and several of the ATTB machines. I don't think any of these mount the SEMS dev env partitions, do they? I don't have my own Linux machine yet (and it does not seem trivial to get one).

nmhamster commented 8 years ago

@bartlettroscoe we don't mount SEMS on ATTB for reasons we outlined relating to TPLs not working on some architectures and level of tuning/modification and support in the environment (where many things are non-standard until we have had time to work with vendors to find appropriate ways to integrate these into their platforms and optimize them). Have you checked SNL re-app for Linux machines, there are usually workstations there if you just need a builder box.

bartlettroscoe commented 8 years ago

we don't mount SEMS on ATTB for reasons we outlined relating to TPLs not working on some architectures and level of tuning/modification and support in the environment

Given that shepard is supposed to be a fairly generic machine, could the SEMS dev env partition be mounted there?

Have you checked SNL re-app for Linux machines, there are usually workstations there if you just need a builder box.

I have not. But I am a little leery of spending a lot of time setting up an old slow workstation that could die at any minute.

bathmatt commented 8 years ago

There should be funds to buy a reasonable machine; they aren't that much. I would hammer hanson/shiller, honestly. If they are overloaded, we need to buy more of them. You can build on the backend nodes as well. That's what our Jenkins scripts do (as I understand it).


jgfouca commented 8 years ago

@bartlettroscoe , yes, you need root to mount, but you can just ask a sysadmin to do it.

bartlettroscoe commented 8 years ago

There should be funds to buy a reasonable machine, they aren't that much.

Yeah, I guess I need to just go ahead and pull the trigger on getting a new COE Red Hat Linux machine just for myself to do this type of stuff on. I have already been given the green light to do so. I was just trying to see if it was possible to create a productive dev env at SNL without purchasing my own Linux box (or purchasing a very expensive CEE machine).

I would hammer hanson/shiller honestly.

Right. For development work related to advanced architectures, we need to use the ATTB machines. That is the purpose of #172.

But the issue is that we desperately need a standard CI env for Trilinos. We can't use the ATTB machines for that. The simplest way to build a uniform CI env for Trilinos is based on the SEMS dev env (see).

bartlettroscoe commented 8 years ago

It looks like I have access to the machine muir (thanks for pointing that out, Brent!), which NFS mounts the SEMS dev env under /projects/. It is already set up when you log in so that module avail shows the modules under /projects/modulefiles/. Now I just need to figure out which modules should be loaded to effectively test Trilinos. It looks like there are builds with GCC (4.4.7, 4.7.2, 4.8.4, 4.9.2, 5.1.0), Intel (14.0.4, 15.0.2, 16.0.1), and OpenMPI (1.6.5, 1.8.7, 1.10.1), with different versions of the (Trilinos) TPLs boost, hdf5, netcdf, parmetis, scotch, qd, superlu, and zlib built for many permutations of compilers and OpenMPI. However, it looks like several TPLs required to build all of STK and SEACAS are not there. For example, if I configure Trilinos (on any machine) with:

$ cmake -DTPL_ENABLE_MPI=ON \
    -DTrilinos_ENABLE_STK=ON \
    -DTrilinos_ENABLE_SEACAS=ON \
    ../../../Trilinos

You see the enabled TPLs:

It looks like the X11 and Matio TPLs are required by SEACAS:

-- Setting TPL_ENABLE_X11=ON because it is required by the enabled package SEACASSVDI
-- Setting TPL_ENABLE_Matio=ON because it is required by the enabled package SEACASExo2mat

Are these TPLs really important for testing Trilinos functionality downstream from SEACAS?

There are a bunch of optional TPLs that are not enabled, yet I know some people feel they are important. Of the remaining 88 TPLs, how many should really be present and enabled for a solid pre-push test of Trilinos?

I know that SuperLUDist, SuperLU, HYPRE and PETSC are important for the IDEAS Productivity project. What about the other sparse-direct TPLs like UMFPACK and CSparse? I, for one, would really like the TPL BinUtils to be present. That is used for creating backtraces with exception handling.

Can we get a solid pre-push test of Trilinos without all of these other TPLs?

I will ask the Trilinos developers. I should also ask Trilinos customers what TPLs they enable with Trilinos.

bartlettroscoe commented 8 years ago

Note that BLAS and LAPACK are not listed in the SEMS modules. From looking at the automated builds on muir on CDash at:

http://testing.sandia.gov/cdash/viewConfigure.php?buildid=2408598

It would seem these are found in the base Linux COE packages:

-- TPL_BLAS_LIBRARIES='/usr/lib64/libblas.so'
-- TPL_LAPACK_LIBRARIES='/usr/lib64/liblapack.so'
crtrott commented 8 years ago

Just to clarify, this doesn't mean you want to do only single compiler testing for the nightlies right?

That said for standard push right now: gcc 4.8.4, openmpi 1.8.7

Nightlies additionally need gcc 4.7.2, 4.9.2, 5.1.0, intel 15, 16 and openmpi 1.10.

OpenMPI 1.6 can be retired in my opinion; it is pretty old by now. On the TPL front, the minimum is: boost, hdf5, netcdf, and zlib.

The superlu version provided by SEMS is broken last I checked, and nobody cared enough to fix it since one can simply not use it in Trilinos.

nmhamster commented 8 years ago

@bartlettroscoe the real aim of this work shouldn't be to do this for SEMS environments per se; it should be to ensure we have a high-quality, well-tested framework for execution on capacity- and capability-class production computing environments. Note that on these production machines SEMS won't be the environment because, instead, it will be supplied by vendors who have to optimize and support it (this is particularly true for machines like Cray and IBM, who very carefully patch and engineer an environment for their machines). The real value of SEMS is that it brings this concept down to the workstation environment for our developers and includes local support. To that end, these tests should be oriented towards configurations that will be on our production environments, where we can replicate those combinations closely enough. This means running several of these combinations (unfortunately) to replicate the production computing systems. As a rough wag to get started, we would need an OpenMPI 1.10 series test with GCC 4.8 and 4.9 for POWER systems (perhaps including CUDA) and then an MPICH-based environment with GCC 4.9/Intel 16.X for a machine like Trinity (note Intel relies on GCC for header files, so we must be careful to select this appropriately). The SIERRA folks will also need to run on older machines with Intel 15.X and older GCC (I would suggest 4.7 for this purpose).

bartlettroscoe commented 8 years ago

Just to clarify, this doesn't mean you want to do only single compiler testing for the nightlies right?

No, this is just the standard pre-push CI build. This will not affect what gets tested post-push. And we want to make the pre-push CI build fast and we want to focus on what best protects other developers doing their work. So we could likely get away with just a single MPI build of Trilinos with ETI turned on, no complex or float, etc. We want this to cover a good bit but be as fast as possible.

crtrott commented 8 years ago

Ok, in that case as I said GCC 4.8.4 with OpenMPI 1.8.7 and I would very, very strongly advocate for enabling OpenMP because I believe by default we must exercise the threaded code path.

bartlettroscoe commented 8 years ago

As a rough wag to get started, we would need an OpenMPI 1.10 series test with GCC 4.8 and 4.9 for POWER systems (perhaps including CUDA) and then an MPICH-based environment with GCC 4.9/Intel 16.X for a machine like Trinity (note Intel relies on GCC for header files, so we must be careful to select this appropriately). The SIERRA folks will also need to run on older machines with Intel 15.X and older GCC (I would suggest 4.7 for this purpose).

We are not looking for a comprehensive set of builds. The post-push builds and the usage of the 'develop'/'master' branch workflow will take care of protecting these customers. What we are going for is the best single (relatively fast) pre-push CI build that we can put together.

kddevin commented 8 years ago

@bartlettroscoe wrote: 3) What TPLs should be enabled? (The SEMS Dev Env assumes blas and lapack are already on the system and provides boost, hdf5, netcdf, parmetis, scotch, qd, superlu, and zlib) 4) Of the TPLs that are enabled, what versions of TPLs should be considered standard?

ParMETIS v4.0.3 or later, built with 32-bit index types
Scotch v6.0.3 or later, built with 32-bit index types

nmhamster commented 8 years ago

@kddevin @bartlettroscoe I do not think we should assume BLAS and LAPACK are on the system. We should have a minimum NetLib install in the SEMS filesystem.

bartlettroscoe commented 8 years ago

I do not think we should assume BLAS and LAPACK are on the system. We should have a minimum NetLib install in the SEMS filesystem.

I agree. The main concern that I have is that the BLAS and LAPACK that get used between Macs and the Linux machines may be different enough to cause tests that pass on one platform to fail on another platform. What we are going for here is uniformity.

bartlettroscoe commented 8 years ago

@gdsjaar, for the purpose of running the automated Trilinos test suite, do these parameters really matter? That is our only concern with this effort. We don't expect people to be running large calculations with this build env, and we are not expecting the Nalu and Drekar test suites to be run with this build env.

The plan is to have a more comprehensive set of post-push (Nightly) builds that run on the CEE, ATTB, and other platforms that target particular customer usage of Trilinos. Then, for a carefully selected set of packages for these builds, if they are all clean, then we will merge from the 'develop' branch to the 'master' branch.

Does that make sense?


From: trilinos-framework-bounces@software.sandia.gov [mailto:trilinos-framework-bounces@software.sandia.gov] On Behalf Of Sjaardema, Gregory D Sent: Friday, April 08, 2016 2:32 PM To: Siefert, Christopher; Hammond, Simon David (-EXP) Cc: Trilinos Framework; sandia-trilinos-developers@software.sandia.gov Subject: Re: [Trilinos-Framework] [Sandia-trilinos-developers] GCC, OpenMPI and TPLs for standard Trilinos CI env based on SEMS Dev Env?

Note that the NC_MAX_VAR_DIMS values are not important for Exodus (which is the reason for changing the other values). The only values that must be changed are NC_MAX_DIMS and NC_MAX_VARS, and they should be set to at least the values shown. So, Nalu and Drekar can and should use the same NetCDF libraries.

The library should, if at all possible, be compiled with --enable-netcdf4, as that provides the superset of capabilities that are needed: it can then support both “very large” and “complex” models. The --enable-pnetcdf option is not really associated with “large mesh” or “small mesh”, but is instead an additional parallel-io method used when using the “auto-join” option of the Ioss library.

Whether to build with parallel enabled is another variable, and most pre-installed versions of hdf5 and netcdf will not have this enabled, but for a parallel build of Trilinos it should be enabled in netcdf and hdf5. This provides the auto-decomposition and auto-join (1->N, N->1) capabilities usable by codes using the IOSS library.

As far as I am aware, all codes should be able to use the same NetCDF and HDF5 libraries as long as we pick the superset (enable-netcdf4, enable-parallel, enable-pnetcdf). I can help resolve any issues where it may seem that multiple installations are needed (other than a serial and parallel trilinos build).

I also have some FindNetCDF.cmake files used in the standalone SEACAS CMake build that can detect most of these settings and that should be adopted for Trilinos.

(NOTE: I have contacted the netcdf developers with a couple ways of eliminating the need to change these values and they have indicated that they will not due to compatibility issues even though they have been very receptive of other changes we have supplied and requested. The ultimate solution may be to have our own version of the NetCDF library supplied in the SEACAS package).
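Greg's note says only NC_MAX_DIMS and NC_MAX_VARS in netcdf.h must be raised to at least certain minimums (the specific values are not captured in this thread). A minimal shell sketch for checking an installed netcdf.h against caller-supplied minimums follows; the function names are hypothetical, and it assumes the macros appear as plain numeric #define lines.

```shell
#!/bin/sh
# Print the numeric value of a #define (e.g. NC_MAX_DIMS) from a header.
# $1 = path to netcdf.h, $2 = macro name. Assumes "#define NAME <digits>".
nc_header_value() {
  sed -n "s/^#define[[:space:]][[:space:]]*$2[[:space:]][[:space:]]*\([0-9][0-9]*\).*/\1/p" "$1" | head -n 1
}

# Succeed only if macro $2 in header $1 is defined and is >= minimum $3.
nc_header_at_least() {
  v=$(nc_header_value "$1" "$2")
  [ -n "$v" ] && [ "$v" -ge "$3" ]
}
```

Usage would look like nc_header_at_least /path/to/netcdf.h NC_MAX_DIMS <min>, run once each for NC_MAX_DIMS and NC_MAX_VARS with whatever minimums SEACAS requires.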

rppawlo commented 8 years ago

Actually, as part of my pre-push testing, I always have extra builds that test against Drekar and Charon2 since we are usually making changes to Trilinos libraries that these codes depend on. It would be helpful if the SEACAS TPLs could be used with these codes. Drekar and Charon2 are built as trilinos/tribits extra packages and are built and tested directly as part of a Trilinos configure/build.

Roger


bartlettroscoe commented 8 years ago

Actually, as part of my pre-push testing, I always have extra builds that test against Drekar and Charon2 since we are usually making changes to Trilinos libraries that these codes depend on.

Makes sense. Does the SEMS build of NetCDF 4.3.2 allow all of the Drekar and Charon2 tests to pass?

nmhamster commented 8 years ago

I agree. The main concern that I have is that the BLAS and LAPACK that get used between Macs and the Linux machines may be different enough to cause tests that pass on one platform to fail on another platform. What we are going for here is uniformity.

SEMS should be loading the modules on Mac OSX from a different path so this shouldn't be an issue should it? Maybe I'm not understanding the problem with that configuration.

bartlettroscoe commented 8 years ago

SEMS should be loading the modules on Mac OSX from a different path so this shouldn't be an issue should it? Maybe I'm not understanding the problem with that configuration.

What I mean is: are the default versions of BLAS and LAPACK on the Linux COE vs. the Mac different enough (in the source code and/or the way they are built) that they might cause tests that pass on the Linux COE to fail on the Mac and vice versa? That is my concern. If they are the default system BLAS and LAPACK, CMake FIND_LIBRARY() should find them okay.

Does that make sense?

bartlettroscoe commented 8 years ago

Capturing Mike's email related to this ...

A Docker container would provide the broadest and most uniform pre-push CI env between any Linux, Mac, or Windows machine, inside or outside of Sandia. If this student could get with the SEMS team and figure out how to run the SEMS compiler and TPL builder inside of a Docker container, then that would solve all of our problems (assuming that everyone is willing to install the Docker support software and that using the Docker container to test and push is easy). For example, could the ATTB and CEE machines install the Docker support software so that we could run this container on those machines too?

@maherou, would your student have time to do this? Should we create a new Trilinos GitHub Issue for this or is that not feasible? Anyway, we will look forward to your student's webinar!


-----Original Message----- From: trilinos-framework-bounces@software.sandia.gov [mailto:trilinos-framework-bounces@software.sandia.gov] On Behalf Of Heroux, Michael A Sent: Friday, April 08, 2016 2:12 PM To: Hammond, Simon David (-EXP); Siefert, Christopher Cc: Trilinos Framework; sandia-trilinos-developers@software.sandia.gov Subject: Re: [Trilinos-Framework] [Sandia-trilinos-developers] GCC, OpenMPI and TPLs for standard Trilinos CI env based on SEMS Dev Env?

Along the lines of a workstation image, I have a student finishing up his undergraduate honors project using Docker for Trilinos. He has observed excellent performance (essentially no performance loss, and even sometimes improvement) comparing a native installation of Trilinos on a cluster vs a Docker version on the same cluster, using up to 48 MPI processes.

I foresee that a Docker container of Trilinos can become a, or maybe the, way we provide pre-built versions of Trilinos for generic environments. We already use it for distributing the Trilinos Web tutorial.

Given the recent announcement that Docker will have native Windows and Mac apps, there is growing value in this approach.

The student, Sean Deal from St. John's, MN, will give a webinar on his work prior to finishing school.

Just FYI.

Mike

rppawlo commented 8 years ago

Yes. I believe multiple Drekar developers are using the SEMS builds. I will launch a test to verify as well.


bartlettroscoe commented 8 years ago

ParMETIS v4.0.3 or later, built with 32-bit index types
Scotch v6.0.3 or later, built with 32-bit index types

SEMS provides ParMETIS 4.0.3 and Scotch 6.0.3 for 24 different combinations of builds.

@jgfouca, are the SEMS builds of ParMETIS 4.0.3 and Scotch 6.0.3 built with 32-bit index types?

I will collect the list of TPLs in the description field above ...

jgfouca commented 8 years ago

@bartlettroscoe , should be 64 bit. I see this in our configuration:

sed <SEDINPLACE> -e 's/IDXTYPEWIDTH 32/IDXTYPEWIDTH 64/g'
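The index width is set by the IDXTYPEWIDTH macro, which this substitution presumably rewrites in METIS's metis.h header. A minimal shell sketch for reading that width back from an installed header, and for applying the same 32-to-64 edit portably, is shown below. The function names are hypothetical, the header path is supplied by the caller (the SEMS install layout is not given here), and the temp-file approach is just one way to avoid the differing in-place flags of GNU sed and BSD/macOS sed.

```shell
#!/bin/sh
# Print the IDXTYPEWIDTH value from a metis.h header (e.g. 32 or 64).
# Assumes a line of the form "#define IDXTYPEWIDTH <N>".
metis_idx_width() {
  sed -n 's/^#define[[:space:]][[:space:]]*IDXTYPEWIDTH[[:space:]][[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1"
}

# Switch a metis.h to 64-bit indices with the same substitution shown above.
# Writes through a temp file because the in-place flag differs between
# GNU sed (-i) and BSD/macOS sed (-i '').
set_metis_idx_width_64() {
  tmp="$1.tmp.$$"
  sed -e 's/IDXTYPEWIDTH 32/IDXTYPEWIDTH 64/g' "$1" > "$tmp" && mv "$tmp" "$1"
}
```

Running metis_idx_width on the metis.h that a given ParMETIS build was compiled against would show whether it matches what Zoltan's tests expect (32) or what IOSS prefers (64).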
bartlettroscoe commented 8 years ago

should be 64 bit. I see this in our configuration:

@kddevin, does this mean that the ParMETIS and Scotch built by SEMS is unacceptable for usage by Zoltan?

@jgfouca, who are the customers of ParMETIS and Scotch that are using a 64 bit index type? Are people using ParMETIS and Scotch outside of Zoltan? I think that SuperLUDist uses ParMETIS but SuperLUDist is not in the list of SEMS TPLs.

kddevin commented 8 years ago

Zoltan's tests use 32-bit ParMETIS and Scotch. Tests can fail with 64-bit, due to differences in execution path with ParMETIS and Scotch; the partitions returned differ between 32-bit and 64-bit. See #225. If this environment is to be used for testing, we'd prefer 32-bit unless someone else needs 64-bit. If 64-bit is required, we can add 64-bit answers to the repo to allow the tests to pass.

jgfouca commented 8 years ago

@kddevin , someone asked SEMS to make it 64-bit. @bmpersc , can you think of who this might have been? I've searched my email and came up empty.

bartlettroscoe commented 8 years ago

someone asked SEMS to make it 64-bit ... can you think of who this might have been?

That is why we need issue tracking :-) Is there a version control repo for the SEMS TPL build scripts? That is another place where you might find requirements related info.

gdsjaar commented 8 years ago

I wasn’t the one who asked initially, but for use in SEACAS, the 64-bit idxtype in ParMetis is recommended/required. ..Greg

"A supercomputer is a device for turning compute-bound problems into I/O-bound problems”


gdsjaar commented 8 years ago

The IOSS library in SEACAS uses the ParMETIS library outside of Zoltan and recommends/requires the use of the 64-bit index type. ..Greg


gdsjaar commented 8 years ago

The problem I see with those below is that netcdf-4.3.2 is a relatively old version of netcdf. I would prefer a much newer version, 4.4.0 is preferred; 4.3.3.1 is allowable. The reason for this is that netcdf has introduced some improvements in later versions that make it easier to determine what capabilities are supported (this is via the netcdf-meta.h include file and others). With the many options that netcdf can be built with, the ability to query the build options via the nc-config and netcdf-meta.h is needed to ensure that the client knows the capabilities of the library.

..Greg
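Greg's point is that a client needs to query a NetCDF build's options (via nc-config or netcdf-meta.h) rather than assume them. A minimal shell sketch of such a query for the netcdf4 and pnetcdf features of the proposed superset follows. It assumes the nc-config in use supports the --has-nc4 and --has-pnetcdf flags and answers "yes" or "no"; flag support varies by NetCDF version (check nc-config --help), and the function name and NC_CONFIG override are hypothetical.

```shell
#!/bin/sh
# Report whether a NetCDF install has the netcdf4 and pnetcdf features by
# asking its nc-config script. NC_CONFIG may name a specific nc-config
# (or any command with the same interface); defaults to the one on PATH.
# Returns non-zero if any feature is missing or cannot be queried.
check_netcdf_features() {
  nc=${NC_CONFIG:-nc-config}
  status=0
  for flag in --has-nc4 --has-pnetcdf; do
    answer=$("$nc" "$flag" 2>/dev/null)
    if [ "$answer" = "yes" ]; then
      printf '%s: yes\n' "$flag"
    else
      # "no", an unsupported flag, or a missing nc-config all land here
      printf '%s: no\n' "$flag"
      status=1
    fi
  done
  return $status
}
```

A parallel-enabled check could be added the same way if the installed nc-config offers a flag for it.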

From: "Roscoe A. Bartlett" notifications@github.com Reply-To: trilinos/Trilinos reply@reply.github.com Date: Friday, April 8, 2016 at 2:36 PM To: trilinos/Trilinos Trilinos@noreply.github.com Cc: "Sjaardema, Gregory D" gdsjaar@sandia.gov Subject: [EXTERNAL] Re: [trilinos/Trilinos] Create a SEMSDevEnv.cmake file to automatically use loaded SEMS dev env (#158)

I have a question/observation about SEMS. I am seeing that some TPLs have upwards of 45 different builds, such as for NetCDF:

netcdf/4.3.2/clang/3.5.2/openmpi/1.10.1 netcdf/4.3.2/clang/3.5.2/openmpi/1.6.5 netcdf/4.3.2/clang/3.5.2/openmpi/1.8.7 netcdf/4.3.2/clang/3.6.1/base netcdf/4.3.2/clang/3.6.1/openmpi/1.10.1 netcdf/4.3.2/clang/3.6.1/openmpi/1.6.5 netcdf/4.3.2/clang/3.6.1/openmpi/1.8.7 netcdf/4.3.2/gcc/4.4.7/base netcdf/4.3.2/gcc/4.4.7/openmpi/1.10.1 netcdf/4.3.2/gcc/4.4.7/openmpi/1.6.5 netcdf/4.3.2/gcc/4.4.7/openmpi/1.8.7 netcdf/4.3.2/gcc/4.7.2/base netcdf/4.3.2/gcc/4.7.2/openmpi/1.10.1 netcdf/4.3.2/gcc/4.7.2/openmpi/1.6.5 netcdf/4.3.2/gcc/4.7.2/openmpi/1.8.7 netcdf/4.3.2/gcc/4.8.4/base netcdf/4.3.2/gcc/4.8.4/openmpi/1.10.1 netcdf/4.3.2/gcc/4.8.4/openmpi/1.6.5 netcdf/4.3.2/gcc/4.8.4/openmpi/1.8.7 netcdf/4.3.2/gcc/4.9.2/base netcdf/4.3.2/gcc/4.9.2/openmpi/1.10.1 netcdf/4.3.2/gcc/4.9.2/openmpi/1.6.5 netcdf/4.3.2/gcc/4.9.2/openmpi/1.8.7 netcdf/4.3.2/gcc/4.9.3/base netcdf/4.3.2/gcc/4.9.3/openmpi/1.10.1 netcdf/4.3.2/gcc/4.9.3/openmpi/1.6.5 netcdf/4.3.2/gcc/4.9.3/openmpi/1.8.7 netcdf/4.3.2/gcc/5.1.0/base netcdf/4.3.2/gcc/5.1.0/openmpi/1.10.1 netcdf/4.3.2/gcc/5.1.0/openmpi/1.6.5 netcdf/4.3.2/gcc/5.1.0/openmpi/1.8.7 netcdf/4.3.2/intel/14.0.4/base netcdf/4.3.2/intel/14.0.4/openmpi/1.10.1 netcdf/4.3.2/intel/14.0.4/openmpi/1.6.5 netcdf/4.3.2/intel/14.0.4/openmpi/1.8.7 netcdf/4.3.2/intel/15.0.2/base netcdf/4.3.2/intel/15.0.2/openmpi/1.10.1 netcdf/4.3.2/intel/15.0.2/openmpi/1.6.5 netcdf/4.3.2/intel/15.0.2/openmpi/1.8.7 netcdf/4.3.2/intel/16.0.1/base netcdf/4.3.2/intel/16.0.1/intelmpi/5.1.2 netcdf/4.3.2/intel/16.0.1/openmpi/1.10.1 netcdf/4.3.2/intel/16.0.1/openmpi/1.6.5 netcdf/4.3.2/intel/16.0.1/openmpi/1.8.7

Are all of these permutations really needed? Does the union of all the builds used by all of the customer codes of Trilinos cover all of these permutations?
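The module names above follow a regular pattern, `<tpl>/<version>/<compiler>/<compiler-version>/<mpi>/<mpi-version>`, with `base` in place of the MPI part for serial builds, so every added compiler or MPI version multiplies the number of builds. A small shell sketch of composing such a name, the kind of mapping a load script that takes a compiler argument and an MPI argument would need. The helper name `sems_tpl_module` is hypothetical, and the pattern is inferred only from the listing above.

```shell
#!/usr/bin/env bash
# Sketch: compose a SEMS TPL module name from its parts, following the
# pattern seen in the netcdf listing:
#   <tpl>/<version>/<compiler>/<compiler-version>/<mpi>/<mpi-version>
# or <tpl>/<version>/<compiler>/<compiler-version>/base for serial builds.
# sems_tpl_module is a hypothetical helper, not part of SEMS.

# Usage: sems_tpl_module <tpl/version> <compiler/version> [<mpi/version>]
sems_tpl_module() {
  local tpl="$1" compiler="$2" mpi="${3:-base}"
  printf '%s/%s/%s\n' "$tpl" "$compiler" "$mpi"
}

# Example (requires the SEMS module system):
#   module load "$(sems_tpl_module netcdf/4.3.2 gcc/4.9.2 openmpi/1.10.1)"
```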


gdsjaar commented 8 years ago

At a minimum level of capability, the basic netCDF library could be used to show that everything compiles and links correctly.

The NC_MAX_DIMS and NC_MAX_VARS settings possibly don’t matter, but parallel builds of netcdf and hdf5 (and possibly pnetcdf) would be good to have, to permit testing of SEACAS (and clients of SEACAS) that use these capabilities.

I also worry that the SEMSDevEnv will be used as a template by people downloading and using the library. Instead of being “minimally acceptable”, it will be used as the “way it should be” — Even the name seems to imply that it is the environment to be used by developers.

However, if the need is for just a “it provides minimal proof of concept”, then the basic NetCDF system-installed serial library is probably ok as long as there is some mention or documentation somewhere that this is not the recommended way to build Trilinos…

..Greg


From: "Roscoe A. Bartlett" notifications@github.com
Date: Friday, April 8, 2016 at 2:18 PM
To: trilinos/Trilinos
Cc: "Sjaardema, Gregory D" gdsjaar@sandia.gov
Subject: [EXTERNAL] Re: [trilinos/Trilinos] Create a SEMSDevEnv.cmake file to automatically use loaded SEMS dev env (#158)

@gdsjaar, for the purpose of running the automated Trilinos test suite, do these parameters really matter? That is our only concern with this effort. We don't expect people to be running large calculations with this build env, and we do not expect the Nalu and Drekar test suites to be run with this build env.

The plan is to have a more comprehensive set of post-push (Nightly) builds that run on the CEE, ATTB, and other platforms and that target particular customer usage of Trilinos. Then, if these builds are all clean for a carefully selected set of packages, we will merge from the 'develop' branch to the 'master' branch.

Does that make sense?


From: trilinos-framework-bounces@software.sandia.gov on behalf of Sjaardema, Gregory D
Sent: Friday, April 08, 2016 2:32 PM
To: Siefert, Christopher; Hammond, Simon David (-EXP)
Cc: Trilinos Framework; sandia-trilinos-developers@software.sandia.gov
Subject: Re: [Trilinos-Framework] [Sandia-trilinos-developers] GCC, OpenMPI and TPLs for standard Trilinos CI env based on SEMS Dev Env?

Note that the NC_MAX_VAR_DIMS values are not important for Exodus (which is the reason for changing the other values). The only values that must be changed are NC_MAX_DIMS and NC_MAX_VARS, and they should be set to at least the values shown. So, Nalu and Drekar can and should use the same NetCDF libraries.

The library should, if at all possible, be compiled with --enable-netcdf4, as that provides the superset of capabilities that are needed; it can then support both “very large” and “complex” models. The --enable-pnetcdf option is not really associated with “large mesh” or “small mesh”, but is instead an additional parallel-IO method used by the “auto-join” option of the Ioss library.

Whether to build with parallel support enabled is another variable. Most pre-installed versions of hdf5 and netcdf will not have it enabled, but for a parallel build of Trilinos it should be enabled in both netcdf and hdf5. This provides the auto-decomposition and auto-join (1->N, N->1) capabilities usable by codes that use the IOSS library.

As far as I am aware, all codes should be able to use the same NetCDF and HDF5 libraries as long as we pick the superset (enable-netcdf4, enable-parallel, enable-pnetcdf). I can help resolve any issues where it may seem that multiple installations are needed (other than a serial and parallel trilinos build).

I also have some FindNetCDF.cmake files, used in the standalone SEACAS CMake build, that can detect most of these settings; they should be adopted for Trilinos.

(NOTE: I have contacted the netcdf developers with a couple of ways of eliminating the need to change these values, and they have indicated that they will not make such a change due to compatibility issues, even though they have been very receptive to other changes we have supplied and requested. The ultimate solution may be to have our own version of the NetCDF library supplied in the SEACAS package.)


gdsjaar commented 8 years ago

I would recommend a newer version of NetCDF — 4.4.0 if possible; 4.3.3.1 or later at a minimum. ..Greg


From: "Roscoe A. Bartlett" notifications@github.com
Date: Friday, April 8, 2016 at 2:40 PM
To: trilinos/Trilinos
Cc: "Sjaardema, Gregory D" gdsjaar@sandia.gov
Subject: [EXTERNAL] Re: [trilinos/Trilinos] Create a SEMSDevEnv.cmake file to automatically use loaded SEMS dev env (#158)

Actually, as part of my pre-push testing, I always have extra builds that test against Drekar and Charon2 since we are usually making changes to Trilinos libraries that these codes depend on.

Makes sense. Does the SEMS build of NetCDF 4.3.2 allow all of the Drekar and Charon2 tests to pass?


bartlettroscoe commented 8 years ago

From @gdsjaar:

The IOSS library in SEACAS uses the ParMETIS library outside of Zoltan and recommends/requires the use of the 64-bit index type.

From @kddevin:

Zoltan's tests use 32-bit ParMETIS and Scotch. Tests can fail with 64-bit, due to differences in execution path with ParMETIS and Scotch; the partitions returned differ between 32-bit and 64-bit. See #225. If this environment is to be used for testing, we'd prefer 32-bit unless someone else needs 64-bit. If 64-bit is required, we can add 64-bit answers to the repo to allow the tests to pass.

So what is the resolution here? It seems that 64-bit ints can do anything 32-bit ints can (except use less storage).

How was this incompatibility between SEACAS and Zoltan never discovered before? I know that SEACAS can use Zoltan (there is a snapshot of Zoltan, tests and all, in the SEACAS GitHub repo). Perhaps no one has run the Zoltan test suite and the SEACAS test suite at the same time with ParMETIS enabled?

gsjaardema commented 8 years ago

SEACAS (specifically Ioss library) can use either 64-bit or 32-bit index type and only needs the 64-bit index type for very large models -- those exceeding 2.1 billion elements which are not tested in the test suite (yet?). Yes, SEACAS does use Zoltan, but it also uses ParMETIS separately.
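To make the numbers concrete: the "2.1 billion" figure is the largest value a signed 32-bit index can hold, 2^31 - 1 = 2,147,483,647. METIS (and therefore ParMETIS, which uses its headers) fixes the index width at build time through `IDXTYPEWIDTH` in `metis.h`, so one way to tell which width a given install uses is to read that header. The sketch below is illustrative only: the helper names are hypothetical, and it takes the `metis.h` path as an argument rather than assuming any particular SEMS environment variable.

```shell
#!/usr/bin/env bash
# Sketch: the 32-bit index limit behind the "2.1 billion elements" figure,
# plus a way to read the index width a METIS/ParMETIS build was compiled with.
# Helper names are hypothetical.

# Largest value of a signed 32-bit index: 2^31 - 1 = 2147483647
int32_max=$(( (1 << 31) - 1 ))

# Usage: needs_64bit_index <count>
# Succeeds if <count> (e.g. number of elements) cannot fit in a signed 32-bit index.
needs_64bit_index() {
  [ "$1" -gt "$int32_max" ]
}

# Usage: metis_idx_width <path/to/metis.h>
# Prints the value from the "#define IDXTYPEWIDTH <n>" line (32 or 64).
metis_idx_width() {
  awk '$1 == "#define" && $2 == "IDXTYPEWIDTH" { print $3; exit }' "$1"
}
```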

gsjaardema commented 8 years ago

Sorry for all the responses, but just for completeness here are the SEACAS responses to the question:

@bartlettroscoe wrote: 3) What TPLs should be enabled? (The SEMS Dev Env assumes BLAS and LAPACK are already on the system and provides boost, hdf5, netcdf, parmetis, scotch, qd, superlu, and zlib.) 4) Of the TPLs that are enabled, what versions of TPLs should be considered standard?

SEACAS uses:

At an absolute minimum, with no parallel-IO support, SEACAS can use just the NetCDF library.

bartlettroscoe commented 8 years ago

@gdsjaar,

I also worry that the SEMSDevEnv will be used as a template by people downloading and using the library. Instead of being “minimally acceptable”, it will be used as the “way it should be” — Even the name seems to imply that it is the environment to be used by developers.

Do you mean the issue of 32-bit ints vs. 64-bit ints with ParMETIS, or the older version of NetCDF? What is the primary concern about SEMSDevEnv.cmake being viewed as the “way it should be”?

Just to be clear, the primary goals of this Story are:

  1. Make it easier for Trilinos developers to configure Trilinos when they have loaded the SEMS Dev Env (whatever it happens to be).
  2. Investigate whether the SEMS dev env can provide the foundation for a standard pre-push CI dev env for Trilinos and, if not, what changes might allow it to (which is where most of the discussion in this Issue ticket has been focused).

The primary goal was not to try to define the "standard way" or the "right way" to configure Trilinos for all users at Sandia. I don't think SEMS can do that. For the ATTB machines, for example, SEMS is not the way (and hence #172).

The real driver is that we desperately need a standard pre-push CI dev env for testing (not production runs). To that end, it does not necessarily need to support large-scale optimized usage (but if it can, great). I would like to make it so that a Trilinos developer just loads the SEMS dev env (using module load commands) and then the Trilinos CMake configure automatically detects that and finds the right compilers, MPI, and TPLs (by reading env vars). Or, more safely, we would ask people to run:

$ source $TRILINOS_DIR/cmake/std/sems/load_dev_env.sh

(which would set an env var like SEMS_STD_DEV_ENV_LOADED), and then the Trilinos configure would automatically pick up the loaded dev env. But initially, we would require that people also pass -DTrilinos_USE_SEMS_DEV_ENV=ON.
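A minimal sketch of the translation described above, from environment variables exported by loaded SEMS modules to TriBITS TPL options. Only `SEMS_NETCDF_LIBRARY_PATH` appears in this thread; the `SEMS_<STEM>_INCLUDE_PATH` form, the HDF5 and BOOST stems, and both helper names are assumptions for illustration. `TPL_ENABLE_<TPL>`, `<TPL>_LIBRARY_DIRS`, and `<TPL>_INCLUDE_DIRS` are the standard TriBITS per-TPL options. In practice this logic would more naturally live in SEMSDevEnv.cmake itself; the shell form just keeps the mapping easy to see.

```shell
#!/usr/bin/env bash
# Sketch: map environment variables exported by loaded SEMS modules onto
# TriBITS TPL options. Only SEMS_NETCDF_LIBRARY_PATH is confirmed in this
# thread; SEMS_<STEM>_INCLUDE_PATH, the other stems, and both helper names
# are assumptions for illustration.

# Usage: sems_tpl_args <SEMS-stem> <TriBITS-TPL-name>, e.g. sems_tpl_args NETCDF Netcdf
# Prints one -D option per line; prints nothing if the module is not loaded.
sems_tpl_args() {
  local stem="$1" tpl="$2" lib_dir inc_dir
  lib_dir="$(printenv "SEMS_${stem}_LIBRARY_PATH")" || return 0
  inc_dir="$(printenv "SEMS_${stem}_INCLUDE_PATH")" || inc_dir=""
  printf '%s\n' "-DTPL_ENABLE_${tpl}=ON" "-D${tpl}_LIBRARY_DIRS=${lib_dir}"
  if [ -n "$inc_dir" ]; then
    printf '%s\n' "-D${tpl}_INCLUDE_DIRS=${inc_dir}"
  fi
}

# Refuses to run unless the (proposed) load script has set the marker variable.
sems_configure_args() {
  if ! printenv SEMS_STD_DEV_ENV_LOADED >/dev/null; then
    echo "SEMS dev env not loaded; source the load script first" >&2
    return 1
  fi
  sems_tpl_args NETCDF Netcdf
  sems_tpl_args HDF5 HDF5
  sems_tpl_args BOOST Boost
}
```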

I would like to consider this same model for other important/standard machines like the ATTB machines #172 and even some of the LCF production machines (but that would be a lot of work to maintain).

Also (but not part of this Story Issue ticket), the general challenges of configuring Trilinos with TPLs need to be addressed. Part of that might be addressed by TriBITSPub/TriBITS#63, but the general problem of TPLs that depend on each other is much harder to deal with when static libraries are used (because you have to know every library that needs to be linked, not just the direct libraries you are using). Static libraries are the problem, for example, in #156. Those more difficult issues will be addressed by another set of Stories (mostly in TriBITS).

These topics are likely something we should discuss in more detail at a future Trilinos Leaders Meeting or at the up-coming Trilinos Spring Developers Meeting.

kddevin commented 8 years ago

There isn't an incompatibility between SEACAS and Zoltan. Both SEACAS and Zoltan work with either 32-bit or 64-bit ParMETIS.
Updating the Zoltan answer files to use 64-bit ParMETIS is probably the resolution. Let me talk with the Zoltan team about it.

If we decide to do the updates:

bartlettroscoe commented 8 years ago

From @gdsjaar:

  • Parallel-NetCDF (PNetCDF) if NetCDF is built with --enable-pnetcdf. Note that this is a different library than NetCDF. Recommend 1.7.0, but 1.6.1 is ok.
  • MatIO for reading/writing MATLAB files, from https://github.com/tbeu/matio.git, which is the current active development fork. Version 1.5.3 or later, built with support for HDF5-based files.
  • CGNS -- version 3.3.0 preferred. Built with ENABLE_SCOPING and ENABLE_HDF5.

These are TPLs that are not currently provided by the SEMS Dev Env. I am assuming that all of these need to be present in order to fully test changes to SEACAS. But are changes to SEACAS going to be made directly in the Trilinos git repo? If so, then these TPLs should be present before any changes to SEACAS are pushed. If not, then the issue to focus on is which SEACAS TPLs downstream Trilinos packages like Panzer require in order to test changes to Panzer and downstream packages. Clearly TPLs like NetCDF and HDF5 are needed by downstream Trilinos packages.

UPDATE (4/18/2016): As mentioned by @jgfouca below, SEMS provides pnetcdf as part of the NetCDF TPL. You can see this with:

[rabartl@muir ~]$ module load netcdf/4.3.2/gcc/4.7.2/openmpi/1.8.7
[rabartl@muir ~]$ set | grep NETCDF
...
SEMS_NETCDF_LIBRARY_PATH=/projects/install/rhel6-x86_64/sems/tpl/netcdf/4.3.2/gcc/4.7.2/openmpi/1.8.7/lib
...

[rabartl@muir ~]$ ls $SEMS_NETCDF_LIBRARY_PATH
libnetcdf.a   libnetcdff.la  libnetcdff.so.6      libnetcdf.la  libnetcdf.so.7      libpnetcdf.a
libnetcdff.a  libnetcdff.so  libnetcdff.so.6.0.1  libnetcdf.so  libnetcdf.so.7.2.0  pkgconfig

It is not clear what version is provided, just that it should be compatible with the NetCDF 4.3.2 version installed.
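Along the same lines, a small sketch that checks whether a NetCDF library directory also contains `libpnetcdf`, defaulting to the `SEMS_NETCDF_LIBRARY_PATH` exported by the loaded module. It only detects the presence of the library file; as noted above, it cannot tell which PnetCDF version was built. The helper name `has_pnetcdf_lib` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: does a NetCDF library directory also ship libpnetcdf?
# Defaults to SEMS_NETCDF_LIBRARY_PATH (exported by a loaded SEMS netcdf module).
# has_pnetcdf_lib is a hypothetical helper; it checks file presence only and
# says nothing about which PnetCDF version was built.

# Returns 0 if found, 1 if not found, 2 if there is no directory to inspect.
has_pnetcdf_lib() {
  local dir="${1:-${SEMS_NETCDF_LIBRARY_PATH:-}}" f
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    return 2
  fi
  for f in "$dir"/libpnetcdf.*; do
    # An unmatched glob is left as the literal pattern, so test existence.
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}
```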

bartlettroscoe commented 8 years ago

From @kddevin:

Updating the Zoltan answer files to use 64-bit ParMETIS is probably the resolution. Let me talk with the Zoltan team about it. What is the timeline needed for these updates? Are they showstoppers? Are they needed by a certain date?

How many Trilinos packages downstream from Zoltan require ParMETIS functionality in their automated test suites? That is the real issue. If no downstream Trilinos packages require ParMETIS functionality, then the ParMETIS 32-bit vs. 64-bit issue only directly affects people changing Zoltan (and needing to test before pushing). In that case, until the Zoltan test suite is set up to work with 64-bit integers, the Zoltan team would need to use their own 32-bit builds of Scotch and ParMETIS, rather than the SEMS-provided ones, when testing Zoltan changes before pushing. Therefore, I think the Zoltan developers can determine this timetable on their own.

nmhamster commented 8 years ago

@bartlettroscoe I would really like to see the TPLs made available in SEMS so that we can test codes like NALU against Trilinos. While this may not be needed for CI testing, it's a huge part of turning the software into something users can make use of for testing/debugging on their desktops.

jgfouca commented 8 years ago

@bartlettroscoe , the netcdf provided by SEMS includes pnetcdf and netcdf-fortran.

gsjaardema commented 8 years ago

@jgfouca - does it also enable netcdf-4 (hdf5-based)?