trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Create automated ATDM build of Trilinos on shiller submitting to Trilinos CDash site #1247

Closed bartlettroscoe closed 6 years ago

bartlettroscoe commented 7 years ago

Next Action Status:

Reproduced the configure of Trilinos+Drekar on 'shiller'. Next: Create refactored files and scripts and put into Trilinos git repo ...

Description:

This story is associated with #1246. Its purpose is to drive the creation of these common ATDM configuration files and scripts while getting an automated build of Trilinos for ATDM set up on shiller with CUDA/GPUs. This will be followed by other stories for getting common configuration files and automated builds set up for other ATDM platforms.

These files will be copied and refactored from those that exist for Trilinos+Drekar on shiller. The existing configure scripts for Trilinos+Drekar will be refactored to use these new scripts so there is no duplication.

Definition of Done:

CC: @trilinos/framework, @bathmatt, @rppawlo, @kruger

bartlettroscoe commented 7 years ago

@bathmatt and @rppawlo, the existing Trilinos+Drekar configure scripts in DrekarBase/drekar/build_scripts/ take the env var JOB_NAME as input (e.g. JOB_NAME=Drekar_opt_gnu_openmp). What Trilinos build should I start with? The most useful build for protecting developers' work is usually one with optimized compiler flags (i.e. -DCMAKE_BUILD_TYPE=RELEASE) and debug-mode runtime checking (i.e. -DTrilinos_ENABLE_DEBUG=ON). If that build runs, then the build without debug-mode runtime checking (i.e. -DTrilinos_ENABLE_DEBUG=OFF) should work as well.

Therefore, unless you tell me otherwise, I will first work to get a build with optimized compiler flags and debug-mode runtime checking set up to submit to CDash. That will be the first build and the one that drives my refactoring in #1246 and this story.
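As a rough illustration of the two build variants discussed above, here is a minimal shell sketch. The function name atdm_cmake_opts and the TRILINOS_SRC placeholder path are hypothetical and are not part of the Drekar or Trilinos scripts. Only the two CMake options quoted in the comment above are used, and a real configure would pass many more.

```shell
#!/bin/sh
# Hypothetical helper (not from the Drekar or Trilinos scripts): print the
# CMake cache options for an optimized build with debug-mode runtime
# checking set to ON (default) or OFF.
atdm_cmake_opts() {
  debug_checking="${1:-ON}"   # expected values: ON or OFF
  printf '%s\n' \
    "-DCMAKE_BUILD_TYPE=RELEASE" \
    "-DTrilinos_ENABLE_DEBUG=${debug_checking}"
}

# TRILINOS_SRC is a placeholder for the path to the Trilinos source tree.
TRILINOS_SRC="${TRILINOS_SRC:-../Trilinos}"

# Only print the command that would be run; drop the 'echo' to run cmake.
echo cmake $(atdm_cmake_opts ON) "$TRILINOS_SRC"
```

Calling atdm_cmake_opts OFF yields the options for the variant without debug-mode runtime checking, so both builds share one definition of the optimized compiler flags.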

bartlettroscoe commented 7 years ago

I just started working on this. I was able to follow the documented process and get Trilinos+Drekar to configure and build and pass all of the tests with the JOB_NAME=Drekar_opt_gnu_openmp configuration and it produced:

$ time ./configure-drekar.sh &> configure-drekar.out

real    2m55.217s
user    1m30.581s
sys     0m11.807s

$ time make -j10 &> make.out

real    45m59.959s
user    287m46.318s
sys     76m29.644s

$ time ctest &> ctest.out

real    17m29.424s
user    496m45.812s
sys     3m5.475s

with all passing tests:

100% tests passed, 0 tests failed out of 249

Label Time Summary:
Drekar    = 549.98 sec (145 tests)
Panzer    = 498.38 sec (104 tests)

Total Test time (real) = 1049.40 sec

(will provide details later).

Now I have a solid foundation to extract the common config files and scripts and refactor the Drekar config scripts to use these (so current Drekar and EMPIRE developers and users will not notice any differences).

bartlettroscoe commented 7 years ago

Finally getting back to this ...

To emphasize the importance of using a single set of configure files and scripts for these platforms: since I did the trial build of Trilinos+Drekar on shiller described above on April 18, there have already been three changes to the drekar/build_scripts/shiller/environment.sh script. That means that if Trilinos had just taken a static copy of these scripts and then moved on, the automated Trilinos build on shiller would already be out of date. In fact, the two copies would have diverged on May 2 with the commit:

704d8ba "Adding new environment variable that was missing from the advanced platform environments."
Author: Richard Kramer <rmkrame@sandia.gov>
Date:   Tue May 2 10:50:37 2017 -0600 (3 days ago)

M       drekar/build_scripts/ellis/environment.sh
M       drekar/build_scripts/ride/environment.sh
M       drekar/build_scripts/shiller/environment.sh

That just reinforces the need to get a single set of configuration files and scripts for Trilinos for ATDM and automated builds.
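To make the drift problem concrete, here is a minimal shell sketch of a check that a static copy of a script still matches its upstream original. The function name script_has_drifted and the copy location in the usage comment are hypothetical. This only shows how divergence could be detected and is not something the Trilinos or Drekar repositories provide.

```shell
#!/bin/sh
# Hypothetical helper: report whether a copied script has drifted from its
# upstream original. Returns 0 and prints a message when the files differ
# (or one of them is missing); returns 1 when they are byte-for-byte equal.
script_has_drifted() {
  upstream="$1"
  copy="$2"
  # 'cmp -s' is silent and exits 0 only when both files are identical.
  if cmp -s "$upstream" "$copy"; then
    return 1
  fi
  echo "DRIFT: $copy no longer matches $upstream"
  return 0
}

# Example usage (the second path is an assumed location for a static copy):
#   script_has_drifted drekar/build_scripts/shiller/environment.sh \
#     path/to/copy/environment.sh
#
# Inside the upstream git repo, the commits that touched the script since a
# given date can be listed with:
#   git log --oneline --since="2017-04-18" -- \
#     drekar/build_scripts/shiller/environment.sh
```

A check like this only detects divergence after the fact. Having both projects read the same version-controlled files removes the need for it.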

bartlettroscoe commented 7 years ago

Removing from SART board since we can just discuss this as part of #1246

jhux2 commented 6 years ago

@bartlettroscoe There's been recent interest in getting automated dashboard builds that reflect the current configuration/builds of Trilinos done by the ATDM applications. Is this the correct issue to ask about the status of such efforts?

bartlettroscoe commented 6 years ago

There's been recent interest in getting automated dashboard builds that reflect the current configuration/builds of Trilinos done by the ATDM applications. Is this the correct issue to ask about the status of such efforts?

Yes and no. Generic discussions about this can go here but more sensitive SNL-specific discussions need to go in:

The current status is that we are pushing to get a CMake/CTest/CDash upgrade (see https://gitlab.kitware.com/snl/project-1/issues/33) so that we can use the new all-at-once configure, build, and submit mode (see TriBITSPub/TriBITS#183). With that mode, we can afford to run Trilinos CUDA builds on the ATDM platforms and still submit to CDash, with targeted emails going out to specific Trilinos package development teams so they can see the failures and address them. We will also need to complete #1293 so that we know where to send these builds on CDash and so that there are clear policies and processes to keep these builds clean. Otherwise, someone will need to manually watch CDash every day and create new GitHub issues when there is a failure.

The other problem is that there is no such thing as an "ATDM" build of Trilinos. SPARC has its way of building Trilinos and Drekar/EMPIRE has its way (and the configuration for Drekar and EMPIRE may actually diverge as well with the EMPIRE reboot). Therefore, we need to try to get SPARC and EMPIRE using a single consistent "ATDM" configuration of Trilinos (but SPARC will only enable and/or use a subset of the packages that EMPIRE does). That will be needed to make this viable and support SPARC and EMPIRE (or we will need to pick one or the other to support).

But there is a pretty straightforward plan for getting these build configurations under proper version control in Trilinos so that they can be used for automated builds posting to the Trilinos CDash site and the same exact version-controlled configurations can be used by the ATDM APP teams. See:

bartlettroscoe commented 6 years ago

This GitHub issue is old and stale. We are going to track this primarily in:

so we can be a little more free to discuss ATDM APP codes and env.

Closing.