Closed bartlettroscoe closed 6 years ago
I sent the following email. I will give it until the week of 3/12/2018 after the SIAM PP conference to have this meeting. That should be urgent enough.
To: sandia-trilinos-developers@software.sandia.gov Cc: trilinos-framework@software.sandia.gov Subject: [Trilinos-Framework] Interested in selection of builds used in mandatory auto PR testing and merge process?
Hello Sandia Trilinos Developers,
Following up on the action item I was given at the Trilinos Developers Planning meeting on Monday with notes listed at:
https://docs.google.com/document/d/1JClLSR3n79XJT_yLPPoQv_e3rJToHxl8fUzRYojtY5Y
I am going to organize a discussion on the selection of the set of builds the auto PR testing process should be using before it becomes the required way to test and push to the Trilinos ‘develop’ branch.
Therefore, if you are interested in being part of this discussion, please add your name to the existing list in the first comment under the “Description” section in the new issue:
https://github.com/trilinos/Trilinos/issues/2317
and then I will try to organize a meeting of all the interested parties.
Since this selection of builds will impact every Trilinos developer and every close customer and collaborator of Trilinos, it is important that we get input from many different people in making this selection.
Thanks,
-Ross
Any machine used for PR testing must be accessible (either directly or by an identical system) by all developers so they can fix issues that PR testing exposes.
I'm with @csiefer2 -- for example, if we intend to support Windows builds, the Windows Dashboard build should be a VM that we can access (either log into, or download and run on our own) and use for testing.
I don't think there is any way that we would make a Windows build an auto PR build or even a CI build anytime soon. There are so many other builds that are more important than that.
But I think the rule should be that before any build can enter the "Nightly", "Clean", or "Auto PR" category, it must be easy for any Sandia staff member to access a machine where the build can be exactly reproduced. I agree with @csiefer2 and @mhoemmen on this point.
FYI: @trilinos/muelu
It appears that the current auto PR testing GCC 4.8.4 build posting to CDash never runs any MueLu tests, as shown at:
Not sure why that is, because the standard CI build set up for the last 1.5 years has been running MueLu tests without any issues (except for real failures), as shown in recent history at:
FYI: @trilinos/anasazi
It also appears that the auto PR builds are not running any Anasazi tests either, as shown by:
FYI: @trilinos/ifpack2
It also appears that the auto PR builds are not running any Ifpack2 tests as shown by:
@bartlettroscoe Thank you for the reminder about the MueLu, Ifpack2, and Anasazi tests. These packages had test failures a while back and were disabled for that reason (needed 100% passing). Some of these issues I believe have been resolved. I am testing that now.
@jwillenbring,
Just the failing tests should be disabled, not all the tests for a package. You write a GitHub issue for the failing test(s), then disable just those tests. Generally one wants to disable with a scalpel, not a machete.
Agreed. At the time I didn't want the issues to be lost and not notice that the tests were disabled. Much of this has to do with ETI.
If the developers of those packages don't care to fix the broken tests for the GitHub issues associated with their packages, then that is on them. And you disable the tests just for that particular build, not for all builds. Look at the git log for the file cmake/std/BasicCiTestingSettings.cmake
to see how to do this. These tests will still show up failing in the "Clean", "Nightly", and other builds, but at least they will not mess up other developers' PR testing. That way, no one will forget these failed tests. But if they let the tests fail for more than a day or two, then these tests need to be disabled in "Clean" and "Nightly" as well.
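For illustration, here is a minimal sketch of that per-build disable mechanism; the test name and issue number below are hypothetical placeholders, while the mechanism itself is the standard TriBITS `<fullTestName>_DISABLE` cache variable used in cmake/std/BasicCiTestingSettings.cmake:

```cmake
# Disable only the one failing test, and only for this CI build.
# The test name and issue number are hypothetical placeholders.
SET(MueLu_UnitTestsTpetra_MPI_4_DISABLE ON CACHE BOOL
  "Disabled only in the CI build; failure tracked in GitHub issue #XXXX")
```

The same test keeps running (and failing visibly) in the "Clean" and "Nightly" builds, so the failure stays on the dashboard without blocking unrelated PRs.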
It is important to fix this quickly because developers might be thinking that the auto PR builds are testing Ifpack2, Anasazi, and MueLu when they are not. This means that the auto PR builds have an even higher chance of breaking tests in these packages.
Wait, what?!? Ifpack2 and MueLu tests were totally disabled?!?
Hello, it is after SIAM PP. Can we set up a meeting on this?
@jwillenbring and/or @bmpersc,
Do you guys want to be included in this meeting? If so, please add your GitHub IDs to the list in the above Description field. I will then try to set up a meeting using people's SNL calendars to select a time that works for everyone. (Or if I can't find a time, I will set up a dreaded doodle.com poll.)
One issue to consider is that according to:
GCC 4.8.4 should implement OpenMP 3.1 while GCC 4.9.3 should implement OpenMP 4.0.
Is that an issue for testing OpenMP code with Trilinos? How much difference is there in the Kokkos threading implementation between OpenMP 3.1 and OpenMP 4.0? Or is this not a concern because updates to Kokkos are tested with many different compilers before merging to Trilinos 'develop'?
If we are not worried about the OpenMP 3.1 vs. OpenMP 4.0 issue, then a single GCC 4.8.4 build for auto PR testing would seem to be okay.
@bartlettroscoe - OpenMP 3.1 would be good for now because we have a range of platforms where this is the most up-to-date fully supported standard.
@trilinos/shylu developers,
Why are there no tests being run for any of the ShyLU packages in the existing auto PR and CI builds?
These packages are listed as Primary Tested (PT) packages, but they don't have any tests?
@bartlettroscoe each package in each of ShyLU's subpackages is marked either ST or EX:
I'm not sure why they are being listed as PT packages; the use of

```cmake
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
  SUBPACKAGES_DIRS_CLASSIFICATIONS_OPTREQS
  ...
  )
```

in the shylu_node and shylu_dd subpackages seems to fit the example from the TriBITS developers guide. Does anything stand out that clearly needs to be changed?
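For reference, a hedged sketch of what that macro invocation looks like with the classification column filled in (the subpackage names and directories here are illustrative, not the exact ShyLU layout):

```cmake
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
  SUBPACKAGES_DIRS_CLASSIFICATIONS_OPTREQS
    # SubPackage   Directory   Classification   Required/Optional
    Basker         basker      EX               OPTIONAL
    Tacho          tacho       EX               OPTIONAL
  )
```

The third column is what should keep an EX subpackage out of the PT set, which is why the PT listing above is surprising.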
Opening a new issue after speaking with @srajama1 to get the packages migrated to PT testing: #2375
Edit: Added reference to ShyLU issue.
NOTE: Defects like #2391 show why we need to be enabling OpenMP in our auto PR build.
CC: @trilinos/zoltan2
I was just reminded today by #2397 that the auto PR builds based on SEMS should also enable the Scotch TPL. That CI build (and the builds derived from it) are the only automated builds of Trilinos that enable Scotch and therefore are the only builds that run these tests that depend on Scotch. See #2065.
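For illustration, the extra enable in a SEMS-based PR configuration fragment would be a one-liner (a sketch; exactly which file the PR tester reads is an assumption here):

```cmake
# Enable the Scotch TPL so the Zoltan2 tests that depend on Scotch
# actually run in pre-merge PR testing (see #2065).
SET(TPL_ENABLE_Scotch ON CACHE BOOL "")
```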
Got feedback from @bathmatt and @micahahoward on the builds they would like to see to help protect the ATDM APPs SPARC and EMPIRE usage of Trilinos. The email thread below provides the conversation. In short, they would like to see the following builds tested in the auto PR tester (in ATDM Trilinos build keyword terms):
@bathmatt and @micahahoward,
Would it be just as good if gnu and intel were switched and we added debug-mode checking to the CMAKE_BUILD_TYPE=RELEASE build to catch more undefined memory-usage problems?
That would allow people to run with debug-mode checking and OpenMP on machines that do not have an Intel compiler, or when there is an Intel license server problem (which happens from time to time). Is there something better about testing OpenMP with Intel than with GCC?
Also, is testing with a serial node (i.e. no Kokkos threading) really an important use case to test pre-merge to 'develop'? Is this a debugging tool in case something goes wrong with the OpenMP Kokkos node, even with OMP_NUM_THREADS=1? If OpenMP works, why would you not use that build for development, debugging, or running problems?
@bartlettroscoe - running with OpenMP enabled but intending to do MPI-only (i.e. OMP_NUM_THREADS=1) has been shown to add fairly serious performance overhead because of reduced compiler optimization and lower efficiency (the OpenMP runtime is outlining functions which are not needed in the MPI-only runs). For systems like the ATS1/Haswell partition, we would strongly recommend MPI only (i.e. Serial) as the fastest configuration.
@bartlettroscoe - in general if we use a sufficiently new GCC, the difference in supported OpenMP features is fairly low. Using older GCC means that Intel would be a better choice.
@nmhamster,
For systems like ATS1/Haswell partition, we would strongly recommend MPI only (i.e. Serial) as the fastest configuration.
Okay, so a serial non-threaded Kokkos node is important to test. Is the compiler in that case always going to be some variant of GNU?
in general if we use a sufficiently new GCC, the difference in supported OpenMP features is fairly low.
I think the GCC used for PR testing needs to be GCC 4.8.4, because that is the minimum required version of GCC for Kokkos. Is that considered "old" from an OpenMP perspective compared to, say, Intel 15.0.2 (which I think is the minimum version of Intel that Kokkos currently supports)?
So which is better: gnu-opt-serial or intel-opt-serial, from this ATDM perspective?
@bartlettroscoe
Okay, so a serial non-threaded Kokkos node is important to test. Is the compiler in that case always going to be some variant of GNU?
More likely to be Intel on ATS1 or CTS platforms. Rarely use GNU in production environments.
I think the GCC used for PR testing needs to be GCC 4.8.4, because that is the minimum required version of GCC for Kokkos. Is that considered "old" from an OpenMP perspective compared to, say, Intel 15.0.2 (which I think is the minimum version of Intel that Kokkos currently supports)?
Both are quite old but they have a fairly consistent set of OpenMP features. The more useful case is Intel 17.4 (with GCC 4.9.3 header files) because this is the current configuration used for production builds. Obviously, over time this will change but updates are infrequent.
@bartlettroscoe - the Intel comment relates to ATS1; for ATS2, GCC is more likely. So it's quite complicated as we move forward and have to support many platforms. Sorry to be a pain.
@nmhamster,
More likely to be Intel on ATS1 or CTS platforms. Rarely use GNU in production environments.
So that means that intel-opt-serial is a more useful build to help support ATDM than gnu-opt-serial, right?
Remember, this is just the auto PR builds, not all of the builds. We are setting up automated ATDM-focused builds of Trilinos on every system and with every configuration that is important to ATDM. It is up to us in ATDM to decide what those builds should be.
@bartlettroscoe
So that means that intel-opt-serial is a more useful build to help support ATDM than gnu-opt-serial, right?
Yes, I think this would be more useful.
Remember, this is just the auto PR builds, not all of the builds. We are setting up automated ATDM-focused builds of Trilinos on every system and with every configuration that is important to ATDM. It is up to us in ATDM to decide what those builds should be.
Understood.
Not sure I can offer much that hasn't already been said. Let me know if you need anything from me.
@bartlettroscoe The only issue with CUDA debug testing is that it's really REALLY slow (e.g., every Kokkos::View array access gets checked on the GPU), so you may need to increase the time-out.
@mhoemmen, or just perhaps only run the faster BASIC test suite? But I think the CUDA build should be cuda-opt, not cuda-debug. ATDM is running full cuda-debug and there are just a few issues with Panzer tests timing out (see #2318).
@bartlettroscoe That's cool -- just note that the slowness comes from bounds checking and other Kokkos debug features, not from -g -O0. Thus, "release build with bounds checking on" will be nearly as slow as "debug build with bounds checking on."
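To make that trade-off concrete, here is a sketch of an optimized build with the debug checks that cause the slowdown still enabled, plus a raised test timeout; the option names reflect common Trilinos/Kokkos usage of that era and should be treated as assumptions:

```cmake
# Optimized compile, but with runtime checking still on; the checks
# (e.g. Kokkos::View bounds checking) are what slow the tests down, not -O0.
SET(CMAKE_BUILD_TYPE RELEASE CACHE STRING "")
SET(Trilinos_ENABLE_DEBUG ON CACHE BOOL "")
SET(Kokkos_ENABLE_DEBUG ON CACHE BOOL "")
# Raise the default CTest timeout to accommodate the checked GPU kernels.
SET(DART_TESTING_TIMEOUT 600 CACHE STRING "")
```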
We had a meeting last Thursday to discuss the selection of the auto PR builds, involving @bartlettroscoe, @mhoemmen, @ibaned, @bmpersc, @crtrott, @jwillenbring, @rppawlo, @agsalin, and @william76. The builds we selected are shown in the Google Doc page:
In summary, we selected the following builds:
1. SEMS GCC 4.8.4, OpenMPI 1.8.7, Primary Tested Packages and TPLs, CMAKE_BUILD_TYPE=RELEASE, Trilinos_ENABLE_DEBUG=ON, OpenMP (OMP_NUM_THREADS=2)
2. SEMS Intel 17.x, OpenMPI 2.x, Primary Tested Packages and TPLs, CMAKE_BUILD_TYPE=RELEASE, Trilinos_ENABLE_DEBUG=OFF
3. CUDA (will match the ATDM CUDA build on 'white'; see ATDM Builds of Trilinos)
All of the builds have the common options (sketched as a configure fragment after this list):
- BUILD_SHARED_LIBS=ON
- Trilinos_ENABLE_EXPLICIT_TEMPLATE_INSTANTIATION=ON
- Xpetra_ENABLE_Experimental=ON
- MueLu_ENABLE_Experimental=ON
- Trilinos_TRACE_ADD_TEST=ON
- Trilinos_TEST_CATEGORIES=BASIC
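Expressed as a configure fragment, a minimal sketch of those common options (plain cache-variable form; the actual PR driver files may organize this differently):

```cmake
SET(BUILD_SHARED_LIBS ON CACHE BOOL "")
SET(Trilinos_ENABLE_EXPLICIT_TEMPLATE_INSTANTIATION ON CACHE BOOL "")
SET(Xpetra_ENABLE_Experimental ON CACHE BOOL "")
SET(MueLu_ENABLE_Experimental ON CACHE BOOL "")
SET(Trilinos_TRACE_ADD_TEST ON CACHE BOOL "")
SET(Trilinos_TEST_CATEGORIES BASIC CACHE STRING "")
```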
I will create some new GitHub issues for each of these target builds to outline what needs to be done to get them ready to plug into the auto PR tester.
I am uncomfortable testing with these options enabled:
Xpetra_ENABLE_Experimental=ON MueLu_ENABLE_Experimental=ON
We're still waiting on an explanation of why the apps want those...
That is a negotiation that needs to take place between MueLu developers, ATDM APP developers and management. But as long as these are getting enabled in ATDM builds of Trilinos, we need to enable them in these builds as well to protect ATDM customers.
Well consider my objection logged then :)
@csiefer2 If it helps, I objected strenuously (alas, I don't think it will help, this sounds like one of those "SURPRISE WE'RE USING THIS NOW YOU'D BETTER SUPPORT IT" things).
@mhoemmen Awww, thanks!
@csiefer2 @mhoemmen To make you feel more comfortable: it helps to grep through the Xpetra and MueLu source code for the Experimental flag.
Long story short: I want to encourage you to make the Experimental flag an experimental flag again ;-)
Happy Easter holidays and greetings from Europe!
@tawiesn: Matlab is not experimental. That's TPL-guarded.
@csiefer2 That's right. It's marked to be experimental only in the ReleaseNotes.txt file. Even better!
@tawiesn Thanks and happy Easter to you too! I'm not sure then why ATDM needs that flag. I would feel better if the ATDM build did not enable it, or if we could identify exactly what they need and make just that not experimental.
@mhoemmen
if we could identify exactly what they need and make just that not experimental.
I think that would be the right thing to do. And I am very sure it's not much work. But you definitely have more time/resources and the ability to run the tests than I have with my tiny 8-year-old computer.
If I have to make a guess: it's probably something in the MueLu Stratimikos interface, which we have used for years without problems and therefore can safely mark as non-experimental.
@mhoemmen @csiefer2
documentationisalwaysuptodate
Clean code principle: Don’t document your code. Code your documentation.
Believe it or not: that really works.
FYI: @micahahoward told me last week that he thinks using OpenMP with the Intel 17.x build would be better than with the GCC 4.8.4 build from an ATDM perspective.
Does anyone have an objection to making that change to the plan for auto PR builds?
The EMPIRE build of Trilinos has removed the experimental Xpetra and MueLu enables and these have been removed from the ATDM and the CI build (see commits 7481c76 and 71c51f2). I updated the document:
to remove these options.
@prwolfe @micahahoward Hooray! :D
@crtrott and @nmhamster,
Since we can't turn on OpenMP in our auto PR builds yet due to the problem of pinning to the same cores as described in #2422, would a short-term alternative be to try to use Pthreads for threading with Kokkos? That is, use Pthread for the Kokkos host threading model?
Also, why does Kokkos not automatically use Pthread for threading when the Pthread TPL is enabled and no other threading model (OpenMP, CUDA, etc.) is enabled or specified? You can see that simply enabling the Pthread TPL does not turn on Kokkos usage of Pthread, as shown in the recent auto PR build:
which shows:
Explicitly enabled TPLs on input (by user): Pthread MPI BLAS LAPACK Boost ParMETIS Zlib HDF5 Netcdf SuperLU BoostLib DLlib 12
...
-- Setting KokkosCore_ENABLE_Pthread=ON since TPL_ENABLE_Pthread=ON
...
-- ****************** Kokkos Settings ******************
-- Execution Spaces
-- Device Parallel: None
-- Host Parallel: None
-- Host Serial: Serial
--
-- Architectures:
-- None
--
-- Enabled options
-- KOKKOS_ENABLE_SERIAL
-- KOKKOS_ENABLE_DEBUG
-- KOKKOS_ENABLE_PROFILING
-- KOKKOS_ENABLE_DEPRECATED_CODE
...
Therefore, enabling the Pthread TPL in this build does nothing, right?
@bartlettroscoe wrote:
would a short-term alternative be to try to use Pthreads for threading with Kokkos?
Kokkos folks have been telling everyone not to use the Pthreads back-end for quite a while.
Also, why does Kokkos not automatically use Pthread for threading when the TPL Pthread is enabled and no other threading model is specified (and OpenMP, CUDA, etc.) are not enabled or specified?
The Pthread TPL gets automatically detected and enabled by default. (There are reasons to enable it that have nothing to do with thread parallelism.) Users found it unpleasantly surprising to get thread parallelism when they didn't ask for it explicitly. Thus, Kokkos and Tpetra historically did not enable the Pthreads back-end unless explicitly requested. Tpetra treats Kokkos::Threads as a last resort in terms of defaults.
More importantly, OpenMP is the only back-end that promises polite sharing of hardware with other libraries that use OpenMP. The Pthreads back-end does not share hardware resources nicely with other libraries that use threads (even Pthreads).
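To illustrate the distinction in configure terms (a sketch; the Kokkos_ENABLE_Pthread spelling is an assumption about the option name of that era):

```cmake
# Enabling the Pthread TPL only makes libpthread available for linking;
# it does not select a Kokkos host-parallel execution space.
SET(TPL_ENABLE_Pthread ON CACHE BOOL "")
# The Kokkos Threads back-end is a separate, explicit opt-in.
SET(Kokkos_ENABLE_Pthread ON CACHE BOOL "")
```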
@bartlettroscoe There is a bit of a fun history. "Kokkos 1.0" (Chris Baker's thread-parallel programming model) had the following back-ends originally:
The "OpenMPNode" came later and was actually an external contribution. Historically TBBNode didn't get exercised so much, because it required a TPL. CUDA was very poorly tested. This made TPINode a common thing for people to try. None of the Nodes performed particularly well in Tpetra until I specialized some of the Serial kernels. We eventually replaced Kokkos 1.0 with Kokkos 2.0 in Tpetra; the fact that Nodes still exist is mainly because there is no schedule for Trilinos 13.0 (we've given up & have plans to get rid of them by making them unnecessary).
CC: @trilinos/framework
Next Action Status
This ship has sailed on "Initial" a long time ago. The only remaining build is the CUDA PR build, and that is being tracked in #2464.
Description
This story is to involve interested members of the Trilinos team in collaborating to select the best build configurations so that the new Trilinos auto PR testing and merging tool and process (#1155) can become the mandatory way to test and push to the main Trilinos ‘develop’ branch (#2312). This selection must be made considering that there are limited computing resources (on the Jenkins build farms) to run automated builds. So while we would like to run many different useful builds as part of pre-push automated PR testing, we have to be strategic about which builds we run where, so as not to overwhelm current capacity. As more build machines are added to the Jenkins build farm, more builds can be added to the auto PR testing process.
This story is a follow-on from the action item in the 2018-02-26 Trilinos Planning Meeting.
The set of Trilinos team members interested in being part of this discussion (and meetings) include:
Since the selection of these builds will impact every Trilinos developer and every close customer and collaborator of Trilinos, it is important that we get input from many different people in making this selection.
Definition of Done
Related Issues
Task
1. Create .cmake files for each proposed configuration and set up nightly builds submitting to the "Specialized" CDash Track/Group:
   a. GCC 4.8.4: See #2462
   b. Intel 17.x: See #2463
   c. CUDA: See #2464
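As a sketch of what one of these per-build files might contain, combining the common options with the build-specific ones for the GCC 4.8.4 configuration (the file and include names here are hypothetical, not the actual Trilinos layout):

```cmake
# Hypothetical PR driver fragment for build (a), GCC 4.8.4.
INCLUDE("${CMAKE_CURRENT_LIST_DIR}/CommonPrTestingSettings.cmake") # hypothetical common fragment
SET(CMAKE_BUILD_TYPE RELEASE CACHE STRING "")
SET(Trilinos_ENABLE_DEBUG ON CACHE BOOL "")
SET(Trilinos_ENABLE_OpenMP ON CACHE BOOL "")
```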