CC: @trilinos/seacas, @kddevin (Trilinos Data Services Product Lead), @bartlettroscoe, @fryeguy52
## Next Action Status
## Description
As shown in [this query](https://testing.sandia.gov/cdash/index.php?project=Trilinos&begin=2019-09-01&end=2019-09-30&filtercount=1&showfilters=1&field1=buildname&compare1=61&value1=Trilinos-atdm-waterman-cuda-9.2-rdc-release-debug) the SEACAS executable
* `seacas/applications/explore/explore`
started failing to build on testing day 2019-09-18 in the 'waterman' build:
* Trilinos-atdm-waterman-cuda-9.2-rdc-release-debug
showing the build error (for example [here](https://testing.sandia.gov/cdash/viewBuildError.php?buildid=5650870)):
```
packages/seacas/libraries/suplib/libsuplib.a(convert.C.o): In function `__sti____cudaRegisterAll()':
tmpxft_00000d2a_00000000-5_convert.cudafe1.stub.c:11: undefined reference to `__cudaRegisterLinkedBinary_42_tmpxft_00000d2a_00000000_6_convert_cpp1_ii_convert_'
collect2: error: ld returned 1 exit status
```
The new commits that were pulled the day that these failures started are shown, for example, [here](https://testing.sandia.gov/cdash/viewNotes.php?buildid=5650870#!#note6).
From looking over that set of commits, it seems likely that the merged PR #5920 is the cause.
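The undefined `__cudaRegisterLinkedBinary_...` reference in the error above is the kind of symbol that nvcc emits into objects compiled with relocatable device code (RDC) and that is normally resolved by a device-link step. One way to check which objects in an archive still carry such unresolved references is to filter `nm` output, as in the minimal sketch below. The function name `find_unresolved_rdc_symbols` is hypothetical (it is not part of Trilinos or SEACAS), and the archive path is copied from the error message and assumed to be relative to the build directory.

```shell
# Hypothetical helper (not part of Trilinos/SEACAS): reads `nm` output on
# stdin and prints only the undefined ("U") RDC registration symbols.
find_unresolved_rdc_symbols() {
  # `nm` lists undefined symbols as "<padding> U <name>".
  # `|| true` keeps the exit status zero when nothing matches.
  grep -E '^[[:space:]]*U[[:space:]]+__cudaRegisterLinkedBinary' || true
}

# Example usage from the build directory (path assumed from the error message):
#   nm packages/seacas/libraries/suplib/libsuplib.a | find_unresolved_rdc_symbols
```

If this prints a symbol for `convert.C.o`, that would be consistent with the file having been compiled by nvcc with RDC enabled and then linked without a device-link step. It would not by itself confirm which commit introduced the problem.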
## Current Status on CDash
The status of the SEACAS build on this system can be seen on CDash in:
* [`SEACAS` package status in `Trilinos-atdm-waterman-cuda-9.2-rdc-release-debug` build over last 10 days](https://testing.sandia.gov/cdash/index.php?project=Trilinos&begin=10%20days%20ago&end=now&filtercount=2&showfilters=1&filtercombine=and&field1=buildname&compare1=61&value1=Trilinos-atdm-waterman-cuda-9.2-rdc-release-debug&field2=subprojects&compare2=93&value2=SEACAS)
## Steps to Reproduce
One should be able to reproduce this failure on the machine `waterman` as described in:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md
More specifically, the commands given for the system are provided at:
* https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#waterman
The exact commands to reproduce this issue should be:
```
$ cd /
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh \
Trilinos-atdm-waterman-cuda-9.2-rdc-release-debug
$ cmake \
-GNinja \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_SEACAS=ON \
$TRILINOS_DIR
$ ninja -j 20
```
CC: @trilinos/seacas, @kddevin (Trilinos Data Services Product Lead), @bartlettroscoe, @fryeguy52
@gsjaardema,
NOTE: SEACAS is not having build problems on any other CUDA build or even the CUDA+RDC build on 'ride' as shown in this query which includes the builds:
Also note that this executable `seacas/applications/explore/explore` failing to build does not seem to trigger any SEACAS test failures. This suggests that this executable is not being tested.
Do the ATDM APPs (or any SNL customer) use this `seacas/applications/explore/explore` executable? If so, is it a risk that it could be broken and one would not know it because it is not tested?
Yes, the explore application is used by SNL and other customers. It is probably a risk that it is not tested, but typically if it builds it works; not the best option, but it has been sufficient for a couple of decades.
I'm not sure why explore is not building on waterman. The change was to add a call to a C++ routine from the Fortran code, and it is the C++ routine (`convert.C`) which seems to be somehow triggering a call to a CUDA routine. There are a couple of references to `cuda` or `nvcc` in the `fmt/format.h` include file in `convert.C`, but they do not call any CUDA routines and are just turning on or off some template code that is not supported on certain compilers.
My guess is that this is being compiled with NVCC, but since it is linked into a Fortran code there is a missing library that is normally added for C and C++ links. Is there a way to disable the nvcc compilation since this will never be used on the GPU? Something to add to one or more CMakeLists.txt files...?
Somewhat similar issue reported at https://stackoverflow.com/questions/22115197/dynamic-parallelism-undefined-reference-to-cudaregisterlinkedbinary-linking
@gsjaardema asked:
Don't know.
@trilinos/kokkos-kernels, @trilinos/tpetra
Is there a way to tell nvcc_wrapper to not build certain files with nvcc but only use the host compiler? Looking at:
can this be done by adding `--host-only`?
It seems likely that the ATDM APPs are not using this executable, at least not on 'waterman' (or we would have heard about it).
@gsjaardema, can we disable the build of this executable for now in our testing on just this one RDC build? Now that we are expecting to see a build error in this configuration, I fear that it will obscure the emergence of a new build error for this configuration.
@gsjaardema,
I looked in all of the EMPIRE sources with:
and I could not find any usage of this SEACAS 'explore' executable in the production or test code.
I searched all of the SPARC sources with:
and there is mention of an `explore_diff.py` which looks like it depends on a program called `explore` in the shell path. This looks to be used in the SPARC verification test suite.
Therefore, SPARC might depend on this SEACAS 'explore' executable. But SPARC is not yet (if ever) using CUDA+RDC, so this failing build does not impact the ATDM customers.
Is it okay if I disable the build of this executable in just this CUDA+RDC build?
@bartlettroscoe Yes, disabling this in the CUDA+RDC build would be good.
This is disabled in PR #6121 and I manually merged to 'atdm-nightly' in the commit 1fe27b5.
Putting this in review until we get confirmation from CDash tomorrow.
FYI: The SEACAS PR https://github.com/gsjaardema/seacas/pull/154 was merged. Now we are just waiting on the merge of PR #6121 (being held up due to broken Trilinos PR tester).
FYI: This executable has been disabled for a long time and there do not seem to be any problems reported by any ATDM customers (likely because they are not using CUDA+RDC builds). Therefore, I will add the "Stalled" label to get this off of our main list of issues.