Closed e10harvey closed 3 years ago
@e10harvey Rerun with environment variable TPETRA_DEBUG set to 1.
NOTE: We can't really support Trilinos on ATS-2 until we work this out, as we have about 270 failing tests, which is too many for us to even start to monitor with our current system.
I am sure it has something to do with how we are launching the job with LSF bsub or using mpiexec. It might even be related to an issue that @jjellio reported a while back in #5855 about -disable_gpu_hooks? (But I have not had time to investigate myself.)
@e10harvey One possibility is that someone is passing UVM memory down through Tpetra or through Teuchos::*send.
On vortex, Tpetra will by default detect that MPI is CUDA-aware and try to use CUDA buffers. This doesn't actually work unless you pass -M "gpu" to jsrun. Alternatively, you can set TPETRA_ASSUME_CUDA_AWARE_MPI=0 in the environment to override the default behavior and use host MPI.
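The two launch modes described above can be sketched as a small helper; this is a minimal sketch using only the flag and env var names quoted in this thread (the helper function name itself is hypothetical):

```shell
#!/bin/bash
# Minimal sketch of the two launch modes described above. The jsrun flags
# and the TPETRA_ASSUME_CUDA_AWARE_MPI variable are as quoted in this
# thread; the helper function name is hypothetical.
tpetra_jsrun_flags() {
  if [ "${TPETRA_ASSUME_CUDA_AWARE_MPI:-1}" = "1" ]; then
    # Tpetra will hand device (CUDA) buffers to MPI, so Spectrum MPI
    # must be launched with GPU support enabled.
    echo "-M -gpu"
  else
    # Host MPI: Tpetra stages buffers on the host, no extra flags needed.
    echo ""
  fi
}

# Example usage (not run here):
#   jsrun $(tpetra_jsrun_flags) -p 2 ./some_tpetra_test.exe
```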
Thanks everyone! Passing -gpu, instead of -disable_gpu_hooks, to mpiexec also resolves this.
I'd start by either disabling CUDA-aware Tpetra (export TPETRA_ASSUME_CUDA_AWARE_MPI=0 in the environment) or adding -M -gpu to the MPI NP arg variable in CMake.
The other issue that will cause many unit tests to fail is that unless they call MPI_Init, they will segfault. (You need to use #5855 to disable the GPU shared-library hooks in IBM Spectrum MPI.)
To get things going simply, what you could do is: 1) disable Tpetra's CUDA-aware path, and 2) disable the GPU hooks (without CUDA-aware MPI you don't need the GPU hooks, so you can run everything without them, I believe). It's been a while since I've mucked with the unit testing stuff, so some of this could have changed.
It's also worth adding that, for testing purposes, you really want to test both CUDA-aware MPI and non-CUDA-aware MPI. E.g., run with TPETRA_ASSUME_CUDA_AWARE_MPI=0, and with TPETRA_ASSUME_CUDA_AWARE_MPI=1 plus jsrun -M -gpu. That exercises different code paths with respect to MPI.
@jjellio, what are the actual ATDM APP codes going to be using? What is SPARC and EMPIRE going to be doing on ATS-2? Are they using CUDA-aware MPI or not? That is all we care about from an ATDM perspective. (Other customers may care but they need to be setting up their own Trilinos testing the way that ATDM is doing if their use cases are different from ATDM's.)
@sebrowne, @mhoemmen, what is SPARC doing on ATS-2? Is SPARC using CUDA-aware MPI functionality?
@rppawlo, @bathmatt, what is EMPIRE doing on ATS-2? Is EMPIRE using CUDA-aware MPI functionality?
I believe everyone is using Cuda-aware.
I know various devs have tried with and without, because the point is really to find the most performant code path and use it (so this was a free variable). Until recently we got better performance using non-CUDA-aware MPI, and that could flip-flop, who knows...
I believe everyone is using Cuda-aware.
@jjellio, how can we verify this? Surely there are @trilinos/tpetra or @trilinos/muelu, or other Trilinos developers with ATDM funding working with SPARC and EMPIRE who would know this off the top of their heads, no?
I am going to re-open this for now until we get a clear answer to the question of CUDA-aware MPI or not.
@bartlettroscoe SPARC was using CUDA-aware MPI. In any case, I would like to see GPU builds with this feature both enabled and disabled. That would give complete test coverage.
I would like to see GPU builds with this feature both enabled and disabled. That would give complete test coverage.
@mhoemmen, since this can be changed at runtime, it would be good if the Tpetra test suite itself could add tests for host MPI and CUDA-aware MPI, with the same executables. That would save us a ton of build time for the GPU. Otherwise, I guess we could rig things up to rerun the Trilinos test suite for an existing CUDA build to switch between host MPI and CUDA-aware MPI without having to build anything new.
Are there any @trilinos/tpetra developers who feel strongly enough about this testing on ATS-2 to help us set up this type of testing and get the entire Trilinos test suite cleaned up for both host MPI and CUDA-aware MPI?
@bartlettroscoe
How would a dev add a test like this? Toggling between the two requires an env var to be set; does the unit test setup allow devs to specify specific env vars for a specific test?
When I coerced ctest to do this, I simply exported the env var one way, then ran ctest, then changed it and reran ctest. There are probably more complicated ways to go about this, but the easiest way would be to run ctest twice with the env var toggled in between. (This requires no dev effort, but requires a custom test script for ATS-2.)
How would a dev add a test like this? Toggling between the two requires an env var to be set; does the unit test setup allow devs to specify specific env vars for a specific test?
@jjellio, you can set the env on a test-by-test basis in the CMakeLists.txt file. See the ENVIRONMENT option to TRIBITS_ADD_TEST() and TRIBITS_ADD_ADVANCED_TEST(). Some simple CMake wrapper functions used in Tpetra would make this easy to do, I think.
But I guess if any Trilinos packages could be calling raw MPI commands, it would just be better to run the entire test suite twice, once with TPETRA_ASSUME_CUDA_AWARE_MPI=1 and once with TPETRA_ASSUME_CUDA_AWARE_MPI=0, or with different arguments passed to mpiexec or jsrun? What exactly are you suggesting? Can we make the entire Trilinos test suite pass either way? Issues like #5855 make me worry about that.
But since this is a critical platform, we need to find a way to get the entire Trilinos suite to pass with the use cases that our customers need.
For certain machines, the easiest thing to do would be to:
1) configure
2) build
3) export TPETRA_ASSUME_CUDA_AWARE_MPI=0
4) ctest (can this be posted to a dashboard? E.g., build_id+non-cuda-aware)
5) export TPETRA_ASSUME_CUDA_AWARE_MPI=1
6) ctest (can this be posted to a dashboard? E.g., build_id+cuda-aware)
7) Finish
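The configure-once, test-twice sequence above could look like the following driver script; this is a sketch only, with illustrative build commands, and the dashboard-posting step left as a comment since the exact CDash submission mechanism isn't specified in this thread:

```shell
#!/bin/bash
# Sketch of the configure-once, test-twice workflow described above.
# Commands and options are illustrative, not the official ATDM scripts.
set -e

# configure and build once (details elided; see the load-env.sh
# instructions later in this thread)
# cmake ... && make -j16

# host-MPI pass
export TPETRA_ASSUME_CUDA_AWARE_MPI=0
ctest   # results could be posted as, e.g., <build_id>-non-cuda-aware

# CUDA-aware pass: same binaries, no rebuild needed
export TPETRA_ASSUME_CUDA_AWARE_MPI=1
ctest   # results could be posted as, e.g., <build_id>-cuda-aware
```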
The gpu_hooks thing could be worked around using a jsrun wrapper, e.g., something checked into Trilinos that takes the entire jsrun command, finds the '-p N', and if N is 1, appends a -disable_gpu_hooks flag, then forwards all of the args to the real jsrun.
@bartlettroscoe We've seen errors downstream in packages like Zoltan2, where developers pass e.g., UVM allocations to MPI without realizing it. What we really want to do is run all of the Tpetra tests (and in fact, all downstream packages' tests, not just Tpetra) with both options. I don't think just having a few tests that turn the option on or off would be quite as useful.
@jjellio, @mhoemmen, I think we can rig up something like what is suggested.
Can someone contribute a jsrun wrapper or mpiexec wrapper that allows the entire Trilinos test suite to pass on 'vortex' with the new ATDM Trilinos 'ats2' env?
BTW, building on 'vortex' should be as easy as is described in:
Just have your Trilinos clone, create a build directory and then do:
$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh \
Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt
$ cmake \
-DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
-DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Tpetra=ON \
$TRILINOS_DIR
$ make -j16
The only difference is that Ninja is not part of the standard ATS-2 stack so we have to use standard makefiles for now. (But we could install it easy enough and add it to the env.)
@jjellio, do you have time to help with this?
@bartlettroscoe - in the link you point to above, there is a section "Specific instructions for each system" that does not have anything for vortex. Can you add the instructions you just posted to the section? I tried the load-env.sh with a target of "cuda-gnu" yesterday and the script crashed immediately. After some hacking I found a list of options that configured and compiled until a link was attempted. It failed to find some basic libraries during linking. For vortex it looks like we need the exact magic string "Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt" you mention above.
@rppawlo, for the Trilinos 'develop' version:
b3449a0931 "Merge Pull Request #6693 from lucbv/Trilinos/Tpetra_matvec_benchmark"
Author: trilinos-autotester <trilinos@sandia.gov>
Date: Thu Jan 30 20:58:10 2020 -0700 (12 hours ago)
on 'vortex' I just tried:
$ hostname
vortex60
$ . cmake/std/atdm/load-env.sh cuda-gnu-opt
Hostname 'vortex60' matches known ATDM host 'vortex' and system 'ats2'
Setting compiler and build options for build-name 'cuda-gnu-opt'
Using ats2 compiler stack CUDA-10.1.243_GNU-7.3.1_SPMPI-2019.06.24 to build RELEASE code with Kokkos node type CUDA
What version of Trilinos are you using? What command are you running and what output are you seeing?
I am running the checkin-test-atdm.sh script right now:
$ ./checkin-test-atdm.sh cuda-gnu-opt cuda-gnu-opt --enable-packages=Tpetra --configure --build
on 'vortex60' and it seems to be building okay. I will make sure I can reproduce the test results shown on CDash locally.
NOTE: we have 4 'ats2' builds running on 'vortex' as shown here.
I used develop from last night. I'm having trouble replicating the script crash this morning. Maybe my environment was messed up? The link issue is due to compiling on the vortex login node as opposed to an interactive shell. I was getting:
ld: cannot find -lz
ld: cannot find -ldl
ld: cannot find -lmpiprofilesupport
ld: cannot find -lrt
ld: cannot find -lpthread
ld: cannot find -ldl
@rppawlo, are you still having trouble building on 'vortex'? If so, we should compare our envs from login and command run step by step and see where the problem lies.
Ross,
I wrote a wrapper that may work. I didn't handle the long version of the '-M' argument, but adding that would be easy.
Supposing we polish the script up, I guess whoever maintains the envs for ATS-2 could let it live there. Another option would be to talk with the testbed teams to have it added as a module. Letting it live in the Trilinos code base is a better option imo, as it would enable people on non-SNL machines to use it.
Tested with hostname:
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun --np=1 hostname
jsrun -M -disable_gpu_hooks --np=1 hostname
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun --np=2 hostname
jsrun --np=2 hostname
vortex2
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun -p1 hostname
jsrun -M -disable_gpu_hooks -p1 hostname
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun -p 1 hostname
jsrun -M -disable_gpu_hooks -p 1 hostname
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun -p 2 hostname
jsrun -p 2 hostname
vortex2
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun -p2 hostname
jsrun -p2 hostname
vortex2
vortex2
#!/bin/bash
np_is_one="false"
args=("$@")
for i in "${!args[@]}";
do
arg=${args[${i}]}
# handle the case of seeing the np argument
# let's assume -p##, -p ##, or --np=##
if [[ "$arg" =~ ^-p[0-9]+$ ]]; then
# np is just the 2nd arg split on 'p'
my_np=$(echo "$arg" | cut -f2 -d'p');
if [ "$my_np" == "1" ]; then
np_is_one="true";
fi
# we are finished
break;
# next, -p ##
elif [ "$arg" == "-p" ]; then
ip1=$(($i+1));
my_np=${args[${ip1}]}
if [ "$my_np" == "1" ]; then
np_is_one="true";
fi
# we are finished
break;
# next, --np=##
elif [[ "$arg" =~ ^--np=[0-9]+$ ]]; then
# np is just the 2nd arg split on '='
my_np=$(echo "$arg" | cut -f2 -d'=');
if [ "$my_np" == "1" ]; then
np_is_one="true";
fi
# we are finished
break;
fi
done
if [ "$np_is_one" == "true" ]; then
added_disable="false"
# look for an '-M' arg and append disable hooks to the next argument
for i in "${!args[@]}";
do
arg=${args[${i}]}
if [ "$arg" == "-M" ]; then
ip1=$(($i+1));
args[${ip1}]="'${args[${ip1}]} -disable_gpu_hooks'"
added_disable="true"
break
fi
done
if [ "$added_disable" == "false" ]; then
# prepend the argument
args=("-M -disable_gpu_hooks" "${args[@]}")
fi
fi
# comment this out after debugging
echo "jsrun ${args[@]}"
eval jsrun ${args[@]}
Edit: I fixed a bug where I wasn't handling the -M flag correctly (needed bash variable protection)
@jjellio, I say, to give us control to work out issues for now, we should put this in the directory:
Trilinos/cmake/std/atdm/ats2/trilinos_jsrun
and then we should update:
Trilinos/cmake/std/atdm/ats2/environment.sh
to use this script.
I think you want to add:
export ATDM_CONFIG_MPI_EXEC=${ATDM_SCRIPTS_DIR}/ats2/trilinos_jsrun
right above the line that sets ATDM_CONFIG_MPI_POST_FLAGS, and then modify that line to do what you want. You can set the env vars ATDM_CONFIG_MPI_PRE_FLAGS and ATDM_CONFIG_MPI_POST_FLAGS to customize this however you want.
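Put together, the additions to ats2/environment.sh might look like the following; this is a hypothetical excerpt in which only the variable names and the wrapper path come from this thread, and the flag values are empty placeholders:

```shell
# Hypothetical excerpt of Trilinos/cmake/std/atdm/ats2/environment.sh
# showing where the jsrun wrapper would be hooked in. Flag values are
# placeholders; only the variable names come from this thread.
export ATDM_CONFIG_MPI_EXEC=${ATDM_SCRIPTS_DIR}/ats2/trilinos_jsrun
export ATDM_CONFIG_MPI_PRE_FLAGS=""    # flags passed before the test args
export ATDM_CONFIG_MPI_POST_FLAGS=""   # flags passed after the test args
```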
Otherwise, @e10harvey, do you have time to work with @jjellio on this and try this out?
Yes.
MPIEXEC will be this script; the '-np' flag should be '-p'.
There is another issue.
Supposing you want to test both CUDA-aware and regular MPI, this entails appending -M -gpu to jsrun. One way to do this would be to modify the wrapper so that if it detects TPETRA_ASSUME_CUDA_AWARE_MPI=1, it appends -gpu to the -M argument.
MPIEXEC will be this script; the '-np' flag should be '-p'.
@jjellio, okay, we can add a hook for that (which is just adding another ATDM_CONFIG_ env var set in ats2/environment.sh that we pick up in ATDMDevEnvSettings.cmake).
@rppawlo, are you still having trouble building on 'vortex'?
@bartlettroscoe - it is working fine now. Thanks!
@e10harvey I just read through the whole issue (I had skipped to the point where Ross mentioned me).
The short of it is that -disable_gpu_hooks is something that needs to be applied to single-process runs (by default Spectrum MPI loads a shared library that hooks all kinds of things, but those hooks depend on MPI_Init being called to initialize them).
The goal of this effort, which probably should be moved to a specific issue, is that Trilinos needs to have at least two code paths tested:
1) Tpetra with CUDA-aware MPI, which requires -M -gpu, and
2) Tpetra without CUDA-aware MPI, which requires not having -M -gpu.
(I've yet to hear what the flag is to specifically disable GPU-aware MPI; there is only an option to enable it.)
All of the above are runtime options. When you invoke jsrun without -M -gpu, you are getting an entirely different implementation of MPI.
The wrapper I've proposed would enable Trilinos to configure with a fake jsrun that magically handles the -M -gpu flag based on whether Tpetra's CUDA-aware env var is set to 0 or 1. Also, the wrapper automatically disables the GPU hooks if the process is not MPI-parallel.
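The extra behavior described here (appending -gpu when Tpetra's env var is set) could be sketched as a small addition to the wrapper; the function name below is hypothetical, while the flag and variable names are as quoted in the thread:

```shell
#!/bin/bash
# Sketch of the proposed wrapper extension: prepend "-M -gpu" to the
# jsrun arguments when TPETRA_ASSUME_CUDA_AWARE_MPI=1. The function name
# is hypothetical; the flag and variable names are from this thread.
maybe_add_gpu_flag() {
  local args=("$@")
  if [ "${TPETRA_ASSUME_CUDA_AWARE_MPI:-0}" = "1" ]; then
    # CUDA-aware mode: jsrun must enable Spectrum MPI's GPU support.
    args=("-M" "-gpu" "${args[@]}")
  fi
  # Print the (possibly modified) argument list; the real wrapper would
  # instead forward these args to the actual jsrun.
  echo "${args[@]}"
}
```

A usage example: with TPETRA_ASSUME_CUDA_AWARE_MPI=1, `maybe_add_gpu_flag -p 2 hostname` yields the argument list `-M -gpu -p 2 hostname`; with it unset or 0, the arguments pass through unchanged.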
I played the game of Vortex unit testing over the summer of 2019. I seem to remember that I was able to get most tests passing, but it required substantial find/replace on the generated ctest files.
To test vortex with both CUDA-aware and regular MPI, the testing harness will need to do something like change the build name from
Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg
to
Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_cuda-aware_static_dbg
I'll need to add the logic to set -M -gpu
to the wrapped jsrun, and the testing harness will need to figure out how to do 6-10.
It's worth noting that this same testing workflow could be implemented on White/Ride/Waterman. On those machines the MPI implementation is effectively running with the OpenMPI variant of -M -gpu. So on those machines mpirun would need to be wrapped to disable CUDA-aware MPI.
Otherwise, @e10harvey, do you have time to work with @jjellio on this and try this out?
Certainly. @jjellio: I will try this out using what everyone has shared in this issue and follow up once that's done. At first I was thinking that this Spectrum MPI version should be updated such that these tests pass (and maybe it still needs to be for other reasons), but the bottom line is that we need to test both the CUDA-aware MPI and non-CUDA-aware MPI paths.
I am adding the logic so the wrapper will detect the TPETRA variable and set -gpu now.
You can still use the wrapper above; it only handles the single-process gpu_hooks business right now.
I am adding the logic so the wrapper will detect the TPETRA variable and set -gpu now.
@jjellio: Would you please open a PR with the updated wrapper and flag me in it? If you just put the wrapper in Trilinos/cmake/std/atdm/ats2/trilinos_jsrun, I'll try to take care of the rest.
but the bottom line is that we need to test both the CUDA-aware MPI and non-CUDA-aware MPI paths.
@e10harvey, I have ideas about how we can set up some Jenkins jobs that use the same build to run the test suite both ways. Once we can manually test that this works, then we can get automated builds in place that run these tests cases efficiently and post to CDash.
Thanks, all!
@e10harvey, what is the evidence this is fixed?
@e10harvey, what is the evidence this is fixed?
I see no evidence of the seg fault on CDash over the past 2 weeks.
Question
@trilinos/tpetra
We are trying to stabilize a new environment introduced in #6547 but are seeing many seg faults in multi-process Tpetra tests. The set of failing tests experiencing seg faults are not failing intermittently. What's causing the seg faults in BlankRowBugTest_MPI_2 on vortex? Is this an issue with how we're running the tests? See CDASH.
To reproduce, see: CDOFA-94.
As shown in CDASH and CDOFA-94, there are two unmapped addresses: 0x7fff34937e80, which is read by rank 1, while 0x7fff14937a80 is read by rank 0 (as m_impl_handle at Kokkos::Impl::ViewDimension5, m_map = {m_impl_handle = 0x7fff34938080, <snip>})?
CC: @bartlettroscoe.