trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Tpetra: BlankRowBugTest_MPI_2 Seg faulting on vortex #6691

Closed e10harvey closed 3 years ago

e10harvey commented 4 years ago

Question

@trilinos/tpetra

We are trying to stabilize a new environment introduced in #6547, but we are seeing many seg faults in multi-process Tpetra tests. The failing tests segfault consistently, not intermittently. What is causing the seg faults in BlankRowBugTest_MPI_2 on vortex? Is this an issue with how we are running the tests? See CDash.

To reproduce, see: CDOFA-94.

As shown on CDash and in CDOFA-94, there are two unmapped addresses: 0x7fff34937e80 is read by rank 1, and 0x7fff14937a80 is read by rank 0.

  1. Does the base of these addresses first come in through rank 1 (via m_impl_handle) at Kokkos::Impl::ViewDimension5, m_map = {m_impl_handle = 0x7fff34938080, <snip>?
  2. Is this related to https://github.com/trilinos/Trilinos/blob/develop/packages/tpetra/core/src/Tpetra_Map_def.hpp#L2215?
  3. Is this branch (https://github.com/trilinos/Trilinos/blob/develop/packages/teuchos/comm/src/Teuchos_CommHelpers.cpp#L647) exercised by BlankRowBugTest_MPI_2 on other machines using Spectrum MPI?

CC: @bartlettroscoe.

mhoemmen commented 4 years ago

@e10harvey Rerun with environment variable TPETRA_DEBUG set to 1.
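
For example, a minimal sketch assuming the test is rerun through ctest (the test-name filter is illustrative):

# Ask Tpetra for its extra debug checking, then rerun only the failing test.
export TPETRA_DEBUG=1
ctest -R BlankRowBugTest_MPI_2 --output-on-failure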

bartlettroscoe commented 4 years ago

NOTE: We can't really support Trilinos on ATS-2 until we work this out, as we have about 270 failing tests, which is too many for us to even begin monitoring with our current system.

I am sure this has something to do with how we are launching the job with LSF bsub or with mpiexec. It might even be related to an issue that @jjellio reported a while back in #5855 about -disable_gpu_hooks. (I have not had time to investigate myself.)

mhoemmen commented 4 years ago

@e10harvey One possibility is that someone is passing UVM memory down through Tpetra or through Teuchos::*send.

vbrunini commented 4 years ago

On vortex, Tpetra will by default detect that MPI is CUDA-aware and try to use CUDA buffers. This does not actually work unless you pass -M "-gpu" to jsrun. Alternatively, you can set TPETRA_ASSUME_CUDA_AWARE_MPI=0 in the environment to override the default behavior and use host MPI.
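
For example, a hedged sketch of the two alternatives (the executable name and process count are illustrative):

# Option 1: keep CUDA buffers in Tpetra, but tell Spectrum MPI to be CUDA-aware.
jsrun -M -gpu -p 2 ./Tpetra_BlankRowBugTest.exe

# Option 2: leave jsrun alone and make Tpetra stage its MPI buffers on the host.
export TPETRA_ASSUME_CUDA_AWARE_MPI=0
jsrun -p 2 ./Tpetra_BlankRowBugTest.exe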

e10harvey commented 4 years ago

Thanks everyone! Passing -gpu, instead of -disable_gpu_hooks, to mpiexec also resolves this.

jjellio commented 4 years ago

I'd start either by disabling CUDA-aware Tpetra (export TPETRA_ASSUME_CUDA_AWARE_MPI=0 in the environment) or by adding -M -gpu to the MPI NP-args variable in CMake.

The other issue that will cause many unit tests to fail is that, unless they call MPI_Init, they will segfault. (You need the workaround from #5855 to disable the GPU shared-library hooks in IBM Spectrum MPI.)

To simply get things going, what you could do is: 1) disable CUDA-aware Tpetra, and 2) disable the GPU hooks.

(Without CUDA-aware MPI you don't need the GPU hooks, so you can run everything without them, I believe.) It's been a while since I've mucked with the unit-testing stuff, so some of this could have changed.
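
For the GPU-hooks part, a minimal sketch (the test executable name is illustrative; the -M -disable_gpu_hooks usage matches the wrapper output later in this thread):

# Single-process (non-MPI) test run: disable Spectrum MPI's GPU shared-library
# hooks, which otherwise expect MPI_Init to have been called (see #5855).
jsrun -p 1 -M -disable_gpu_hooks ./SomeSerialUnitTest.exe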

jjellio commented 4 years ago

It's also worth adding that, for testing purposes, you really want to test both CUDA-aware MPI and non-CUDA-aware MPI, e.g., run once with TPETRA_ASSUME_CUDA_AWARE_MPI=0 and once with TPETRA_ASSUME_CUDA_AWARE_MPI=1 plus jsrun -M -gpu. That exercises different code paths with respect to MPI.

bartlettroscoe commented 4 years ago

It's also worth adding, for testing purposes, you really want to test both Cuda-aware MPI and non-cuda-aware

@jjellio, what are the actual ATDM app codes going to be using? What are SPARC and EMPIRE going to be doing on ATS-2? Are they using CUDA-aware MPI or not? That is all we care about from an ATDM perspective. (Other customers may care, but they need to set up their own Trilinos testing the way ATDM is doing if their use cases are different from ATDM's.)

@sebrowne, @mhoemmen, what is SPARC doing on ATS-2? Is SPARC using CUDA-aware MPI functionality?

@rppawlo, @bathmatt, what is EMPIRE doing on ATS-2? Is EMPIRE using CUDA-aware MPI functionality?

jjellio commented 4 years ago

I believe everyone is using Cuda-aware.

I know various devs have tried with and without, because the point is really to find the most performant code path and use it, so this has been a free variable. Until recently we got better performance using non-CUDA-aware MPI, and that could flip-flop, who knows...

bartlettroscoe commented 4 years ago

I believe everyone is using Cuda-aware.

@jjellio, how can we verify this? Surely there are @trilinos/tpetra or @trilinos/muelu, or other Trilinos developers with ATDM funding working with SPARC and EMPIRE, who would know this off the top of their heads, no?

bartlettroscoe commented 4 years ago

I am going to re-open this for now until we get a clear answer to the question of CUDA-aware MPI or not.

mhoemmen commented 4 years ago

@bartlettroscoe SPARC was using CUDA-aware MPI. In any case, I would like to see GPU builds with this feature both enabled and disabled. That would give complete test coverage.

bartlettroscoe commented 4 years ago

I would like to see GPU builds with this feature both enabled and disabled. That would give complete test coverage.

@mhoemmen, since this can be changed at runtime, it would be good if the Tpetra test suite itself could add tests for host MPI and CUDA-aware MPI, with the same executables. That would save us a ton of build time for the GPU. Otherwise, I guess we could rig things up to rerun the Trilinos test suite for an existing CUDA build to switch between host MPI and CUDA-aware MPI without having to build anything new.

Are there any @trilinos/tpetra developers who feel strongly enough about this testing on ATS-2 to help us set up this type of testing and get the entire Trilinos test suite cleaned up for both host MPI and CUDA-aware MPI?

jjellio commented 4 years ago

@bartlettroscoe

How would a dev add a test like this? Toggling between the two requires an env var to be set; does the unit-test setup enable devs to specify specific env vars for a specific test?

When I coerced ctest to do this, I simply exported the env var one way, ran ctest, then changed the env var and reran it. There are probably more complicated ways to go about this, but the easiest way would be to run ctest twice with the env var toggled in between. (This requires no dev effort, but it does require a custom test script for ATS-2.)

bartlettroscoe commented 4 years ago

How would a dev add a test like this? Toggling between the two requires an ENV be set, does the unit test setup enable devs to specify specific ENVs for a specific test?

@jjellio, you can set the env on a test-by-test basis in the CMakeLists.txt file. See the ENVIRONMENT option to TRIBITS_ADD_TEST() and TRIBITS_ADD_ADVANCED_TEST(). Some simple CMake wrapper functions used in Tpetra would make this easy to do, I think.

But I guess if any Trilinos packages could be calling raw MPI commands, it would just be better to run the entire test suite twice, once with TPETRA_ASSUME_CUDA_AWARE_MPI=1 and once with TPETRA_ASSUME_CUDA_AWARE_MPI=0, or with different arguments passed to mpiexec or jsrun? What exactly are you suggesting? Can we make the entire Trilinos test suite pass either way? Issues like #5855 make me worry about that.

But since this is a critical platform, we need to find a way to get the entire Trilinos suite to pass with the use cases that our customers need.

jjellio commented 4 years ago

For certain machines, the easiest thing to do would be:

  1. configure
  2. build
  3. export TPETRA_ASSUME_CUDA_AWARE_MPI=0
  4. ctest
  5. Can this be posted to a dashboard? (E.g., build_id+non-cuda-aware)
  6. export TPETRA_ASSUME_CUDA_AWARE_MPI=1
  7. ctest
  8. Can this be posted to a dashboard? (E.g., build_id+cuda-aware)
  9. Finish

The gpu_hooks thing could be worked around using a jsrun wrapper checked into Trilinos: something that takes the entire jsrun command, finds the '-p N' argument, and, if N is 1, appends a -disable_gpu_hooks flag.

It would then forward all of the args to the real jsrun.

mhoemmen commented 4 years ago

@bartlettroscoe We've seen errors downstream in packages like Zoltan2, where developers pass e.g., UVM allocations to MPI without realizing it. What we really want to do is run all of the Tpetra tests (and in fact, all downstream packages' tests, not just Tpetra) with both options. I don't think just having a few tests that turn the option on or off would be quite as useful.

bartlettroscoe commented 4 years ago

@jjellio, @mhoemmen, I think we can rig up something like what is suggested.

Can someone contribute a jsrun wrapper or mpiexec wrapper that allows the entire Trilinos test suite to pass on 'vortex' with the new ATDM Trilinos 'ats2' env?

bartlettroscoe commented 4 years ago

BTW, building on 'vortex' should be as easy as is described in:

Just have your Trilinos clone, create a build directory and then do:

$ cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh \
     Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt

$ cmake \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Tpetra=ON \
  $TRILINOS_DIR

$ make -j16 

The only difference is that Ninja is not part of the standard ATS-2 stack, so we have to use standard makefiles for now. (But we could install it easily enough and add it to the env.)

@jjellio, do you have time to help with this?

rppawlo commented 4 years ago

@bartlettroscoe - in the link you point to above, there is a section "Specific instructions for each system" that does not have anything for vortex. Can you add the instructions you just posted to the section? I tried the load-env.sh with a target of "cuda-gnu" yesterday and the script crashed immediately. After some hacking I found a list of options that configured and compiled until a link was attempted. It failed to find some basic libraries during linking. For vortex it looks like we need the exact magic string "Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt" you mention above.

bartlettroscoe commented 4 years ago

@rppawlo, for the Trilinos 'develop' version:

b3449a0931 "Merge Pull Request #6693 from lucbv/Trilinos/Tpetra_matvec_benchmark"
Author: trilinos-autotester <trilinos@sandia.gov>
Date:   Thu Jan 30 20:58:10 2020 -0700 (12 hours ago)

on 'vortex' I just tried:

$ hostname
vortex60

$ . cmake/std/atdm/load-env.sh cuda-gnu-opt
Hostname 'vortex60' matches known ATDM host 'vortex' and system 'ats2'
Setting compiler and build options for build-name 'cuda-gnu-opt'
Using ats2 compiler stack CUDA-10.1.243_GNU-7.3.1_SPMPI-2019.06.24 to build RELEASE code with Kokkos node type CUDA

What version of Trilinos are you using? What command are you running and what output are you seeing?

I am running the checkin-test-atdm.sh script right now:

$  ./checkin-test-atdm.sh cuda-gnu-opt cuda-gnu-opt --enable-packages=Tpetra --configure --build

on 'vortex60' and it seems to be building okay. I will make sure I can reproduce the test results shown on CDash locally.

NOTE: we have 4 'ats2' builds running on 'vortex', as shown here.

rppawlo commented 4 years ago

I used develop from last night. I'm having trouble replicating the script crash this morning. Maybe my environment was messed up? The link issue is due to compiling on the vortex login node as opposed to an interactive shell. I was getting:

ld: cannot find -lz
ld: cannot find -ldl
ld: cannot find -lmpiprofilesupport
ld: cannot find -lrt
ld: cannot find -lpthread
ld: cannot find -ldl
bartlettroscoe commented 4 years ago

@rppawlo, are you still having trouble building on 'vortex'? If so, we should compare our envs from login and command run step by step and see where the problem lies.

jjellio commented 4 years ago

Ross,

I wrote a wrapper that may work. I didn't handle the long version of the '-M' argument, but adding that would be easy.

Supposing we polish the script up, I guess whoever maintains the envs for ATS-2 could let it live there. Another option would be to talk with the testbed teams to have it added as a module. Letting it live in the Trilinos code base is a better option IMO, as it would enable people on non-SNL machines to use it.

Tested w/hostname

[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun --np=1 hostname
jsrun -M -disable_gpu_hooks --np=1 hostname
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun --np=2 hostname
jsrun --np=2 hostname
vortex2
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun -p1 hostname
jsrun -M -disable_gpu_hooks -p1 hostname
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun -p 1 hostname
jsrun -M -disable_gpu_hooks -p 1 hostname
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun -p 2 hostname
jsrun -p 2 hostname
vortex2
vortex2
[jjellio@vortex59 jsrun_wrapper]$ ./trilinos_jsrun -p2 hostname
jsrun -p2 hostname
vortex2
vortex2

#!/bin/bash

np_is_one="false"

args=("$@")

for i in "${!args[@]}";
do
  arg=${args[${i}]}
  # handle the case of seeing the np argument
  # lets assume -p ## or --np=##
  if [[ "$arg" =~ ^-p[0-9]+$ ]]; then
    # np is just the 2nd arg split on 'p'
    my_np=$(echo "$arg" | cut -f2 -d'p');
    if [ "$my_np" == "1" ]; then
      np_is_one="true";
    fi
    # we are finished
    break;
  # next, -p ##
  elif [ "$arg" == "-p" ]; then
    ip1=$(($i+1));
    my_np=${args[${ip1}]}
    if [ "$my_np" == "1" ]; then
      np_is_one="true";
    fi
    # we are finished
    break;
  # next, --np=##
  elif [[ "$arg" =~ ^--np=[0-9]+$ ]]; then
    # np is just the 2nd arg split on '='
    my_np=$(echo "$arg" | cut -f2 -d'=');
    if [ "$my_np" == "1" ]; then
      np_is_one="true";
    fi
    # we are finished
    break;
  fi
done

if [ "$np_is_one" == "true" ]; then
  added_disable="false"
  # look for an '-M' arg and append disable hooks to the next argument
  for i in "${!args[@]}";
  do
    arg=${args[${i}]}
    if [ "$arg" == "-M" ]; then
      ip1=$(($i+1));
      args[${ip1}]="'${args[${ip1}]} -disable_gpu_hooks'"
      added_disable="true"
      break
    fi
  done

  if [ "$added_disable" == "false" ]; then
    # prepend the argument
    args=("-M -disable_gpu_hooks" "${args[@]}")
  fi
fi

# comment this out after debugging
echo "jsrun ${args[@]}"
eval jsrun ${args[@]}

Edit: I fixed a bug where I wasn't handling the -M flag correctly (needed bash variable protection)

bartlettroscoe commented 4 years ago

@jjellio, I say that, to give us control to work out issues, for now we should put this in the directory:

and then we should update:

to use this script.

I think you want to add:

export ATDM_CONFIG_MPI_EXEC=${ATDM_SCRIPTS_DIR}/ats2/trilinos_jsrun

right above this line that sets ATDM_CONFIG_MPI_POST_FLAGS:

and then modify that line to do what you want. You can set the env vars

to customize this however.

Otherwise, @e10harvey, do you have time to work with @jjellio on this and try this out?

jjellio commented 4 years ago

Yes.

MPIEXEC will be this script; the '-np' flag should be '-p'.

There is another issue.

Suppose you want to test both CUDA-aware and regular MPI. This entails:

  1. setting TPETRA_ASSUME_CUDA_AWARE_MPI
  2. if TPETRA_ASSUME_CUDA_AWARE_MPI=1, then we also want -M -gpu appended to jsrun.

One way to do this would be to modify the wrapper so that, if it detects TPETRA_ASSUME_CUDA_AWARE_MPI=1, it appends -gpu to the -M argument.
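
A hypothetical sketch of that extension, in the same style as the wrapper above (not the final implementation; it only covers the case where no -M argument is already present):

# If Tpetra is told to assume CUDA-aware MPI, ask Spectrum MPI for CUDA-aware
# transport by prepending "-M -gpu"; otherwise leave the arguments untouched.
if [ "${TPETRA_ASSUME_CUDA_AWARE_MPI}" == "1" ]; then
  args=("-M -gpu" "${args[@]}")
fi
eval jsrun ${args[@]}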

bartlettroscoe commented 4 years ago

MPIEXEC will be this script; the '-np' flag should be '-p'.

@jjellio, okay, we can add a hook for that (which is just adding another ATDM_CONFIG_ env var set in ats2/environment.sh that we pick up in ATDMDevEnvSettings.cmake).

rppawlo commented 4 years ago

@rppawlo, are you still having trouble building on 'vortex'?

@bartlettroscoe - it is working fine now. Thanks!

jjellio commented 4 years ago

@e10harvey I just read through the whole issue (I had skipped to the point where Ross mentioned me).

The short version is that -disable_gpu_hooks is something that needs to be applied to single-process runs (by default Spectrum MPI loads a shared library that hooks all kinds of things, but those hooks depend on MPI_Init being called to initialize them).

The goal of this effort, which probably should be moved to a specific issue, is that Trilinos needs to have at least two code paths tested: 1) Tpetra with CUDA-aware MPI, which requires -M -gpu, and 2) Tpetra without CUDA-aware MPI, which requires not having -M -gpu. (I've yet to hear what the flag is to specifically disable GPU-aware MPI; there is only an option to enable it.)

All of the above are runtime options. When you invoke jsrun without -M -gpu, you are getting an entirely different implementation of MPI.

The wrapper I've proposed would enable Trilinos to configure with a fake jsrun that magically handles the -M -gpu flag based on whether Tpetra's CUDA-aware env var is set to 0 or 1. Also, the wrapper automatically disables the GPU hooks if the process is not MPI-parallel.

I played the game of Vortex unit testing over the summer of 2019. I seem to remember that I was able to get most tests passing, but it required substantial find/replace on the generated ctest files.

To test vortex with both cuda-aware and regular MPI, the testing harness will need to do something like:

  1. configure w/wrapped jsrun (via ATDM ENV's)
  2. build once
  3. set TPETRA_ASSUME_CUDA_AWARE_MPI=0 (do the regular MPI case)
  4. invoke ctest
  5. post the results to a dashboard
  6. do something to change the 'build_id' of the ctest file. e.g., Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg to Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_cuda-aware_static_dbg
  7. set TPETRA_ASSUME_CUDA_AWARE_MPI=1
  8. reset ctest (clear all pass/fail stuff)
  9. invoke ctest
  10. post the results to a dashboard, they will get attributed to the modified build_id mentioned in (6)

I'll need to add the logic to set -M -gpu to the wrapped jsrun, and the testing harness will need to figure out how to do 6-10.

It's worth noting that this same testing workflow could be implemented on White/Ride/Waterman. On those machines the MPI implementation is effectively running with the OpenMPI variant of -M -gpu, so there mpirun would need to be wrapped to disable CUDA-aware MPI.

e10harvey commented 4 years ago

Otherwise, @e10harvey, do you have time to work with @jjellio on this and try this out?

Certainly. @jjellio: I will try this out using what everyone has shared in this issue and follow up once that's done. At first I was thinking that this Spectrum MPI version should be updated so that these tests pass (and maybe it still needs to be for other reasons), but the bottom line is that we need to test both the CUDA-aware MPI and non-CUDA-aware MPI paths.

jjellio commented 4 years ago

I am adding the logic so the wrapper will detect the TPETRA variable and set -gpu now.

You can still use the wrapper above; it only handles the single-process gpu_hooks business right now.

e10harvey commented 4 years ago

I am adding the logic so the wrapper will detect the TPETRA variable and set -gpu now.

@jjellio: Would you please open a PR with the updated wrapper and flag me in it? If you just put the wrapper in Trilinos/cmake/std/atdm/ats2/trilinos_jsrun I'll try to take care of the rest.

bartlettroscoe commented 4 years ago

but the bottom line is that we need to test both the CUDA-aware MPI and non-CUDA-aware MPI paths.

@e10harvey, I have ideas about how we can set up some Jenkins jobs that use the same build to run the test suite both ways. Once we can manually test that this works, we can get automated builds in place that run these test cases efficiently and post to CDash.

e10harvey commented 4 years ago

See https://github.com/trilinos/Trilinos/pull/6718/files#r374869115.

e10harvey commented 3 years ago

Thanks, all!

bartlettroscoe commented 3 years ago

@e10harvey, what is the evidence this is fixed?

e10harvey commented 3 years ago

@e10harvey, what is the evidence this is fixed?

I see no evidence of the segfault on CDash over the past 2 weeks.