trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Panzer/Tpetra: Export from ghosted graph to empty graph reports insufficient capacity #6163

Closed kddevin closed 4 years ago

kddevin commented 5 years ago

Bug Report

@trilinos/tpetra @trilinos/panzer @rppawlo @tjfulle @bathmatt

User's issue-tracking number: 882

Resolution is needed before merging the tpetra-remove-deprecated branch.

Description

Although all panzer tests pass with Tpetra_ENABLE_DEPRECATED_CODE=OFF, a panzer user is experiencing problems. The user sees

p=0 | /home//Trilinos/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp:2299:
p=0 | 
p=0 | Throw number = 1
p=0 | 
p=0 | Throw test that evaluated to true: (numInserted == Teuchos::OrdinalTraits<size_t>::invalid())
p=0 | 
p=0 | Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::insertGlobalIndicesImpl: There is not enough capacity to insert indices in to row 71. The upper bound on the number of entries in this row must be increased to accommodate one or more of the new indices.
p=0 | 

The stack trace is

#0  __cxxabiv1::__cxa_throw (obj=0xa2ba9b0, 
    tinfo=0x8e6c290 <typeinfo for std::runtime_error>, 
    dest=0x459d10 <_ZNSt13runtime_errorD1Ev@plt>)
    at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:75
#1  0x0000000003b981de in Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::insertGlobalIndicesImpl(Tpetra::RowInfo const&, long long const*, unsigned long, std::function<void (unsigned long, unsigned long, unsigned long)>) ()
#2  0x0000000003b9825d in Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::insertGlobalIndicesImpl(int, long long const*, unsigned long) ()
#3  0x0000000003baedb4 in Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::insertGlobalIndicesFiltered(int, long long const*, int) ()
#4  0x0000000003bb71fb in Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::unpackAndCombine(Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<long long*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void>, Kokkos::DualView<unsigned long*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void>, unsigned long, Tpetra::Distributor&, Tpetra::CombineMode) ()
#5  0x0000000003b0dda7 in Tpetra::DistObject<long long, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doTransferNew(Tpetra::SrcDistObject const&, Tpetra::CombineMode, unsigned long, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Tpetra::Distributor&, Tpetra::DistObject<long long, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::ReverseOption, bool, bool) ()
#6  0x0000000003af42a6 in Tpetra::DistObject<long long, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doTransfer(Tpetra::SrcDistObject const&, Tpetra::Details::Transfer<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, char const*, Tpetra::DistObject<long long, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::ReverseOption, Tpetra::CombineMode, bool) ()
#7  0x0000000003af01f3 in Tpetra::DistObject<long long, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doExport(Tpetra::SrcDistObject const&, Tpetra::Export<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Tpetra::CombineMode, bool) ()
#8  0x00000000016463b1 in panzer::BlockedTpetraLinearObjFactory<panzer::Traits, double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::buildTpetraGraph(int, int) const ()
#9  0x00000000016594c4 in panzer::BlockedTpetraLinearObjFactory<panzer::Traits, double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::getGraph(int, int) const ()
#10 0x0000000001648e50 in panzer::BlockedTpetraLinearObjFactory<panzer::Traits, double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::getTpetraMatrix(int, int) const ()
#11 0x00000000015a07d7 in panzer::L2Projection::buildMassMatrix(bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, double, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, double> > > const*) ()

This part of panzer is exporting from a ghosted graph to an empty graph.
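
The error message above says the throw fires when the number of inserted indices comes back as Teuchos::OrdinalTraits<size_t>::invalid(), i.e. the target row has a fixed upper bound on its entries and the new indices do not fit. The stdlib-only C++ sketch below is a toy model of that situation under stated assumptions: FixedCapacityRow, kInvalid, and the all-or-nothing rejection are hypothetical and are not Tpetra's implementation. It only illustrates why an empty (zero-capacity) export target must gain capacity before incoming entries are unpacked into it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sentinel playing the role of
// Teuchos::OrdinalTraits<size_t>::invalid() in the error message.
constexpr std::size_t kInvalid = std::numeric_limits<std::size_t>::max();

// Toy row with a fixed upper bound on its number of entries
// (hypothetical; not Tpetra's storage layout).
struct FixedCapacityRow {
  std::size_t capacity;        // upper bound chosen when the row is created
  std::vector<long long> cols; // global column indices stored so far

  // Inserts the indices not already present. Returns how many were added,
  // or kInvalid (adding nothing) if they would exceed the capacity.
  std::size_t insert(const std::vector<long long>& newCols) {
    std::vector<long long> fresh;
    for (long long c : newCols) {
      const bool stored = std::find(cols.begin(), cols.end(), c) != cols.end();
      const bool queued = std::find(fresh.begin(), fresh.end(), c) != fresh.end();
      if (!stored && !queued) fresh.push_back(c);
    }
    if (cols.size() + fresh.size() > capacity) return kInvalid;
    cols.insert(cols.end(), fresh.begin(), fresh.end());
    return fresh.size();
  }
};
```

For example, a row created with capacity 3 accepts {1, 2}, then {2, 3} (the duplicate 2 needs no new slot), and rejects a further {4}; a row created with capacity 0, like an empty export target that was never resized, rejects any insertion.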

I have asked @rppawlo to provide some details about how the user's case differs from the panzer tests so that we might construct a reproducer and regression test.

Steps to Reproduce

Running through EMPIRE on a workstation (nothing too fancy: an OpenMP node) with Tpetra_ENABLE_DEPRECATED_CODE=OFF.

mperego commented 5 years ago

@kddevin I'm seeing the same in Albany for a very small graph. I'm not sure how it differs from other graphs that export fine.

tjfulle commented 5 years ago

@mperego @rppawlo Is the failure occurring in the target of an import/export operation? If so, this is a bug, since we promise to resize the target of an import/export to accommodate entries from other processes (which the user presumably did not know about). If the failure occurs during user-directed insertion, it is expected.

mperego commented 5 years ago

@tjfulle Yes, the failure occurs when exporting a graph. I can point you to the test in Albany, or I can help by providing the info you need.

tjfulle commented 5 years ago

@mperego, ok. If we can come up with a standalone test, that would be very helpful. If not, an Albany test could work, though it may be a little heavy on compile time. Why don't you email me at my Sandia.gov email and we can coordinate a test case?

rppawlo commented 5 years ago

@tjfulle - it is occurring in the target of an import/export operation in EMPIRE as well. What's surprising is that this code is used for all graph construction in panzer. I would have expected many more tests to fail, unless it uses an internal size estimate that works for most cases. I don't have a simple reproducer.

tjfulle commented 5 years ago

Thanks @rppawlo. During import/export we count the number of incoming entries and resize the target appropriately. There must be edge cases where we don't get the right count. Hopefully the failing Albany test will lead to a fix for both codes. I'll keep you posted.
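
The mechanism described here (count the incoming entries, resize the target, then insert) can be modeled in two steps with standard containers. This is a sketch under stated assumptions: ToyGraph, IncomingRow, reserveForIncoming, and unpackAndCombine are hypothetical names, and Tpetra's real unpackAndCombine works on packed Kokkos buffers that are not reproduced here. The count deliberately treats every incoming index as new, so in this toy model the insert step can fail only if the count was skipped or came up short, which is the kind of edge case suspected in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy export target (hypothetical): global row -> stored column indices,
// plus a per-row capacity (the "upper bound on the number of entries").
struct ToyGraph {
  std::map<long long, std::set<long long>> rows;
  std::map<long long, std::size_t> capacity;  // rows not yet seen have capacity 0
};

// One row received from another process (hypothetical, already unpacked).
struct IncomingRow {
  long long globalRow;
  std::vector<long long> cols;
};

// Step 1: count incoming entries per target row and grow each row's capacity
// to hold its current entries plus all incoming ones. Duplicates are counted
// as new, so this is an upper bound, never an undercount.
void reserveForIncoming(ToyGraph& g, const std::vector<IncomingRow>& incoming) {
  std::map<long long, std::size_t> extra;
  for (const IncomingRow& r : incoming) extra[r.globalRow] += r.cols.size();
  for (const auto& kv : extra) {
    const std::size_t needed = g.rows[kv.first].size() + kv.second;
    if (g.capacity[kv.first] < needed) g.capacity[kv.first] = needed;
  }
}

// Step 2: insert the incoming indices. Returns false as soon as a row would
// exceed its capacity, the toy analogue of the error in this issue.
bool unpackAndCombine(ToyGraph& g, const std::vector<IncomingRow>& incoming) {
  for (const IncomingRow& r : incoming) {
    std::set<long long>& row = g.rows[r.globalRow];
    for (long long c : r.cols) {
      if (row.count(c) != 0) continue;  // already stored: needs no new slot
      if (row.size() >= g.capacity[r.globalRow]) return false;
      row.insert(c);
    }
  }
  return true;
}
```

In this model, a target created with zero entries per row (row_nonzeros=0) receives all of its capacity from a single resize in step 1, which matches the intuition that starting from an empty target avoids extra reallocations.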

mperego commented 5 years ago

This means that when we create the empty graph it's probably better to set row_nonzeros=0, and let the exporter resize it, right?

mhoemmen commented 5 years ago

@mperego That should reduce the number of (re)allocations, yes.

tjfulle commented 5 years ago

@mperego, that is the idea. Do the failing tests all have an empty target? That could be a case we hadn't considered.

mperego commented 5 years ago

@tjfulle No, that test fails independently of whether the graph is constructed with 0 row nonzeros, or with the nonzeros of the overlap graph.

kddevin commented 5 years ago

@mperego: I concur with @tjfulle; it would be great to get your input so that we can create a small reproducer. Thanks!

tjfulle commented 5 years ago

@kddevin et al., @mperego provided me with Trilinos and Albany configure scripts and the path to a failing Albany test. I'm starting on that and hoping it resolves the EMPIRE issue as well.

rppawlo commented 5 years ago

@tjfulle - any progress on this issue? Any way I can help?

tjfulle commented 5 years ago

@rppawlo - it's slow going, but I'm chipping away. I haven't found the cause, but I think I may be close. The biggest hassle is not having a small reproducer and having to debug Tpetra through Albany. If I can't find the cause by this afternoon, I'll set up a meeting with the people impacted to solicit ideas and help.

tjfulle commented 5 years ago

I have a workaround that fixes the Albany failure. It's a bit of a band-aid, but I'm close to having a solution to the root cause. A PR should be coming shortly.

tjfulle commented 5 years ago

PR in. I'd be interested to know if it fixes @rppawlo's issues.

mhoemmen commented 5 years ago

@tjfulle I approved the PR, but have some concerns about UVM assumptions -- see comments. Thanks!

rppawlo commented 5 years ago

@tjfulle - Thanks! This fixed many, but not all, of the EMPIRE issues. I'm still seeing this in a handful of tests:

*********** Caught Exception: Begin Error Report ***********
/home/rppawlo/Trilinos/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp:2303:

Throw number = 1

Throw test that evaluated to true: (numInserted == Teuchos::OrdinalTraits<size_t>::invalid())

Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::insertGlobalIndicesImpl: There is not enough capacity to insert 132 indices in to row 30 on rank 0.The current size of the indices array is 31824.  The current size of the row pointers array is 307.  The upper bound on the number of entries in this row must be increased to accommodate one or more of the new indices.
************ Caught Exception: End Error Report ************

I'll rebuild in debug and post a stack trace. If this would be easier to debug directly in empire, I'd be happy to connect over skype for a pair programming session.
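
The sizes in the error report above can be sanity-checked with a little arithmetic, assuming the standard CSR layout in which the row-pointer array has one more entry than there are local rows: 307 row pointers means 306 local rows, and 31824 / 306 = 104 index slots per row on average. Because 31824 is an exact multiple of 306, the numbers are consistent with (but do not prove) a uniform allocation of 104 entries per row, fewer than the 132 indices being inserted into row 30. The actual capacity of row 30 would be rowPtrs[31] - rowPtrs[30], which the message does not report. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumes the standard CSR layout: rowPtrs has numRows + 1 entries and
// row i owns the index slots [rowPtrs[i], rowPtrs[i + 1]).
std::size_t numLocalRows(std::size_t rowPtrsLength) { return rowPtrsLength - 1; }

// Capacity of one row; this is the number the error message does not report.
std::size_t rowCapacity(const std::vector<std::size_t>& rowPtrs, std::size_t row) {
  return rowPtrs[row + 1] - rowPtrs[row];
}

// Average slots per row (integer division); equals every row's capacity only
// if the allocation was uniform, which is an assumption.
std::size_t averageSlotsPerRow(std::size_t indicesLength, std::size_t rowPtrsLength) {
  return indicesLength / numLocalRows(rowPtrsLength);
}
```

With the reported sizes, numLocalRows(307) is 306 and averageSlotsPerRow(31824, 307) is 104.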

tjfulle commented 5 years ago

Are those failures also during export/import, or are you doing the insertions explicitly?

rppawlo commented 5 years ago

> Are those failures also during export/import, or are you doing the insertions explicitly?

Here's the backtrace. It looks like it's in TwoMatrixAdd. Maybe this is fixed in #6190? I'll pull in that branch and see if that fixes things. I'm also seeing a hang in one test, as reported in #6237.

Catchpoint 1 (exception thrown), __cxxabiv1::__cxa_throw (obj=0x67dc9a0, tinfo=0x7fffd0adda70 <typeinfo for std::runtime_error>, dest=0x7fffd0801360 <std::runtime_error::~runtime_error()>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:75
75  ../../.././libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory.
Missing separate debuginfos, use: debuginfo-install blas-3.4.2-8.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 lapack-3.4.2-8.el7.x86_64 libcom_err-1.42.9-16.el7.x86_64 libcurl-7.29.0-54.el7.x86_64 libgfortran-4.8.5-39.el7.x86_64 libibverbs-22.1-3.el7.x86_64 libidn-1.28-4.el7.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-22.1-3.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libssh2-1.8.0-3.el7.x86_64 nspr-4.21.0-1.el7.x86_64 nss-3.44.0-4.el7.x86_64 nss-softokn-freebl-3.44.0-5.el7.x86_64 nss-util-3.44.0-3.el7.x86_64 openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.4-21.el7.x86_64
(gdb) bt
#0  __cxxabiv1::__cxa_throw (obj=0x67dc9a0, tinfo=0x7fffd0adda70 <typeinfo for std::runtime_error>, dest=0x7fffd0801360 <std::runtime_error::~runtime_error()>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:75
#1  0x00007fffd6bf56ca in Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::insertGlobalIndicesImpl(Tpetra::RowInfo const&, long long const*, unsigned long, std::function<void (unsigned long, unsigned long, unsigned long)>) ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtpetra.so.12
#2  0x00007fffd69dc2f3 in Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::insertGlobalValuesImpl(Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >&, Tpetra::RowInfo&, long long const*, double const*, unsigned long) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtpetra.so.12
#3  0x00007fffd69dc797 in Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::insertGlobalValues(long long, Teuchos::ArrayView<long long const> const&, Teuchos::ArrayView<double const> const&) ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtpetra.so.12
#4  0x00007fffd732a13b in void Tpetra::MatrixMatrix::Add<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >(Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, bool, double, Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, bool, double, Teuchos::RCP<Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >) ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtpetraext.so.12
#5  0x00007fffe9ea6c35 in Xpetra::MatrixMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::TwoMatrixAdd(Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, bool, double const&, Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, bool, double const&, Teuchos::RCP<Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >&, Teuchos::basic_FancyOStream<char, std::char_traits<char> >&, bool) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#6  0x00007fffe9ec2c30 in MueLu::RefMaxwell<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::formCoarseMatrix() () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#7  0x00007fffe9ec423f in MueLu::RefMaxwell<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::compute(bool) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#8  0x00007fffe9ed370a in MueLu::RefMaxwell<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::resetMatrix(Teuchos::RCP<Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >, bool) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#9  0x00007fffece3f6cc in MueLu::RefMaxwell<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::RefMaxwell(Teuchos::RCP<Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > const&, Teuchos::ParameterList&, bool) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libpanzer-stk.so.12
#10 0x00007fffece83d10 in Thyra::MueLuRefMaxwellPreconditionerFactory<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::initializePrec(Teuchos::RCP<Thyra::LinearOpSourceBase<double> const> const&, Thyra::PreconditionerBase<double>*, Thyra::ESupportSolveUse) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libpanzer-stk.so.12
#11 0x00007fffe355c0d9 in Thyra::BelosLinearOpWithSolveFactory<double>::initializeOpImpl(Teuchos::RCP<Thyra::LinearOpSourceBase<double> const> const&, Teuchos::RCP<Thyra::LinearOpSourceBase<double> const> const&, Teuchos::RCP<Thyra::PreconditionerBase<double> const> const&, bool, Thyra::LinearOpWithSolveBase<double>*, Thyra::ESupportSolveUse) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libstratimikosbelos.so.12
#12 0x00007fffe355ce8b in Thyra::BelosLinearOpWithSolveFactory<double>::initializeOp(Teuchos::RCP<Thyra::LinearOpSourceBase<double> const> const&, Thyra::LinearOpWithSolveBase<double>*, Thyra::ESupportSolveUse) const ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libstratimikosbelos.so.12
#13 0x00007ffff2f3f340 in Teuchos::RCP<Thyra::LinearOpWithSolveBase<double> > Thyra::linearOpWithSolve<double>(Thyra::LinearOpWithSolveFactoryBase<double> const&, Teuchos::RCP<Thyra::LinearOpBase<double> const> const&, Thyra::ESupportSolveUse) ()
   from /home/rppawlo/empire2_opt/src/circuit/libcircuit.so
#14 0x00007ffff2530839 in empire::ElectroMagneticSolverInterface::setupSchurComplementSolve(Teuchos::RCP<Thyra::LinearOpBase<double> > const&, Teuchos::RCP<Thyra::LinearOpWithSolveFactoryBase<double> const> const&, Teuchos::ParameterList const&) const ()
   from /home/rppawlo/empire2_opt/src/em_solvers/libem_solvers.so
#15 0x00007ffff254887d in empire::ElectroMagneticSolverInterface::ElectroMagneticSolverInterface(MainParameterLists, empire::MeshContainer, empire::utils::TimeStamp&, bool, Teuchos::RCP<empire::utils::MeshEvaluationBase>) () from /home/rppawlo/empire2_opt/src/em_solvers/libem_solvers.so
#16 0x00000000005b57f8 in void empire::em::meshSpecificMain<empire::MeshTraits<shards::Hexahedron<8u>, 1> >(Teuchos::RCP<Teuchos::StackedTimer>&, Teuchos::RCP<Teuchos::MpiComm<int> const> const&, double, MainParameterLists&, empire::MeshContainer&, empire::utils::TimeStamp&) ()
#17 0x000000000048435c in main ()
(gdb) 
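
The backtrace above shows Tpetra::MatrixMatrix::Add calling insertGlobalValues on the result matrix, so the result's rows must have room for the entries of both operands. One way to guarantee that is to size each result row from the union of the two input rows' column sets before inserting. The stdlib-only sketch below illustrates that idea; SparseRow, unionSize, and addRows are hypothetical, and this is not a claim about how Tpetra's Add allocates its result.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// One sparse row as column -> value, kept sorted by std::map.
using SparseRow = std::map<long long, double>;

// Distinct columns in the union of two rows: the capacity a result row needs
// so that inserting the entries of both operands cannot overflow it.
std::size_t unionSize(const SparseRow& a, const SparseRow& b) {
  std::size_t n = a.size();
  for (const auto& kv : b)
    if (a.find(kv.first) == a.end()) ++n;
  return n;
}

// One row of C = alpha*A + beta*B. Also reports the capacity the row needed,
// so a caller could size a fixed-capacity result before inserting.
SparseRow addRows(double alpha, const SparseRow& a, double beta,
                  const SparseRow& b, std::size_t& capacityNeeded) {
  capacityNeeded = unionSize(a, b);
  SparseRow c;
  for (const auto& kv : a) c[kv.first] += alpha * kv.second;
  for (const auto& kv : b) c[kv.first] += beta * kv.second;
  assert(c.size() == capacityNeeded);  // union size matches what was stored
  return c;
}
```

The union is at most the sum of the two row lengths and smaller when columns overlap, so sizing from one operand alone can underallocate, while sizing from the sum is safe but may waste space.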
mhoemmen commented 5 years ago

Did the PR land that makes sparse matrix-matrix add call fillComplete first before returning?

rppawlo commented 5 years ago

> Did the PR land that makes sparse matrix-matrix add call fillComplete first before returning?

Are you talking about #6190? It failed PR testing last night, but I pulled it in. With #6190 and @tjfulle's fix in this PR, we are down to two failures in all of EMPIRE:

  1. One test that is hanging. It might be the same as reported in #6237.
  2. Also seeing this in one single test in parallel:

    *********** Caught Exception: Begin Error Report ***********
    /home/rppawlo/Trilinos/packages/amesos2/src/Amesos2_KLU2_def.hpp:277:

    Throw number = 1

    Throw test that evaluated to true: info > 0

    KLU2 numeric factorization failed
    **** Caught Exception: End Error Report ****

rppawlo commented 5 years ago

Stack trace for the Amesos2 throw. It looks like it is happening in MueLu. @cgcgcg @jhux2

Catchpoint 1 (exception thrown), __cxxabiv1::__cxa_throw (obj=0x16d7c90, tinfo=0x7f7d055fca70 <typeinfo for std::runtime_error>, dest=0x7f7d05320360 <std::runtime_error::~runtime_error()>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:75
75  ../../.././libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory.
(gdb) bt
#0  __cxxabiv1::__cxa_throw (obj=0x16d7c90, tinfo=0x7f7d055fca70 <typeinfo for std::runtime_error>, dest=0x7f7d05320360 <std::runtime_error::~runtime_error()>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:75
#1  0x00007f7d10a31cd8 in Amesos2::KLU2<Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::numericFactorization_impl() () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libamesos2.so.12
#2  0x00007f7d10a41339 in Amesos2::SolverCore<Amesos2::KLU2, Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve(Teuchos::Ptr<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >, Teuchos::Ptr<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const>) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libamesos2.so.12
#3  0x00007f7d10a174c6 in Amesos2::SolverCore<Amesos2::KLU2, Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve() () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libamesos2.so.12
#4  0x00007f7d1df7eabb in MueLu::Amesos2Smoother<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::Apply(Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >&, Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, bool) const ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu.so.12
#5  0x00007f7d1e070ea7 in MueLu::DirectSolver<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::Apply(Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >&, Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, bool) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu.so.12
#6  0x00007f7d1e0f162a in MueLu::Hierarchy<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::Iterate(Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >&, MueLu::Hierarchy<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::ConvData, bool, int) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu.so.12
#7  0x00007f7d1f01ca8b in MueLu::RefMaxwell<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::applyInverseAdditive(Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >&) const ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#8  0x00007f7d1f01e7eb in MueLu::RefMaxwell<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::apply(Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Xpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >&, Teuchos::ETransp, double, double) const ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#9  0x00007f7d21fc0bb2 in Thyra::XpetraLinearOp<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::applyImpl(Thyra::EOpTransp, Thyra::MultiVectorBase<double> const&, Teuchos::Ptr<Thyra::MultiVectorBase<double> > const&, double, double) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libpanzer-stk.so.12
#10 0x00007f7d1861bd64 in Belos::OperatorTraits<double, Thyra::MultiVectorBase<double>, Thyra::LinearOpBase<double> >::Apply(Thyra::LinearOpBase<double> const&, Thyra::MultiVectorBase<double> const&, Thyra::MultiVectorBase<double>&, Belos::ETrans) [clone .constprop.1244] () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libstratimikosbelos.so.12
#11 0x00007f7d186f1e63 in Belos::LinearProblem<double, Thyra::MultiVectorBase<double>, Thyra::LinearOpBase<double> >::applyRightPrec(Thyra::MultiVectorBase<double> const&, Thyra::MultiVectorBase<double>&) const ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libstratimikosbelos.so.12
#12 0x00007f7d187094b9 in Belos::CGIter<double, Thyra::MultiVectorBase<double>, Thyra::LinearOpBase<double> >::initializeCG(Belos::CGIterationState<double, Thyra::MultiVectorBase<double> >&) ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libstratimikosbelos.so.12
#13 0x00007f7d186d44a4 in Belos::PseudoBlockCGSolMgr<double, Thyra::MultiVectorBase<double>, Thyra::LinearOpBase<double>, true>::solve() () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libstratimikosbelos.so.12
#14 0x00007f7d18619601 in Thyra::BelosLinearOpWithSolve<double>::solveImpl(Thyra::EOpTransp, Thyra::MultiVectorBase<double> const&, Teuchos::Ptr<Thyra::MultiVectorBase<double> > const&, Teuchos::Ptr<Thyra::SolveCriteria<double> const>) const ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libstratimikosbelos.so.12
#15 0x00007f7d28b7aa1f in empire::ElectroMagneticSolverInterface::applyJacobianInverse(Teuchos::Ptr<Thyra::MultiVectorBase<double> > const&, Thyra::MultiVectorBase<double> const&) const () from /home/rppawlo/empire2_opt/src/em_solvers/libem_solvers.so
#16 0x00000000004cc947 in empire::LinearSolverWrapper::solveImpl(Thyra::EOpTransp, Thyra::MultiVectorBase<double> const&, Teuchos::Ptr<Thyra::MultiVectorBase<double> > const&, Teuchos::Ptr<Thyra::SolveCriteria<double> const>) const ()
#17 0x00007f7d1c7b731c in NOX::Thyra::Group::applyJacobianInverseMultiVector(Teuchos::ParameterList&, Thyra::MultiVectorBase<double> const&, Thyra::MultiVectorBase<double>&) const ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libnox.so.12
#18 0x00007f7d1c77570c in NOX::Solver::SingleStep::try_step() () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libnox.so.12
#19 0x00007f7d1c775beb in NOX::Solver::SingleStep::step() () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libnox.so.12
#20 0x00007f7d1c777bd9 in NOX::Solver::SingleStep::solve() () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libnox.so.12
#21 0x00007f7d1c7c5f4c in Thyra::NOXNonlinearSolver::solve(Thyra::VectorBase<double>*, Thyra::SolveCriteria<double> const*, Thyra::VectorBase<double>*) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libnox.so.12
#22 0x00007f7d1fad5d8b in Tempus::StepperImplicit<double>::solveImplicitODE(Teuchos::RCP<Thyra::VectorBase<double> > const&, Teuchos::RCP<Thyra::VectorBase<double> > const&, double, Teuchos::RCP<Tempus::ImplicitODEParameters<double> > const&) ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtempus.so.12
#23 0x00007f7d1fa9dcdb in Tempus::StepperBackwardEuler<double>::takeStep(Teuchos::RCP<Tempus::SolutionHistory<double> > const&) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtempus.so.12
#24 0x0000000000568b9d in empire::TimeStepExplicit<empire::MeshTraits<shards::Hexahedron<8u>, 1> >::takeStep(empire::utils::TimeStamp) ()
#25 0x0000000000674581 in void meshSpecificMain<empire::MeshTraits<shards::Hexahedron<8u>, 1> >(Teuchos::RCP<Teuchos::StackedTimer>&, Teuchos::RCP<Teuchos::MpiComm<int> const> const&, double, MainPicParameterLists&, bool, empire::MeshContainer&, empire::utils::TimeStamp&) ()
#26 0x00000000004c373c in main ()
cgcgcg commented 5 years ago

@rppawlo Which EMPIRE test is this?

rppawlo commented 5 years ago

DualFeedEMInlineMesh-mpi-parallel

This is with this morning's develop branch plus Tim's change, all deprecated code off in Trilinos, and #6190 pulled in.

mhoemmen commented 5 years ago

@rppawlo wrote:

> > Did the PR land that makes sparse matrix-matrix add call fillComplete first before returning?
>
> Are you talking about #6190?

Yes -- thanks for testing!

rppawlo commented 5 years ago

In addition to the test above, the hanging test is also dumping an exception but not exiting cleanly. The traceback looks to be in a MueLu call too. @cgcgcg this is OscillatingEField1DMultiBlock-parallel. Same build parameters as above.

*********** Caught Exception: Begin Error Report ***********
/home/rppawlo/Trilinos/packages/tpetra/core/src/Tpetra_Details_packCrsMatrix_def.hpp:884:

Throw number = 1

Throw test that evaluated to true: pack_pids && exports.extent (0) != 0 && export_pids.extent (0) == 0

Tpetra::Details::PackCrsMatrixImpl::packCrsMatrix: pack_pids is true, and exports.extent(0) = 536 != 0, meaning that we need to pack at least one matrix entry, but export_pids.extent(0) = 0.
************ Caught Exception: End Error Report ************
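
The throw test in this report is a plain conjunction, restated below as a stand-alone check so the failing combination is explicit: packing owning PIDs was requested, there are bytes to export (536 here), yet the export-PID array is empty. checkPackPidsPrecondition is a hypothetical helper, not a Tpetra function; it throws std::invalid_argument only to match the exception type shown in the backtrace.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical restatement of the throw test
//   pack_pids && exports.extent(0) != 0 && export_pids.extent(0) == 0
// as a stand-alone precondition check.
void checkPackPidsPrecondition(bool packPids, std::size_t exportsExtent,
                               std::size_t exportPidsExtent) {
  if (packPids && exportsExtent != 0 && exportPidsExtent == 0) {
    throw std::invalid_argument(
        "pack_pids is true and exports.extent(0) = " +
        std::to_string(exportsExtent) +
        " != 0, but export_pids.extent(0) = 0");
  }
}
```

Only that exact combination fails: turning off PID packing, having nothing to export, or supplying a non-empty PID array all pass the check.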
(gdb) bt
#0  __cxxabiv1::__cxa_throw (obj=0x89b16f0, tinfo=0x7f9e1b3b6a28 <typeinfo for std::invalid_argument>, dest=0x7f9e1b0da2a0 <std::invalid_argument::~invalid_argument()>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:75
#1  0x00007f9e213bec1d in void Tpetra::Details::PackCrsMatrixImpl::packCrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace> >(Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Kokkos::DualView<char*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void>&, Kokkos::View<unsigned long*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Kokkos::View<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Kokkos::View<int const*, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace>::device_type> const&, unsigned long&, bool, Tpetra::Distributor&) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtpetra.so.12
#2  0x00007f9e213c0ab4 in void Tpetra::Details::packCrsMatrixWithOwningPIDs<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >(Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Kokkos::DualView<char*, Tpetra::DistObject<char, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::buffer_device_type, void, void>&, Teuchos::ArrayView<unsigned long> const&, Teuchos::ArrayView<int const> const&, Teuchos::ArrayView<int const> const&, unsigned long&, Tpetra::Distributor&) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtpetra.so.12
#3  0x00007f9e212d5de8 in Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::transferAndFillComplete(Teuchos::RCP<Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >&, Tpetra::Details::Transfer<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Teuchos::RCP<Tpetra::Details::Transfer<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Tpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Tpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Teuchos::ParameterList> const&) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtpetra.so.12
#4  0x00007f9e212dd8bf in Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::importAndFillComplete(Teuchos::RCP<Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >&, Tpetra::Import<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Tpetra::Import<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Teuchos::RCP<Tpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Tpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Teuchos::ParameterList> const&) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libtpetra.so.12
#5  0x00007f9e22e53dbd in Xpetra::TpetraCrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::TpetraCrsMatrix(Teuchos::RCP<Xpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Xpetra::Import<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Teuchos::RCP<Xpetra::Import<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const>, Teuchos::RCP<Xpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Xpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Teuchos::ParameterList> const&) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libxpetra.so.12
#6  0x00007f9e3474c059 in Xpetra::MatrixFactory<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::Build(Teuchos::RCP<Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Xpetra::Import<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Xpetra::Import<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Teuchos::RCP<Xpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Xpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const> const&, Teuchos::RCP<Teuchos::ParameterList> const&) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#7  0x00007f9e339900fc in MueLu::RebalanceAcFactory<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::Build(MueLu::Level&, MueLu::Level&) const ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu.so.12
#8  0x00007f9e347b9f52 in MueLu::TwoLevelFactoryBase::CallBuild(MueLu::Level&) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#9  0x00007f9e37716600 in Teuchos::RCP<Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >& MueLu::Level::Get<Teuchos::RCP<Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, MueLu::FactoryBase const*) ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libpanzer-stk.so.12
#10 0x00007f9e347a44fd in MueLu::RefMaxwell<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::compute(bool) ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#11 0x00007f9e347b194a in MueLu::RefMaxwell<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::resetMatrix(Teuchos::RCP<Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >, bool) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libmuelu-adapters.so.12
#12 0x00007f9e3771d3bc in MueLu::RefMaxwell<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::RefMaxwell(Teuchos::RCP<Xpetra::Matrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > const&, Teuchos::ParameterList&, bool) () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libpanzer-stk.so.12
#13 0x00007f9e37761a00 in Thyra::MueLuRefMaxwellPreconditionerFactory<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::initializePrec(Teuchos::RCP<Thyra::LinearOpSourceBase<double> const> const&, Thyra::PreconditionerBase<double>*, Thyra::ESupportSolveUse) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libpanzer-stk.so.12
#14 0x00007f9e2de3b0d9 in Thyra::BelosLinearOpWithSolveFactory<double>::initializeOpImpl(Teuchos::RCP<Thyra::LinearOpSourceBase<double> const> const&, Teuchos::RCP<Thyra::LinearOpSourceBase<double> const> const&, Teuchos::RCP<Thyra::PreconditionerBase<double> const> const&, bool, Thyra::LinearOpWithSolveBase<double>*, Thyra::ESupportSolveUse) const () from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libstratimikosbelos.so.12
#15 0x00007f9e2de3be8b in Thyra::BelosLinearOpWithSolveFactory<double>::initializeOp(Teuchos::RCP<Thyra::LinearOpSourceBase<double> const> const&, Thyra::LinearOpWithSolveBase<double>*, Thyra::ESupportSolveUse) const ()
   from /home/rppawlo/install/kokkosdeprecated-gnu-opt-openmp-shared/lib/libstratimikosbelos.so.12
#16 0x00007f9e3d81d340 in Teuchos::RCP<Thyra::LinearOpWithSolveBase<double> > Thyra::linearOpWithSolve<double>(Thyra::LinearOpWithSolveFactoryBase<double> const&, Teuchos::RCP<Thyra::LinearOpBase<double> const> const&, Thyra::ESupportSolveUse) ()
   from /home/rppawlo/empire2_opt/src/circuit/libcircuit.so
#17 0x00007f9e3ce0e839 in empire::ElectroMagneticSolverInterface::setupSchurComplementSolve(Teuchos::RCP<Thyra::LinearOpBase<double> > const&, Teuchos::RCP<Thyra::LinearOpWithSolveFactoryBase<double> const> const&, Teuchos::ParameterList const&) const
    () from /home/rppawlo/empire2_opt/src/em_solvers/libem_solvers.so
#18 0x00007f9e3ce2687d in empire::ElectroMagneticSolverInterface::ElectroMagneticSolverInterface(MainParameterLists, empire::MeshContainer, empire::utils::TimeStamp&, bool, Teuchos::RCP<empire::utils::MeshEvaluationBase>) ()
   from /home/rppawlo/empire2_opt/src/em_solvers/libem_solvers.so
#19 0x00000000005addd8 in void empire::em::meshSpecificMain<empire::MeshTraits<shards::Quadrilateral<4u>, 1> >(Teuchos::RCP<Teuchos::StackedTimer>&, Teuchos::RCP<Teuchos::MpiComm<int> const> const&, double, MainParameterLists&, empire::MeshContainer&, empire::utils::TimeStamp&) ()
#20 0x00000000004846f0 in main ()
(gdb)
cgcgcg commented 5 years ago

DualFeedEMInlineMesh-mpi-parallel

This is with this morning's develop with Tim's change, all deprecated code off in trilinos and pulling in #6190

@rppawlo I rebuilt with everything but PR #6190, and the test seems to be passing. Could you check that on your end? Maybe we should wait until #6190 is done?

mperego commented 5 years ago

I also found a failing test in Albany (I had not enabled that module before), but the failure happens in Ifpack2 (@srajama1 @trilinos/ifpack2):

p=0: *** Caught standard std::exception of type 'std::runtime_error' :

 /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp:2372:

 Throw number = 1

 Throw test that evaluated to true: (numInserted == Teuchos::OrdinalTraits<size_t>::invalid())

 Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::insertLocallIndicesImpl: There is not enough capacity to insert indices in to row 15. The upper bound on the number of entries in this row must be increased to accommodate one or more of the new indices.

gdb (where)

#0  __cxxabiv1::__cxa_throw (obj=0x5665ea0, tinfo=0x7fffd127aa70 <typeinfo for std::runtime_error>, dest=0x7fffd0f9e470 <std::runtime_error::~runtime_error()>)
    at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:76
#1  0x00007fffde8182b3 in Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::insertLocalIndicesImpl(int, Teuchos::ArrayView<int const> const&, std::function<void (unsigned long, unsigned long, unsigned long)>) (this=0x7fffffff6740, myRow=15, indices=..., fun=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp:2367
#2  0x00007fffde828434 in Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::insertLocalIndices (this=0x5666dd0, localRow=15, indices=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/tpetra/core/src/Tpetra_CrsGraph_def.hpp:3222
#3  0x00007fffe73f6826 in Ifpack2::IlukGraph<Tpetra::CrsGraph<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >::initialize (this=0x5664dc0)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/ifpack2/src/Ifpack2_IlukGraph.hpp:478
#4  0x00007fffe73f845b in Ifpack2::RILUK<Tpetra::RowMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >::initialize (this=0x56649f0)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/ifpack2/src/Ifpack2_RILUK_def.hpp:470
#5  0x00007fffe77a4c56 in Thyra::Ifpack2PreconditionerFactory<Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >::initializePrec (
    this=0x7fffffff73b0, fwdOpSrc=..., prec=0x7fffffff7370) at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/ifpack2/adapters/thyra/Thyra_Ifpack2PreconditionerFactory_def.hpp:213
#6  0x00007fffed3e8390 in NOX::Thyra::Group::updateLOWS (this=0x5665ea0) at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src-thyra/NOX_Thyra_Group.C:939
#7  0x00007fffed3e8d30 in NOX::Thyra::Group::applyJacobianInverseMultiVector (this=0x19141c0, p=..., input=..., result=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src-thyra/NOX_Thyra_Group.C:784
#8  0x00007fffed3e9eb4 in NOX::Thyra::Group::applyJacobianInverseMultiVector (this=0x19141c0, p=..., input=..., result=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src-thyra/NOX_Thyra_Group.C:680
#9  0x00007fffedc3c75d in LOCA::BorderedSolver::LowerTriangularBlockElimination::solve (this=0x5665ea0, params=..., op=..., B=..., C=..., F=0x18fbb50, G=0x18fbb80, X=..., Y=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src-loca/src/LOCA_BorderedSolver_LowerTriangularBlockElimination.C:100
#10 0x00007fffedc44a03 in LOCA::BorderedSolver::Bordering::applyInverse (this=0x18e90d0, params=..., F=0x7fffd0f9e470 <std::runtime_error::~runtime_error()>, G=0x18fbb80, X=..., Y=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src-loca/src/LOCA_BorderedSolver_Bordering.C:208
#11 0x00007fffedc6337c in LOCA::MultiContinuation::ConstrainedGroup::applyJacobianInverseMultiVector (this=0x18ca260, params=..., input=..., result=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src-loca/src/LOCA_MultiContinuation_ConstrainedGroup.C:674
#12 0x00007fffedc67eeb in LOCA::MultiContinuation::ConstrainedGroup::computeNewton (this=0x5665ea0, params=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src-loca/src/LOCA_MultiContinuation_ConstrainedGroup.C:481
#13 0x00007fffed36b26a in NOX::Direction::Newton::compute (this=0x190fe50, dir=..., soln=..., solver=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src/NOX_Direction_Newton.C:136
#14 0x00007fffed384838 in NOX::Solver::LineSearchBased::step (this=0x18e9260) at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src/NOX_Solver_LineSearchBased.C:194
#15 0x00007fffed385df9 in NOX::Solver::LineSearchBased::solve (this=0x18e9260) at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src/NOX_Solver_LineSearchBased.C:260
#16 0x00007fffedd44875 in LOCA::Stepper::start (this=0x1920610) at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src-loca/src/LOCA_Stepper.C:375
#17 0x00007fffedc2600a in LOCA::Abstract::Iterator::run (this=0x1920610) at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/nox/src-loca/src/LOCA_Abstract_Iterator.C:122
#18 0x00007ffff0d9c4da in Piro::LOCASolver<double>::evalModelImpl (this=0x1868040, inArgs=..., outArgs=...)
    at /ascldap/users/mperego/Workspace/trilinos/sources/trilinos-src-devel/packages/piro/src/Piro_LOCASolver_Def.hpp:197
#19 0x00007ffff6cd23af in Thyra::ModelEvaluatorDefaultBase<double>::evalModel (this=0x5665ea0, inArgs=..., outArgs=...)
    at /home/mperego/Workspace/trilinos/builds/devel-shared/install/include/Thyra_ModelEvaluatorDefaultBase.hpp:691
#20 0x0000000000443fd0 in Piro::Detail::PerformSolveImpl<double, Thyra::VectorBase<double> const, Thyra::MultiVectorBase<double> const> (model=..., computeResponses=..., computeSensitivities=112, 
    responses=..., sensitivities=..., observer=...) at /home/mperego/Workspace/trilinos/builds/devel-shared/install/include/Piro_PerformSolve_Def.hpp:120
#21 0x0000000000444957 in Piro::Detail::PerformSolveImpl<double, Thyra::VectorBase<double> const, Thyra::MultiVectorBase<double> const> (model=..., solveParams=..., responses=..., sensitivities=...)
    at /home/mperego/Workspace/trilinos/builds/devel-shared/install/include/Piro_PerformSolve_Def.hpp:162
#22 0x00000000004253c5 in PerformSolveBase<double> (sensitivities=..., responses=..., solveParams=..., piroModel=...)
    at /home/mperego/Workspace/trilinos/builds/devel-shared/install/include/Piro_PerformSolve_Def.hpp:291
#23 PerformSolve<double> (sensitivities=..., responses=..., solveParams=..., piroModel=...) at /home/mperego/Workspace/trilinos/builds/devel-shared/install/include/Piro_PerformSolve_Def.hpp:230
#24 main (argc=2, argv=0x7fffffffa1f8) at /ascldap/users/mperego/Workspace/albany/sources/albany-src-orig/src/Main_Solve.cpp:144
rppawlo commented 5 years ago

@cgcgcg - I just rebuilt all of Trilinos and empire from scratch without #6190 (with all Trilinos and Kokkos deprecated code off). It looks like DualFeed and MultiBlock are now passing, but a lot of other tests are not. We will need #6190 to fix the other tests, but it does not look quite ready yet.

mperego commented 4 years ago

@srajama1 can you look into this? It happens in serial, 1 MPI proc.

kddevin commented 4 years ago

I think @mperego's issues are related to the deprecation of dynamic profile in Tpetra (#5602). @trilinos/tpetra will take a look.
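As a rough illustration of the failure mode in the error messages above, here is a hypothetical C++ toy model of a graph with fixed per-row capacity, the situation a graph is in once dynamic profile is no longer available. It is not Tpetra's implementation or API; the names `FixedCapacityGraph` and `countDistinctPerRow` are made up for this sketch. Each row's upper bound is fixed at construction, and an insertion that would exceed it is rejected with an "invalid" sentinel, analogous to the `numInserted == Teuchos::OrdinalTraits<size_t>::invalid()` test in the traces. The sketch also shows one common remedy: count each row's distinct entries in a first pass, then allocate exactly that capacity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// HYPOTHETICAL toy model, not Tpetra's API: each row's capacity (upper bound
// on its number of entries) is fixed when the graph is constructed.
class FixedCapacityGraph {
 public:
  // Sentinel playing the role of Teuchos::OrdinalTraits<size_t>::invalid().
  static constexpr std::size_t invalid = std::numeric_limits<std::size_t>::max();

  explicit FixedCapacityGraph(const std::vector<std::size_t>& capacityPerRow)
      : capacity_(capacityPerRow), rows_(capacityPerRow.size()) {
    for (std::size_t r = 0; r < rows_.size(); ++r) rows_[r].reserve(capacity_[r]);
  }

  // Inserts the column indices not already present in `row`. Returns how many
  // were added, or `invalid` (adding nothing) if they would exceed the capacity.
  std::size_t insert(std::size_t row, const std::vector<long long>& cols) {
    assert(row < rows_.size());
    std::vector<long long>& current = rows_[row];
    std::vector<long long> fresh;  // new, distinct columns for this row
    for (long long c : cols) {
      const bool present =
          std::find(current.begin(), current.end(), c) != current.end() ||
          std::find(fresh.begin(), fresh.end(), c) != fresh.end();
      if (!present) fresh.push_back(c);
    }
    if (current.size() + fresh.size() > capacity_[row]) return invalid;
    current.insert(current.end(), fresh.begin(), fresh.end());
    return fresh.size();
  }

  std::size_t numEntries(std::size_t row) const { return rows_[row].size(); }

 private:
  std::vector<std::size_t> capacity_;
  std::vector<std::vector<long long>> rows_;
};

// First pass of a two-pass build (also hypothetical): count the distinct
// columns each row will receive so that capacities can be sized exactly.
std::vector<std::size_t> countDistinctPerRow(
    std::size_t numRows,
    const std::vector<std::pair<std::size_t, long long>>& entries) {
  std::vector<std::vector<long long>> seen(numRows);
  for (const auto& e : entries) {
    std::vector<long long>& cols = seen[e.first];
    if (std::find(cols.begin(), cols.end(), e.second) == cols.end())
      cols.push_back(e.second);
  }
  std::vector<std::size_t> counts(numRows);
  for (std::size_t r = 0; r < numRows; ++r) counts[r] = seen[r].size();
  return counts;
}
```

For example, a row built with capacity 2 accepts `{10, 11}` but rejects a later `{12}`, much like the "not enough capacity to insert indices" errors above; sizing the capacities with the counting pass first means no insertion can overflow.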

kddevin commented 4 years ago

@mperego can you provide more detail about your run? For example, what level of overlap are you using in the preconditioner, and what is the structure of your matrix? We'd like to build a test to reproduce the error.

mperego commented 4 years ago

@kddevin I replied in issue #6309.

rppawlo commented 4 years ago

As of this weekend, empire is now using the non-deprecated code path for all Trilinos packages. We can close this ticket once the Albany issue above is worked out. Thanks @tjfulle!