trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Thyra/Ifpack2: segfault #535

Closed nschloe closed 8 years ago

nschloe commented 8 years ago

When using Thyra with Ifpack2, initializePrec segfaults.

MWE:

#include <Stratimikos_DefaultLinearSolverBuilder.hpp>
#include <Teuchos_DefaultComm.hpp>
#include <Tpetra_CrsMatrix.hpp>
#include <Thyra_Ifpack2PreconditionerFactory.hpp>
#include <Thyra_TpetraThyraWrappers.hpp>

Teuchos::RCP<const Tpetra::CrsMatrix<double,int,int>>
create_matrix(
    const Teuchos::RCP<const Teuchos::Comm<int>> & comm
    )
{
  const Tpetra::global_size_t numGlobalElements = 100;

  const int indexBase = 0;
  auto map = Teuchos::rcp(new Tpetra::Map<int,int>(
        numGlobalElements,
        indexBase,
        comm
        ));

  const size_t numMyElements = map->getNodeNumElements();
  auto myGlobalElements = map->getNodeElementList();

  auto A = Teuchos::rcp(new Tpetra::CrsMatrix<double,int,int>(map, 1));

  for (size_t i = 0; i < numMyElements; i++) {
    A->insertGlobalValues(
        myGlobalElements[i],
        Teuchos::tuple(myGlobalElements[i]),
        Teuchos::tuple(1.0 / (myGlobalElements[i] + 1))
        );
  }

  A->fillComplete();
  return A;
}

int main(int argc, char *argv[]) {
  Teuchos::GlobalMPISession session(&argc, &argv, NULL);
  auto out = Teuchos::VerboseObjectBase::getDefaultOStream();

  const auto comm = Teuchos::DefaultComm<int>::getComm();

  const auto A = create_matrix(comm);

  auto b = Tpetra::Vector<double,int,int>(A->getRangeMap());
  b.putScalar(1.0);

  auto x = Tpetra::Vector<double,int,int>(A->getDomainMap());
  x.putScalar(0.0);

  Stratimikos::DefaultLinearSolverBuilder builder;
  auto p = Teuchos::rcp(new Teuchos::ParameterList());
  builder.setParameterList(p);

  auto lowsFactory = builder.createLinearSolveStrategy("");
  lowsFactory->setVerbLevel(Teuchos::VERB_LOW);

  const Tpetra::Operator<double,int,int> & opA = *A;
  auto thyraA = Thyra::createConstLinearOp(Teuchos::rcpFromRef(opA)); // throws

  Teuchos::RCP<Thyra::PreconditionerFactoryBase<double>> factory = Teuchos::rcp(
        new Thyra::Ifpack2PreconditionerFactory<Tpetra::CrsMatrix<double,int,int>>()
        );

  const auto prec = factory->createPrec();
  Thyra::initializePrec(*factory, thyraA, prec.ptr()); // segfault!
}

Output:

[fuji:08728] *** Process received signal ***
[fuji:08728] Signal: Segmentation fault (11)
[fuji:08728] Signal code: Address not mapped (1)
[fuji:08728] Failing at address: 0x90
[fuji:08728] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7fe2ae8a23d0]
[fuji:08728] [ 1] /usr/lib/x86_64-linux-gnu/libtrilinos_ifpack2-adapters.so.12(_ZNKSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_N7Teuchos44StringIndexedOrderedValueObjectContainerBase12OrdinalIndexEESt10_Select1stISB_ESt4lessIS5_ESaISB_EE4findERS7_+0x12)[0x7fe2b1a24bb2]
[fuji:08728] [ 2] /usr/lib/x86_64-linux-gnu/libtrilinos_ifpack2-adapters.so.12(_ZNK7Teuchos13ParameterList3getINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEERKT_RKS7_+0x33)[0x7fe2b1a28c53]
[fuji:08728] [ 3] /usr/lib/x86_64-linux-gnu/libtrilinos_ifpack2-adapters.so.12(_ZNK5Thyra28Ifpack2PreconditionerFactoryIN6Tpetra9CrsMatrixIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS3_6SerialENS3_9HostSpaceEEELb0EEEE14initializePrecERKN7Teuchos3RCPIKNS_18LinearOpSourceBaseIdEEEEPNS_18PreconditionerBaseIdEENS_16ESupportSolveUseE+0x31e)[0x7fe2b1a2b60e]
[fuji:08728] [ 4] ./ifpack2test(_ZN5Thyra14initializePrecIdEEvRKNS_25PreconditionerFactoryBaseIT_EERKN7Teuchos3RCPIKNS_12LinearOpBaseIS2_EEEERKNS6_3PtrINS_18PreconditionerBaseIS2_EEEENS_16ESupportSolveUseE+0x81)[0x46222b]
[fuji:08728] [ 5] ./ifpack2test(main+0x6ff)[0x454f2c]
[fuji:08728] [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fe2ae4e8830]
[fuji:08728] [ 7] ./ifpack2test(_start+0x29)[0x4541b9]
[fuji:08728] *** End of error message ***
Segmentation fault (core dumped)

Backtrace:

#0  0x00007ffff796ebb2 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Teuchos::StringIndexedOrderedValueObjectContainerBase::OrdinalIndex>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Teuchos::StringIndexedOrderedValueObjectContainerBase::OrdinalIndex> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Teuchos::StringIndexedOrderedValueObjectContainerBase::OrdinalIndex> > >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const ()
   from /usr/lib/x86_64-linux-gnu/libtrilinos_ifpack2-adapters.so.12
#1  0x00007ffff7972c53 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const& Teuchos::ParameterList::get<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const ()
   from /usr/lib/x86_64-linux-gnu/libtrilinos_ifpack2-adapters.so.12
#2  0x00007ffff797560e in Thyra::Ifpack2PreconditionerFactory<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >::initializePrec(Teuchos::RCP<Thyra::LinearOpSourceBase<double> const> const&, Thyra::PreconditionerBase<double>*, Thyra::ESupportSolveUse) const ()
   from /usr/lib/x86_64-linux-gnu/libtrilinos_ifpack2-adapters.so.12
#3  0x000000000046222b in Thyra::initializePrec<double> (precFactory=..., fwdOp=..., prec=..., 
    supportSolveUse=Thyra::SUPPORT_SOLVE_UNSPECIFIED) at /usr/include/trilinos/Thyra_PreconditionerFactoryHelpers.hpp:66
#4  0x0000000000454f2c in main (argc=1, argv=0x7fffffffdcf8) at /tmp/ifpack2/source/main.cpp:81

mhoemmen commented 8 years ago

Hey @nschloe , thanks for reporting this! btw does this code actually use Teuchos_RCPStdSharedPtrConversions.hpp?

nschloe commented 8 years ago

Yup, but not in this small example. :) I removed the superfluous #includes.

mhoemmen commented 8 years ago

@nschloe Is this a debug or release build? Is Teuchos_ENABLE_DEBUG ON?

nschloe commented 8 years ago

For the sake of completeness, here's my CMakeLists.txt:

CMAKE_MINIMUM_REQUIRED(VERSION 2.8.8)

PROJECT(Nosh CXX)

FIND_PACKAGE(Trilinos REQUIRED COMPONENTS Stratimikos Thyra Tpetra Ifpack2)

INCLUDE_DIRECTORIES(
  SYSTEM
  ${Trilinos_INCLUDE_DIRS}
  ${Trilinos_TPL_INCLUDE_DIRS}
)

SET(MY_EXECUTABLE ifpack2test)
ADD_EXECUTABLE(${MY_EXECUTABLE} main.cpp)
TARGET_LINK_LIBRARIES(
  ${MY_EXECUTABLE}
  ${Trilinos_LIBRARIES}
  )
set_property(TARGET ${MY_EXECUTABLE} PROPERTY CXX_STANDARD 11)

It's configured with

cmake \
    -DCMAKE_BUILD_TYPE:STRING=Debug \
    -DCMAKE_CXX_COMPILER:STRING=mpicxx \
    ../source/

mhoemmen commented 8 years ago

It may help to set Teuchos_ENABLE_DEBUG:BOOL=ON, for e.g., additional RCP checks.

nschloe commented 8 years ago

This is for the Trilinos build, right?

mhoemmen commented 8 years ago

This is for the Trilinos build, right?

Yes :)

nschloe commented 8 years ago

Tough, I'm getting internal compiler errors when enabling that option; see, e.g., here.

Can you reproduce the segfault with the above MWE?

mhoemmen commented 8 years ago

Hi @nschloe -- it builds for me. My recent commit added a test, stratimikos/test/test_issue_535.cpp. The test fails in both MPI_DEBUG and SERIAL_RELEASE builds. It looks innocuous to me, but I'm not a Stratimikos or Thyra developer, so I'm not sure if you have to call some registration function first before you're allowed to create Ifpack2 things. @bartlettroscoe , could you comment? Thanks!

mhoemmen commented 8 years ago

I'm looking at ifpack2/adapters/thyra/Thyra_Ifpack2PreconditionerFactory_def.hpp, line 163. It turns out that constParamList (and therefore paramList_) is null at that point. Can't get pineapple juice from a turnip; can't get a string parameter from a null ParameterList. Let me compare against the Ifpack(1) factory to see what's going on. My guess is that this class never gets used in this way -- @bartlettroscoe , is @nschloe 's use case typical?

mhoemmen commented 8 years ago

@bartlettroscoe , when would the Ifpack2PreconditionerFactory's setParameterList method normally get called? As far as I can tell, that's the only place in which paramList_ could possibly get set. My printf debugging indicates that this method never gets called.

mhoemmen commented 8 years ago

I tried something: After line 163 of ifpack2/adapters/thyra/Thyra_Ifpack2PreconditionerFactory_def.hpp, if paramList_ is null, I set constParamList to the result of getValidParameters() instead. That makes the test pass.

I'm not committing this change until I hear back from @bartlettroscoe to make sure that this won't break something else.
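For readers without the Trilinos source at hand, the idea of that change can be sketched in isolation. This is only an illustration under stated assumptions: std::shared_ptr and std::map stand in for Teuchos::RCP and Teuchos::ParameterList, and the class name, the "prec type" key and its value are hypothetical, not taken from the Ifpack2 adapter.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for Teuchos::ParameterList (assumption: a flat string map suffices here).
using ParamList = std::map<std::string, std::string>;

// Hypothetical factory; it mirrors the shape of the fix, not the real Ifpack2 class.
class SketchPrecFactory {
public:
  // Analogue of setParameterList(); nothing forces the caller to invoke it.
  void setParameterList(std::shared_ptr<const ParamList> pl) {
    paramList_ = std::move(pl);
  }

  // Analogue of getValidParameters(): the defaults known to the factory.
  // The key and value are made-up placeholders.
  std::shared_ptr<const ParamList> getValidParameters() const {
    static const std::shared_ptr<const ParamList> valid =
        std::make_shared<ParamList>(ParamList{{"prec type", "some-default"}});
    return valid;
  }

  // Reads a parameter; falls back to the defaults when no list was ever set.
  std::string precType() const {
    std::shared_ptr<const ParamList> pl =
        paramList_ ? paramList_ : getValidParameters();
    assert(pl && "the fallback guarantees a non-null list");
    return pl->at("prec type");
  }

private:
  std::shared_ptr<const ParamList> paramList_;  // null until setParameterList()
};
```

With this shape, a default-constructed factory answers precType() from the defaults, and a list passed to setParameterList() takes precedence once it is set.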

bartlettroscoe commented 8 years ago

@mhoemmen and @nschloe,

I had never looked at this code before. It looks like the initial version was contributed by:

01d1b7a "Ifpack2: Added Thyra adapter code"
Author: Julien Cortial <jcortia@sandia.gov>
Date:   Wed May 29 14:58:10 2013 -0700 (3 years, 2 months ago)

A       packages/ifpack2/adapters/thyra/Thyra_Ifpack2PreconditionerFactory.cpp
A       packages/ifpack2/adapters/thyra/Thyra_Ifpack2PreconditionerFactory_decl.hpp
A       packages/ifpack2/adapters/thyra/Thyra_Ifpack2PreconditionerFactory_def.hpp

I will look at this a little and then comment back.

But note that in general you never want to commit a test that you know is broken. This breaks the CI build (stopping people from pushing) and just creates a lot of red on the dashboard (which teaches people to just ignore failures). We will discuss this at the next Trilinos Leaders Meeting.

bartlettroscoe commented 8 years ago

I tried something: After line 163 of ifpack2/adapters/thyra/Thyra_Ifpack2PreconditionerFactory_def.hpp, if paramList_ is null, I set constParamList to the result of getValidParameters() instead. That makes the test pass.

The Teuchos::ParameterListAcceptor interface says that the defaults for the parameters being set should be put into the non-const PL. Why was this not followed for this factory like for the others? Was there a reason for this or just an oversight?

Also, if the PL is not set, then it should not be read. The factory object (or any object) that is not given a PL should still do something logical even if no PL gets set. Does that make sense?

Otherwise, is there a reason why the Ifpack2 factory was not registered with the Stratimikos::DefaultLinearSolverBuilder like Ifpack? That would make it show up automatically.

I have other questions as I start to look over the Ifpack2 code, but I don't think this issue ticket is the right place for those.

mhoemmen commented 8 years ago

The Teuchos::ParameterListAcceptor interface says that the defaults for the parameters being set should be put into the non-const PL. Why was this not followed for this factory like for the others? Was there a reason for this or just an oversight?

git blame says "Julien Cortial" wrote this class. I made a tiny change in 2013 and another tiny change in 2015. Those changes are not relevant to this issue.

The class looks a lot like the Ifpack(1) factory, so my guess is that it was just a copy and paste. Does the Ifpack(1) factory do the right thing? (It lives in stratimikos/adapters.)

bartlettroscoe commented 8 years ago

"Julien Cortial" wrote this class

I don't think I ever met Julien. Is he still around SNL?

Does the Ifpack(1) factory do the right thing? (It lives in stratimikos/adapters.)

Yes (but there are no automated tests to prove that). It does:

    if(paramList_.get()) {
      Teuchos::ParameterList
        &ifpackSettingsPL = paramList_->sublist(IfpackSettings_name);
      // Above will create new sublist if it does not exist!
      TEUCHOS_TEST_FOR_EXCEPT(0!=ifpack_precOp->SetParameters(ifpackSettingsPL));
      // Above, I have not idea how any error messages for a mistake will be
      // reported back to the user!

From there, it is up to the Ifpack code itself to deal with PLs correctly.

I guess that, since the Ifpack2PreconditionerFactory reads a const list, the change you suggest above would be fine for now. I will create new issue tickets for a) addressing issues with the Teuchos::ParameterListAcceptor documentation and b) integrating Ifpack2PreconditionerFactory into Stratimikos proper.

mhoemmen commented 8 years ago

@bartlettroscoe if (paramList_.get ()) just means it checks whether the current ParameterList is null. Should it be up to the specific Factory whether creating a solver without parameters makes sense?

Anyway, I'll make the suggested change. Thanks for looking at this!

bartlettroscoe commented 8 years ago

Should it be up to the specific Factory whether creating a solver without parameters makes sense?

That has to be specified in the abstract interface and then all of the implementations need to follow that. The assumed behavior for the Teuchos::ParameterListAcceptor interface should be that a PL is not required in order for the object to do something valid. We obviously need to write down those specs and then upgrade the underlying subclasses to follow. But this might break backward compatibility with non-compliant implementations so now we have a mess.

mhoemmen commented 8 years ago

I just pushed the proposed fix. I'll close this issue for now but everybody, please feel free to continue discussion. @bartlettroscoe will open other issues as needed.

The assumed behavior for the Teuchos::ParameterListAcceptor interface should be that a PL is not required in order for the object to do something valid. We obviously need to write down those specs and then upgrade the underlying subclasses to follow. But this might break backward compatibility with non-compliant implementations so now we have a mess.

I agree that this is a dilemma. On the one hand, parameters are "options" and therefore "optional." On the other hand, what's a "default preconditioner"? Users might forget to tell us which preconditioner they want, and it makes sense for us to catch that early. Otherwise, they get unexpected results and we have to waste time digging through their input deck.
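The "catch that early" side of the dilemma could look roughly like the following. It is a hypothetical, stand-alone sketch: std::map stands in for Teuchos::ParameterList, and the function name and the "prec type" key are invented for illustration, so it says nothing about what any Trilinos factory actually does.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for Teuchos::ParameterList (assumption: a flat string map suffices here).
using ParamList = std::map<std::string, std::string>;

// Strict policy: a missing list or a missing key is reported at setup time
// with a descriptive exception, instead of silently picking a default.
// The function name and the "prec type" key are hypothetical.
inline std::string requirePrecType(const ParamList* pl) {
  if (pl == nullptr) {
    throw std::invalid_argument("no parameter list was set on the factory");
  }
  const auto it = pl->find("prec type");
  if (it == pl->end()) {
    throw std::invalid_argument("parameter \"prec type\" is missing");
  }
  return it->second;
}
```

The trade-off is the one described above: this version never produces a preconditioner the user did not ask for, but it also rejects the "no parameters at all" case that an optional-parameters reading of the interface would have to accept.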

nschloe commented 8 years ago

Thanks everyone for investigating!

bartlettroscoe commented 8 years ago

@nschloe,

Thanks everyone for investigating!

I think the bottom line from this discussion is that almost every realistic use case will require that a PL be registered with a factory object. We need to pin down the expected behavior better.

Also, I think that if you turn on debug-mode checking (e.g. -DTrilinos_ENABLE_DEBUG=ON) then this segfault will be replaced with a nice exception, including a stack trace if you have BinUtils enabled.
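The general idea behind such debug-mode checking can be sketched without Trilinos. The following is a hypothetical stand-in, not Teuchos::RCP: the CheckedPtr class and the kDebugChecks flag are invented here, with the flag playing the role of a configure-time option.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Stand-in for a configure-time debug switch (hypothetical name).
constexpr bool kDebugChecks = true;

// Hypothetical checked smart pointer; it only illustrates the idea of
// turning a null dereference into an exception when checks are enabled.
template <typename T>
class CheckedPtr {
public:
  CheckedPtr() = default;
  explicit CheckedPtr(std::shared_ptr<T> p) : ptr_(std::move(p)) {}

  // With checks on, a null dereference throws; with checks off, it would be
  // undefined behavior, which typically shows up as a segfault.
  T* operator->() const {
    if (kDebugChecks && !ptr_) {
      throw std::logic_error("CheckedPtr: dereferenced a null pointer");
    }
    return ptr_.get();
  }

private:
  std::shared_ptr<T> ptr_;  // null when default-constructed
};
```

In a sketch like this, the failure in the MWE would surface as a catchable exception at the first access through the null pointer rather than as a crash deep inside a std::map lookup.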

nschloe commented 8 years ago

That was Mark's tip too. I'll try to enable this in the Debian build. (A binutils bug will have to be resolved first.)

bartlettroscoe commented 8 years ago

A binutils bug will have to be resolved first

BinUtils is only needed if you want a stacktrace without running in a debugger. But if you build a full debug version of the code (e.g. -DCMAKE_BUILD_TYPE=DEBUG) then you can run your code in a debugger and set a breakpoint when the throw occurs. See the documentation for TEUCHOS_TEST_FOR_EXCEPTION(). In this case, you don't really need BinUtils. But for non-deterministic errors, the stacktrace produced by BinUtils is super useful.