sandialabs / Albany

Sandia National Laboratories' Albany multiphysics code

New FPEs in Albany as of today #422

ikalash closed this 5 years ago

ikalash commented 5 years ago

There are a lot of new failures starting today due to FPEs:

Since there have been no non-trivial commits to Albany, it seems the problem came from Trilinos.

rppawlo commented 5 years ago

It was this:

https://github.com/trilinos/Trilinos/pull/4158

bathmatt commented 5 years ago

This change made Teuchos::float_nan a signaling NaN instead of the previous 0./0.

Somewhere in Albany you are probably using a NaN, and it is propagating?

Mark Hoemmen suggested using a quiet_NaN instead. I don't like this, because if you have

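double x = std::numeric_limits<double>::quiet_NaN();

double y = std::sqrt(x);

that doesn't throw an exception (sqrt of a quiet NaN does not raise FE_INVALID, so nothing traps).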
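To make the quiet-vs-signaling difference concrete, here is a minimal sketch, assuming IEEE 754 doubles on a typical x86-64 or AArch64 Linux toolchain built without -ffast-math. Instead of trapping, it clears and then inspects the sticky status flags with the standard <cfenv> functions. The helper name sqrt_raises_invalid is hypothetical.

```cpp
#include <cfenv>
#include <cmath>
#include <limits>

// Hypothetical helper: reports whether std::sqrt(x) raises the IEEE
// "invalid operation" flag. It reads the sticky status flags with
// std::fetestexcept instead of trapping, so it is safe to call whether
// or not FP traps are enabled. Assumes IEEE 754 doubles and no -ffast-math.
bool sqrt_raises_invalid(double x) {
    volatile double in = x;             // volatile: keep sqrt from being constant-folded
    std::feclearexcept(FE_ALL_EXCEPT);  // start from a clean set of flags
    volatile double out = std::sqrt(in);
    (void)out;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

On such a platform one would expect false for a quiet NaN and true for -1.0 or a signaling NaN, which is why a quiet NaN can flow through a run that has FE_INVALID trapping enabled without ever stopping it.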

Do you have floating-point exception trapping turned on for any of your runs?

Can you compile with

#include <fenv.h>

and add this to your main?

feenableexcept(FE_DIVBYZERO | FE_INVALID);

Then run the code and see where it traps?
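The suggestion above can be wrapped in a small helper, sketched here under the assumption of Linux with glibc: feenableexcept() is a GNU extension rather than ISO C++, and it returns -1 when the platform cannot trap (some hardware does not support FP traps). The name install_fpe_traps is hypothetical.

```cpp
#include <fenv.h>   // glibc declares feenableexcept() here when _GNU_SOURCE is defined (g++ defines it by default)
#include <cstdio>

// Hypothetical helper that turns on trapping for division by zero and invalid
// operations, so the first offending instruction raises SIGFPE.
// Assumes Linux/glibc, since feenableexcept() is not part of ISO C++.
bool install_fpe_traps() {
    // feenableexcept returns the previously enabled trap mask, or -1 on failure.
    if (feenableexcept(FE_DIVBYZERO | FE_INVALID) == -1) {
        std::fprintf(stderr, "warning: floating-point traps are not supported here\n");
        return false;
    }
    return true;
}
```

Calling it first thing in main() and running under gdb means the process stops with SIGFPE at the first trapping operation, and `bt` then shows where it happened, as in the backtraces later in this thread.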

ikalash commented 5 years ago

@bathmatt thanks for the explanation. Several of the builds where the problem cropped up are debug builds, so I can just use gdb to find the line where the FPE occurs. It looks like the NaNs crop up in different places depending on the problem, all deep within Trilinos. For example, for the Helmholtz2D_Tpetra test case (http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=4229701&build=80209 ), here is the backtrace:

#0  0x00007ffff564e658 in Tpetra::Details::Blas::Impl::Fill<Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Serial, void>, double, Kokkos::Serial, int, 2>::fill (X=..., alpha=@0x7fffffff9a50: nan(0x4000000000000), numRows=2, numCols=1)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Tpetra_Details_fill.hpp:246
#1  0x00007ffff560f4c8 in Tpetra::Details::Blas::fill<Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Serial, void>, double, int, Kokkos::Serial> (execSpace=..., X=..., alpha=@0x7fffffff9a50: nan(0x4000000000000), numRows=2, numCols=1)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Tpetra_Details_fill.hpp:297
#2  0x00007ffff55df9a1 in Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::putScalar (this=0xe9c8f0, alpha=@0x7fffffff9b30: nan(0x4000000000000))
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Tpetra_MultiVector_def.hpp:2611
#3  0x00007ffff5cf88dd in Thyra::TpetraVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::assignImpl (this=0xdd9330, alpha=nan(0x4000000000000))
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Thyra_TpetraVector_def.hpp:377
#4  0x00007ffff5615357 in Thyra::MultiVectorBase<double>::assign (this=0xdd9388, alpha=nan(0x4000000000000))
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Thyra_MultiVectorBase_decl.hpp:514
#5  0x00007ffff56fc29e in Thyra::put_scalar<double> (alpha=@0x7fffffff9c60: nan(0x4000000000000), v_lhs=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Thyra_VectorStdOps_def.hpp:170
#6  0x00007ffff5858651 in Thyra::assign<double> (v_lhs=..., alpha=@0x7fffffff9c60: nan(0x4000000000000))
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Thyra_VectorStdOps_def.hpp:358
#7  0x00007fffe49ae4d6 in Thyra::ModelEvaluatorBase::OutArgs<double>::setFailed (this=0x7fffffffa480)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/thyra/core/src/interfaces/nonlinear/model_evaluator/fundamental/Thyra_ModelEvaluatorBase_def.hpp:1444
#8  0x00007fffe49ad4d0 in Piro::LOCASolver<double>::evalModelImpl (this=0xdd9850, inArgs=..., outArgs=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/piro/src/Piro_LOCASolver_Def.hpp:205
#9  0x00007ffff593696d in Thyra::ModelEvaluatorDefaultBase<double>::evalModel (this=0xdd99e0, inArgs=..., outArgs=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Thyra_ModelEvaluatorDefaultBase.hpp:685
#10 0x00000000006491d1 in Piro::Detail::PerformSolveImpl<double, Thyra::VectorBase<double> const, Thyra::MultiVectorBase<double> const> (
    model=..., computeResponses=..., computeSensitivities=false, responses=..., sensitivities=..., observer=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Piro_PerformSolve_Def.hpp:120
#11 0x000000000062a0ae in Piro::Detail::PerformSolveImpl<double, Thyra::VectorBase<double> const, Thyra::MultiVectorBase<double> const> (
    model=..., solveParams=..., responses=..., sensitivities=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Piro_PerformSolve_Def.hpp:162
#12 0x000000000061760e in Piro::PerformSolveBase<double> (piroModel=..., solveParams=..., responses=..., sensitivities=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Piro_PerformSolve_Def.hpp:291
#13 0x000000000060b226 in Piro::PerformSolve<double> (piroModel=..., solveParams=..., responses=..., sensitivities=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Piro_PerformSolve_Def.hpp:230
#14 0x00000000005f3df8 in main (argc=2, argv=0x7fffffffd268) at /home/ikalash/nightlyCDash/repos/Albany/src/Main_SolveT.cpp:335

Thyra starts to encounter NaNs during a LOCA continuation. For another test, StaticElasticity3D (http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=4229199&build=80203 ), the issue is in MueLu:

(gdb) bt
#0  0x00007ffff558dadf in std::isnan (__x=nan(0x4000000000000)) at /usr/include/c++/8/cmath:620
#1  0x00007ffff55ddf79 in Teuchos::generic_real_isnaninf<double> (x=@0x7fffffff1898: nan(0x4000000000000))
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Teuchos_ScalarTraits.hpp:126
#2  0x00007ffff5590e47 in Teuchos::ScalarTraits<double>::isnaninf (x=nan(0x4000000000000))
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Teuchos_ScalarTraits.hpp:738
#3  0x00007ffff60f06f5 in Ifpack2::Details::Chebyshev<double, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >::compute (this=0x18fab08)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Ifpack2_Details_Chebyshev_def.hpp:863
#4  0x00007ffff6027017 in Ifpack2::Chebyshev<Tpetra::RowMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >::compute (this=0x18fab00)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Ifpack2_Chebyshev_def.hpp:320
#5  0x00007ffff60d8b69 in MueLu::Ifpack2Smoother<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::SetupGeneric (this=0x1846830, currentLevel=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_Ifpack2Smoother_def.hpp:639
#6  0x00007ffff5ff4478 in MueLu::Ifpack2Smoother<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::Setup (this=0x1846830, currentLevel=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_Ifpack2Smoother_def.hpp:196
#7  0x00007ffff5ff9b50 in MueLu::TrilinosSmoother<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::Setup (this=0x1a1c080, currentLevel=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_TrilinosSmoother_def.hpp:177
#8  0x00007ffff602a291 in MueLu::SmootherFactory<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::BuildSmoother (this=0x1a0f840, currentLevel=..., preOrPost=MueLu::BOTH)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_SmootherFactory_def.hpp:163
#9  0x00007ffff6029f96 in MueLu::SmootherFactory<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::Build (this=0x1a0f840, currentLevel=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_SmootherFactory_def.hpp:122
#10 0x00007ffff55d74b8 in MueLu::SingleLevelFactoryBase::CallBuild (this=0x1a0f840, requestedLevel=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_SingleLevelFactoryBase.hpp:133
#11 0x00007ffff5d95b48 in MueLu::Level::Get<Teuchos::RCP<MueLu::SmootherBase<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > > (this=0xaf72d0, ename="PreSmoother", factory=0x1a0f840)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_Level.hpp:203
#12 0x00007ffff5d943e3 in MueLu::TopSmootherFactory<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::Build (this=0x1830b10, level=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_TopSmootherFactory_def.hpp:98
#13 0x00007ffff5d5dc7b in MueLu::Hierarchy<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::Setup (this=0x1857060, coarseLevelID=0, fineLevelManager=..., coarseLevelManager=..., nextLevelManager=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_Hierarchy_def.hpp:450
#14 0x00007ffff5d5b4ab in MueLu::HierarchyManager<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::SetupHierarchy (this=0xab2820, H=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_HierarchyManager.hpp:236
#15 0x00007ffff606dcc3 in MueLu::ParameterListInterpreter<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::SetupHierarchy (this=0xab2820, H=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_ParameterListInterpreter_def.hpp:2171
#16 0x00007ffff5d227d9 in MueLu::CreateXpetraPreconditioner<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > (op=..., inParamList=..., coords=..., nullspace=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/MueLu_CreateXpetraPreconditioner.hpp:103
#17 0x00007ffff5cd6205 in Thyra::MueLuPreconditionerFactory<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::initializePrec (this=0x18673b0, fwdOpSrc=..., prec=0x184cd40, supportSolveUse=Thyra::SUPPORT_SOLVE_UNSPECIFIED)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Thyra_MueLuPreconditionerFactory_def.hpp:209
#18 0x00007fffe000f059 in NOX::Thyra::Group::updateLOWS (this=0x1849c80)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C:929
#19 0x00007fffe000d6e3 in NOX::Thyra::Group::applyJacobianInverseMultiVector (this=0x1849c80, p=..., input=..., result=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C:774
#20 0x00007fffe000d061 in NOX::Thyra::Group::applyJacobianInverse (this=0x1849c80, p=..., input=..., result=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C:654
#21 0x00007fffe000c282 in NOX::Thyra::Group::computeNewton (this=0x1849c80, p=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C:521
#22 0x00007fffdff3a914 in NOX::Direction::Newton::compute (this=0x1904660, dir=..., soln=..., solver=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src/NOX_Direction_Newton.C:136
#23 0x00007fffdff2f804 in NOX::Direction::Generic::compute (this=0x1904660, d=..., g=..., s=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src/NOX_Direction_Generic.C:59
#24 0x00007fffdff3ad86 in NOX::Direction::Newton::compute (this=0x1904660, dir=..., soln=..., solver=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src/NOX_Direction_Newton.C:164
#25 0x00007fffdff6d2fa in NOX::Solver::LineSearchBased::step (this=0xab7920)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src/NOX_Solver_LineSearchBased.C:194
#26 0x00007fffdff6d90c in NOX::Solver::LineSearchBased::solve (this=0xab7920)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src/NOX_Solver_LineSearchBased.C:260
#27 0x00007fffe003188e in Thyra::NOXNonlinearSolver::solve (this=0x1902f50, x=0x183dbd8, delta=0x0)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/nox/src-thyra/Thyra_NonlinearSolver_NOX.cpp:235
#28 0x00007fffe4941a41 in Piro::NOXSolver<double>::evalModelImpl (this=0x1836320, inArgs=..., outArgs=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/packages/piro/src/Piro_NOXSolver_Def.hpp:175
#29 0x00007ffff593696d in Thyra::ModelEvaluatorDefaultBase<double>::evalModel (this=0x18363c8, inArgs=..., outArgs=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Thyra_ModelEvaluatorDefaultBase.hpp:685
#30 0x00000000006491d1 in Piro::Detail::PerformSolveImpl<double, Thyra::VectorBase<double> const, Thyra::MultiVectorBase<double> const> (
    model=..., computeResponses=..., computeSensitivities=false, responses=..., sensitivities=..., observer=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Piro_PerformSolve_Def.hpp:120
#31 0x000000000062a0ae in Piro::Detail::PerformSolveImpl<double, Thyra::VectorBase<double> const, Thyra::MultiVectorBase<double> const> (
    model=..., solveParams=..., responses=..., sensitivities=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Piro_PerformSolve_Def.hpp:162
#32 0x000000000061760e in Piro::PerformSolveBase<double> (piroModel=..., solveParams=..., responses=..., sensitivities=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Piro_PerformSolve_Def.hpp:291
#33 0x000000000060b226 in Piro::PerformSolve<double> (piroModel=..., solveParams=..., responses=..., sensitivities=...)
    at /home/ikalash/nightlyAlbanyTests/Results/Trilinos/build-debug/install/include/Piro_PerformSolve_Def.hpp:230
#34 0x00000000005f3df8 in main (argc=2, argv=0x7fffffffd1f8) at /home/ikalash/nightlyCDash/repos/Albany/src/Main_SolveT.cpp:335

I am not sure how to proceed in fixing this. I spent about a week chasing down and fixing FPEs in Albany before the break. I don't have time to do this again right now, and it would be much harder because the FPEs seem to be happening deep in Trilinos, not on the Albany side. At the same time, we need the nightlies to be clean, as we are in the middle of a refactor.

bathmatt commented 5 years ago

Can you try with a quiet_NaN instead of a signaling NaN? I think this is exposing real bugs in the code that a quiet NaN would let go unnoticed. Mark Hoemmen might have an idea.

My guess, just looking at Trilinos, is that it might be coming in through the model evaluator, and you're not setting everything in there.

inArgs.set_alpha(Teuchos::ScalarTraits<double>::nan()); // make sure these don't percolate through!
inArgs.set_beta(Teuchos::ScalarTraits<double>::nan());  // make sure these don't percolate through!
inArgs.set_alpha(Teuchos::ScalarTraits<double>::nan()); // make sure these don't percolate through!
inArgs.set_beta(Teuchos::ScalarTraits<double>::nan());  // make sure these don't percolate through!

// diagonal will be slightly diag dominant.

Not sure about this, but if you change it to quiet_NaN in the ScalarTraits .cpp file, you can compute with NaN without raising a signal.

Maybe a configure option for Teuchos?

bathmatt commented 5 years ago

@krcb

bathmatt commented 5 years ago

@krcb tracked this down, and we know what's causing it. We think there is a good fix and a bad fix for this.

ikalash commented 5 years ago

Thanks for looking into this, @bathmatt. Regarding your comment about defining inArgs.set_alpha(Teuchos::ScalarTraits<double>::nan()); and changing it to a quiet_NaN: I am not sure I follow. We don't currently set anything in the model evaluator to Teuchos::ScalarTraits::nan(). Are you saying that, taking alpha as a concrete example, if we don't call inArgs.set_alpha(value), alpha gets set to NaN somewhere within Trilinos, causing the problem?

bathmatt commented 5 years ago

Here is the problem: people have initialized values to NaN and then checked whether they are still NaN. If that NaN isn't quiet, it will throw when you call

std::isnan(value).

Ifpack2 did this: it sets things to NaN as a flag. That is bad programming.

The problem is that if you initialize things to NaN to make sure they get set elsewhere, and you use a quiet NaN, you won't catch bad usage. Using a signaling NaN for that purpose is the more proper usage.

The good fix is to change the code to have a nan_as_a_flag() that returns a quiet NaN, while nan() returns a signaling NaN.

This is in Ifpack2 and probably other places, but I'm not sure.

ikalash commented 5 years ago

@bathmatt yes, I followed that part. It seems the issues are in Trilinos (e.g., Ifpack2), not in Albany. I don't think we are initializing things to NaNs in Albany; this was part of a cleanup done some time ago, I believe (of course, something could have been missed). It seems our best path forward for now is to wait for this to get fixed in Trilinos where it shows up (e.g., Ifpack2), then re-evaluate any FPEs that are still present and try to determine whether they are due to another Trilinos package or something in Albany.

bathmatt commented 5 years ago

@bartlettroscoe @mhoemmen What are your thoughts on this? The problem is that Trilinos uses NaNs both where it wants them quiet and where it wants them signaling, but Teuchos supports only one kind. I think Teuchos should have a quiet_nan function, and Ifpack2, Tpetra, and others should be fixed.

What are your thoughts?

ikalash commented 5 years ago

I would expect this issue to affect other codes that use Trilinos and check for FPEs in nightly testing, e.g., SPARC and Drekar. I'm not sure whether anyone has encountered it yet.

rppawlo commented 5 years ago

Just built Drekar a few minutes ago against Trilinos develop; amazingly, I'm not seeing any issues here.

bathmatt commented 5 years ago

That's because you don't turn on FP exception trapping. You're quietly raising exceptions everywhere.

bathmatt commented 5 years ago

Just a guess.

ikalash commented 5 years ago

@rppawlo: do you have FPE checking on? One interesting thing I found while tracking down FPEs in Albany a few weeks ago is that some compilers catch more of them than others. Clang 7 in particular caught a lot that other compilers missed, even with FPE checking on.

rppawlo commented 5 years ago

That is true; I don't think we enable that.

ikalash commented 5 years ago

@rppawlo that will probably do it then. The Albany failures are only showing up in builds with FPE checking enabled.

bartlettroscoe commented 5 years ago

@bathmatt wrote:

@bartlettroscoe @mhoemmen What are your thoughts on this, the problem is that trilinos mixes uses of NaNs for both wanting quiet and signaling, but teuchos supports only one. I think TEuchos should have a quiet_nan function and ifpack2/tpetra and others should be fixed

Portable NaN manipulation was very hard 10+ years ago, when this Teuchos code was written. It is possible that modern C++11 has better standard support for quiet and signaling NaNs (both generating and detecting them). Someone should do a portability study to see whether we can use standard C++11 functionality for this. The ATDM Trilinos builds, with some tweaks, would be a great place to try this out.
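As a starting point for such a study, here is a sketch of what standard C++ already offers and where it stops. std::numeric_limits provides has_quiet_NaN, has_signaling_NaN, quiet_NaN(), and signaling_NaN() (constexpr since C++11), but C++11 has no standard query for whether a given NaN is signaling. The hypothetical is_signaling_nan below therefore reads the bits directly, assuming the IEEE 754-2008 binary64 encoding (quiet bit = most significant fraction bit, bit 51), which holds on x86-64 and AArch64 but not on legacy MIPS. Under that assumption, the nan(0x4000000000000) values in the backtraces in this thread have bit 51 clear, i.e. they are signaling NaNs.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Compile-time checks that the platform advertises both kinds of NaN.
static_assert(std::numeric_limits<double>::has_quiet_NaN, "need quiet NaN");
static_assert(std::numeric_limits<double>::has_signaling_NaN, "need signaling NaN");
static_assert(std::numeric_limits<double>::is_iec559, "sketch assumes IEEE 754 binary64");

// Hypothetical helper: classifies a double by its bit pattern, so it never
// performs floating-point arithmetic and cannot itself raise FE_INVALID.
bool is_signaling_nan(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined type punning
    const std::uint64_t exp_mask  = 0x7FF0000000000000ULL;
    const std::uint64_t frac_mask = 0x000FFFFFFFFFFFFFULL;
    const std::uint64_t quiet_bit = 0x0008000000000000ULL;  // bit 51
    const bool is_nan = (bits & exp_mask) == exp_mask && (bits & frac_mask) != 0;
    return is_nan && (bits & quiet_bit) == 0;
}
```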

bathmatt commented 5 years ago

Well, I don't know how what was in Teuchos ever worked in a portable way.

It computed +0./0. and assigned it to T::nan.

Then Intrepid2 set initial values to T::nan.

Then it called isnan(value) to check whether this was the first time the value was used.

isnan throws if the value is a (signaling) NaN. How did this work? Is 0./0. a quiet NaN everywhere?

mhoemmen commented 5 years ago

@bathmatt It's possible to get harmless intermediate Infs and NaNs in some algorithms. See, e.g.,

That's why we don't turn on signaling NaNs by default. My approval of your PR should not be construed as approval of turning quiet NaNs into signaling NaNs. In fact, I distinctly recall looking for a way to make quiet NaNs in Teuchos::ScalarTraits when editing that code.

I would say, let's revert the PR. You can try it again with numeric_limits::quiet_NaN().

bathmatt commented 5 years ago

I understand that one can get harmless NaNs in some calculations, but will that throw an FPE? My feeling is that if it does, one should save the exception-handling flags, do the math, and then re-enable them.

Having FPE trapping on is very useful for catching errors in my code or invalid values passed into libraries. I think Teuchos::nan shouldn't exist; instead there should be a quiet NaN and a signaling NaN. If one wants to use a NaN as a flag, as is done in Ifpack2, it should be quiet; if it is an initialization value that must be set before use, it should be signaling.

I'll make a PR changing signaling to quiet and go from there.

One question I have: why was +0/0 not a signaling NaN? That is the old calculation used for NaN initialization.

Second question: should Trilinos test with FPEs on?

https://github.com/trilinos/Trilinos/pull/4180

ikalash commented 5 years ago

I would push for Trilinos testing with FPEs on. Albany does, because I agree that it is useful. It is somewhat frustrating that we now have a bunch of failures throughout our dashboard due to FPEs on the Trilinos side.

mhoemmen commented 5 years ago

@bathmatt wrote:

One question I have, why was +0/0 not a signaling nan?

If it's a constexpr, it could be evaluated at compile time.
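Under IEEE 754 default exception handling, an invalid operation such as 0/0 delivers a quiet NaN and raises the invalid flag once, at the division itself. That suggests an answer to the question above: the old +0./0. initializer produced a quiet NaN, and if the compiler folded it at compile time, nothing was raised when the program ran. The sketch below, with hypothetical helper names and assuming IEEE 754 binary64 on x86-64 or AArch64, forces the division to happen at run time and checks both the flag and the quiet bit.

```cpp
#include <cfenv>
#include <cstdint>
#include <cstring>

// Performs 0.0/0.0 at run time and reports whether FE_INVALID was raised.
// The volatile operands stop the compiler from folding the division; a folded
// division would raise no flag when the program runs.
double runtime_zero_over_zero(bool& raised_invalid) {
    volatile double zero = 0.0;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double result = zero / zero;  // IEEE 754 default: quiet NaN + FE_INVALID
    raised_invalid = std::fetestexcept(FE_INVALID) != 0;
    return result;
}

// True if x is a NaN with the quiet bit (bit 51 of a binary64) set.
// Assumes the IEEE 754-2008 encoding used by x86-64 and AArch64.
bool is_quiet_nan(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const bool is_nan = (bits & 0x7FF0000000000000ULL) == 0x7FF0000000000000ULL &&
                        (bits & 0x000FFFFFFFFFFFFFULL) != 0;
    return is_nan && (bits & 0x0008000000000000ULL) != 0;
}
```

So a run-time 0./0. would itself trap once FE_INVALID trapping is enabled, while the NaN it produces is quiet and passes through std::isnan without raising anything.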

@ikalash Note that in debug mode, Tpetra::MultiVector's constructor uses Kokkos::ArithTraits to fill its entries with NaNs when zeroOut is false. Kokkos::ArithTraits uses quiet NaNs (see https://github.com/kokkos/kokkos-kernels/issues/35 ). Thus, I'm hopeful that this change will make the behavior of different parts of Trilinos more consistent.

ikalash commented 5 years ago

Looks like this has been resolved on the Trilinos side - thanks to the Trilinos folks for doing that!