marcfehling closed 1 year ago
`git bisect` will help to find the faulty commit for the `poisson_trilinos` case (and `poisson_petsc` as well).
Why does `stokes_trilinos` pass but `stokes_petsc` fail?
`git bisect` reports:
6d9bf74f794689fdf0e5838759f12c3d38a8b919 is the first bad commit
commit 6d9bf74f794689fdf0e5838759f12c3d38a8b919
Author: Marc Fehling <mafehling.git@gmail.com>
Date: Tue Nov 15 15:37:13 2022 -0700
Use update ghost values after operator=.
:040000 040000 de855d033f10d6a16eb76244f6e7110b2c4f3a03 30dbfd1e7d516345847056c6ab3cfee880e5c963 M source
The original issues have been fixed in fix_bugs.
A new issue came up that only sometimes triggers this assertion:
hprun_stokes_dealiitrilinos
===========================
6: --------------------------------------------------------
6: An error occurred in line <1850> of file </raid/fehling/dealii/include/deal.II/lac/la_parallel_vector.templates.h> in function
6: dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::real_type dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::norm_sqr_local() const [with Number = double; MemorySpace = dealii::MemorySpace::Host; dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::real_type = double]
6: The violated condition was:
6: dealii::numbers::is_finite(sum)
6: Additional information:
6: In a significant number of places, deal.II checks that some
6: intermediate value is a finite number (as opposed to plus or minus
6: infinity, or NaN/Not a Number). In the current function, we
6: encountered a number that is not finite (its value is (inf,0) and
6: therefore violates the current assertion).
6:
6: This may be due to the fact that some operation in this function
6: created such a value, or because one of the arguments you passed to
6: the function already had this value from some previous operation. In
6: the latter case, this function only triggered the error but may not
6: actually be responsible for the computation of the number that is not
6: finite.
6:
6: There are two common cases where this situation happens. First, your
6: code (or something in deal.II) divides by zero in a place where this
6: should not happen. Or, you are trying to solve a linear system with an
6: unsuitable solver (such as an indefinite or non-symmetric linear
6: system using a Conjugate Gradient solver); such attempts oftentimes
6: yield an operation somewhere that tries to divide by zero or take the
6: square root of a negative value.
6:
6: In any case, when trying to find the source of the error, recall that
6: the location where you are getting this error is simply the first
6: place in the program where there is a check that a number (e.g., an
6: element of a solution vector) is in fact finite, but that the actual
6: error that computed the number may have happened far earlier. To find
6: this location, you may want to add checks for finiteness in places of
6: your program visited before the place where this error is produced.
6: One way to check for finiteness is to use the 'AssertIsFinite' macro.
6:
6: Stacktrace:
6: -----------
6: #0 /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::norm_sqr_local() const
6: #1 /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::norm_sqr() const
6: #2 /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::l2_norm() const
6: #3 hprun: dealii::internal::SolverCG::IterationWorkerBase<dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::TrilinosWrappers::SparseMatrix, dealii::TrilinosWrappers::PreconditionJacobi>::startup(dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> const&)
6: #4 hprun: void dealii::SolverCG<dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> >::solve<dealii::TrilinosWrappers::SparseMatrix, dealii::TrilinosWrappers::PreconditionJacobi>(dealii::TrilinosWrappers::SparseMatrix const&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> const&, dealii::TrilinosWrappers::PreconditionJacobi const&)
6: #5 hprun: LinearSolvers::BlockSchurPreconditioner<dealiiTrilinos>::vmult(dealii::LinearAlgebra::distributed::BlockVector<double>&, dealii::LinearAlgebra::distributed::BlockVector<double> const&) const
6: #6 hprun: void dealii::SolverFGMRES<dealii::LinearAlgebra::distributed::BlockVector<double> >::solve<dealii::TrilinosWrappers::BlockSparseMatrix, LinearSolvers::BlockSchurPreconditioner<dealiiTrilinos> >(dealii::TrilinosWrappers::BlockSparseMatrix const&, dealii::LinearAlgebra::distributed::BlockVector<double>&, dealii::LinearAlgebra::distributed::BlockVector<double> const&, LinearSolvers::BlockSchurPreconditioner<dealiiTrilinos> const&)
6: #7 hprun: Stokes::Problem<2, dealiiTrilinos, 2>::solve()
6: #8 hprun: Stokes::Problem<2, dealiiTrilinos, 2>::run()
6: #9 hprun: main
6: --------------------------------------------------------
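The assertion message above suggests adding finiteness checks earlier in the program to narrow down where the non-finite value first appears. A minimal sketch of that idea in plain C++, without deal.II: `first_non_finite` and `norm_sqr` are hypothetical helpers written for illustration, and a `std::vector<double>` stands in for the distributed vector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of deal.II): return the index of the first
// entry that is infinite or NaN, or values.size() if every entry is finite.
std::size_t first_non_finite(const std::vector<double> &values)
{
  for (std::size_t i = 0; i < values.size(); ++i)
    if (!std::isfinite(values[i]))
      return i;
  return values.size();
}

// Squared l2 norm of a plain vector. A single infinite entry makes the
// whole sum infinite, which is the kind of result the assertion rejects.
double norm_sqr(const std::vector<double> &values)
{
  double sum = 0.0;
  for (const double v : values)
    sum += v * v;
  return sum;
}
```

Calling a check like `first_non_finite` on a vector right before it is handed to a solver reports the offending entry at that point, rather than later inside the norm computation.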
Fixed by explicit initialization similar to https://github.com/geodynamics/aspect/pull/4973.
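Why explicit initialization can matter here can be sketched with a toy example. The stack trace shows the failure in the start-up step of `SolverCG`, which takes a norm before iterating. The sketch below assumes that the non-finite value came from an initial guess that was never set; which vector the actual fix initializes is not shown in this thread. All names are hypothetical, and a diagonal matrix with `std::vector<double>` replaces the Trilinos and deal.II types.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Toy stand-in for the start-up step of an iterative solver: compute the
// l2 norm of the initial residual r = b - A*x for a diagonal matrix A.
// The initial guess x is read here, so whatever it holds matters.
double initial_residual_norm(const std::vector<double> &diagonal,
                             const std::vector<double> &x,
                             const std::vector<double> &b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < b.size(); ++i)
    {
      const double r = b[i] - diagonal[i] * x[i];
      sum += r * r;
    }
  return std::sqrt(sum);
}

// Hypothetical wrapper: set the initial guess explicitly before the
// solver reads it, instead of relying on its previous contents.
double start_with_zero_guess(const std::vector<double> &diagonal,
                             std::vector<double>       &x,
                             const std::vector<double> &b)
{
  x.assign(b.size(), 0.0); // explicit initialization
  return initial_residual_norm(diagonal, x, b);
}
```

With leftover content in `x`, the result depends on whatever happens to be stored there, which would be consistent with an assertion that triggers only sometimes.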
https://github.com/marcfehling/hpbox/commit/3a9383ac7875903f590656277db6b6a64a671b92 introduced tests, but some of them currently fail in Debug mode, even in the 0.1 release.
Tests that fail on `master` and `0.1`:

Additional tests that fail only on `master`:
Remember: Start testing early...