marcfehling / hpbox

Sandbox for hp-adaptive methods
GNU General Public License v3.0

Failing tests #4

Closed marcfehling closed 1 year ago

marcfehling commented 1 year ago

Commit https://github.com/marcfehling/hpbox/commit/3a9383ac7875903f590656277db6b6a64a671b92 introduced tests, but some of them currently fail in Debug mode, both on master and in the 0.1 release.

Tests that fail on master and 0.1:

hprun_poisson_dealiitrilinos
============================
1: --------------------------------------------------------
1: An error occurred in line <1675> of file </raid/fehling/bin/dealii-9.5.0-pre/include/deal.II/lac/la_parallel_vector.h> in function
1:     Number& dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::operator()(dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::size_type) [with Number = double; MemorySpace = dealii::MemorySpace::Host; dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::size_type = unsigned int]
1: The violated condition was: 
1:     partitioner->in_local_range(global_index) || partitioner->ghost_indices().is_element(global_index)
1: Additional information: 
1:     You tried to access element 4936 of a distributed vector, but this
1:     element is not stored on the current processor. Note: The range of
1:     locally owned elements is [6305,12544], and there are 0 ghost elements
1:     that this vector can access.
1:     
1:     A common source for this kind of problem is that you are passing a
1:     'fully distributed' vector into a function that needs read access to
1:     vector elements that correspond to degrees of freedom on ghost cells
1:     (or at least to 'locally active' degrees of freedom that are not also
1:     'locally owned'). You need to pass a vector that has these elements as
1:     ghost entries.
1: 
1: Stacktrace:
1: -----------
1: #0  hprun: dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::operator()(unsigned int)
1: #1  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::add(std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<double, std::allocator<double> > const&)
1: #2  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: void dealii::AffineConstraints<double>::distribute_local_to_global<dealii::TrilinosWrappers::SparseMatrix, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> >(dealii::FullMatrix<double> const&, dealii::Vector<double> const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, dealii::TrilinosWrappers::SparseMatrix&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&, bool, std::integral_constant<bool, false>) const
1: #3  hprun: void dealii::AffineConstraints<double>::distribute_local_to_global<dealii::TrilinosWrappers::SparseMatrix, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> >(dealii::FullMatrix<double> const&, dealii::Vector<double> const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, dealii::TrilinosWrappers::SparseMatrix&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&, bool) const
1: #4  hprun: Poisson::OperatorMatrixBased<2, dealiiTrilinos, 2>::reinit(dealii::DoFHandler<2, 2> const&, dealii::AffineConstraints<double> const&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&)
1: #5  hprun: Poisson::Problem<2, dealiiTrilinos, 2>::run()
1: #6  hprun: main
1: --------------------------------------------------------
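
For reference, the violated condition in this log can be mimicked with a small standard-library-only model. This is only a sketch of the check, not deal.II code: `ToyPartitioner` and its members are hypothetical names, and treating the upper bound of the reported range as exclusive is an assumption (it does not matter for element 4936, which lies below the range either way).

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for the partitioner of a distributed vector.
// It only models which global indices the current process may access.
struct ToyPartitioner
{
  std::size_t           local_begin;   // first locally owned index
  std::size_t           local_end;     // one past the last owned index (assumption)
  std::set<std::size_t> ghost_indices; // additional readable indices

  bool in_local_range(const std::size_t i) const
  {
    return i >= local_begin && i < local_end;
  }

  bool is_ghost(const std::size_t i) const
  {
    return ghost_indices.count(i) != 0;
  }

  // Same shape as the condition printed in the log:
  // in_local_range(global_index) || ghost_indices().is_element(global_index)
  bool is_accessible(const std::size_t i) const
  {
    return in_local_range(i) || is_ghost(i);
  }
};
```

With the numbers from the log, `ToyPartitioner{6305, 12544, {}}.is_accessible(4936)` is false, which corresponds to the failing assertion; the same query succeeds once 4936 is listed among the ghost indices.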

Additional tests that fail only on master:

hprun_poisson_petsc
===================
4: --------------------------------------------------------
4: An error occurred in line <906> of file </raid/fehling/bin/dealii-9.5.0-pre/include/deal.II/lac/petsc_vector_base.h> in function
4:     const dealii::PETScWrappers::internal::VectorReference& dealii::PETScWrappers::internal::VectorReference::operator=(const PetscScalar&) const
4: The violated condition was: 
4:     !vector.has_ghost_elements()
4: Additional information: 
4:     You are trying an operation on a vector that is only allowed if the
4:     vector has no ghost elements, but the vector you are operating on does
4:     have ghost elements. Specifically, vectors with ghost elements are
4:     read-only and cannot appear in operations that write into these
4:     vectors.
4:     
4:     See the glossary entry on 'Ghosted vectors' for more information.
4: 
4: Stacktrace:
4: -----------
4: #0  hprun: dealii::PETScWrappers::internal::VectorReference::operator=(double const&) const
4: #1  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::internal::ElementAccess<dealii::PETScWrappers::MPI::Vector>::set(double, unsigned int, dealii::PETScWrappers::MPI::Vector&)
4: #2  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: void dealii::AffineConstraints<double>::distribute<dealii::PETScWrappers::MPI::Vector>(dealii::PETScWrappers::MPI::Vector&) const
4: #3  hprun: Poisson::Problem<2, PETSc, 2>::solve()
4: #4  hprun: Poisson::Problem<2, PETSc, 2>::run()
4: #5  hprun: main
4: --------------------------------------------------------

hprun_poisson_trilinos
======================
5: --------------------------------------------------------
5: An error occurred in line <2502> of file </raid/fehling/dealii/include/deal.II/lac/affine_constraints.templates.h> in function
5:     void dealii::internal::import_vector_with_ghost_elements(const dealii::TrilinosWrappers::MPI::Vector&, const dealii::IndexSet&, const dealii::IndexSet&, dealii::TrilinosWrappers::MPI::Vector&, std::integral_constant<bool, false>)
5: The violated condition was: 
5:     !vec.has_ghost_elements()
5: Additional information: 
5:     You are trying an operation on a vector that is only allowed if the
5:     vector has no ghost elements, but the vector you are operating on does
5:     have ghost elements. Specifically, vectors with ghost elements are
5:     read-only and cannot appear in operations that write into these
5:     vectors.
5:     
5:     See the glossary entry on 'Ghosted vectors' for more information.
5: 
5: Stacktrace:
5: -----------
5: #0  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::internal::import_vector_with_ghost_elements(dealii::TrilinosWrappers::MPI::Vector const&, dealii::IndexSet const&, dealii::IndexSet const&, dealii::TrilinosWrappers::MPI::Vector&, std::integral_constant<bool, false>)
5: #1  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: void dealii::AffineConstraints<double>::distribute<dealii::TrilinosWrappers::MPI::Vector>(dealii::TrilinosWrappers::MPI::Vector&) const
5: #2  hprun: Poisson::Problem<2, Trilinos, 2>::solve()
5: #3  hprun: Poisson::Problem<2, Trilinos, 2>::run()
5: #4  hprun: main
5: --------------------------------------------------------

hprun_stokes_dealiitrilinos
===========================
6: --------------------------------------------------------
6: An error occurred in line <1675> of file </raid/fehling/bin/dealii-9.5.0-pre/include/deal.II/lac/la_parallel_vector.h> in function
6:     Number& dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::operator()(dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::size_type) [with Number = double; MemorySpace = dealii::MemorySpace::Host; dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::size_type = unsigned int]
6: The violated condition was: 
6:     partitioner->in_local_range(global_index) || partitioner->ghost_indices().is_element(global_index)
6: Additional information: 
6:     You tried to access element 3250 of a distributed vector, but this
6:     element is not stored on the current processor. Note: The range of
6:     locally owned elements is [9506,18817], and there are 0 ghost elements
6:     that this vector can access.
6:     
6:     A common source for this kind of problem is that you are passing a
6:     'fully distributed' vector into a function that needs read access to
6:     vector elements that correspond to degrees of freedom on ghost cells
6:     (or at least to 'locally active' degrees of freedom that are not also
6:     'locally owned'). You need to pass a vector that has these elements as
6:     ghost entries.
6: 
6: Stacktrace:
6: -----------
6: #0  hprun: dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::operator()(unsigned int)
6: #1  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::BlockVectorBase<dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> >::operator()(unsigned int)
6: #2  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: void dealii::AffineConstraints<double>::distribute_local_to_global<dealii::TrilinosWrappers::BlockSparseMatrix, dealii::LinearAlgebra::distributed::BlockVector<double> >(dealii::FullMatrix<double> const&, dealii::Vector<double> const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, dealii::TrilinosWrappers::BlockSparseMatrix&, dealii::LinearAlgebra::distributed::BlockVector<double>&, bool, std::integral_constant<bool, true>) const
6: #3  hprun: void dealii::AffineConstraints<double>::distribute_local_to_global<dealii::TrilinosWrappers::BlockSparseMatrix, dealii::LinearAlgebra::distributed::BlockVector<double> >(dealii::FullMatrix<double> const&, dealii::Vector<double> const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, dealii::TrilinosWrappers::BlockSparseMatrix&, dealii::LinearAlgebra::distributed::BlockVector<double>&, bool) const
6: #4  hprun: Stokes::Problem<2, dealiiTrilinos, 2>::assemble_system()
6: #5  hprun: Stokes::Problem<2, dealiiTrilinos, 2>::run()
6: #6  hprun: main
6: --------------------------------------------------------

hprun_stokes_petsc
==================
7: --------------------------------------------------------
7: An error occurred in line <1053> of file </raid/fehling/dealii/source/base/mpi.cc> in function
7:     std::vector<unsigned int> dealii::Utilities::MPI::compute_index_owner(const dealii::IndexSet&, const dealii::IndexSet&, ompi_communicator_t* const&)
7: The violated condition was: 
7:     owned_indices.size() == Utilities::MPI::max(owned_indices.size(), comm)
7: Additional information: 
7:     IndexSets have to have the same size on all processes.
7: 
7: Stacktrace:
7: -----------
7: #0  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::Utilities::MPI::compute_index_owner(dealii::IndexSet const&, dealii::IndexSet const&, ompi_communicator_t* const&)
7: #1  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::SparsityTools::distribute_sparsity_pattern(dealii::BlockDynamicSparsityPattern&, dealii::IndexSet const&, ompi_communicator_t* const&, dealii::IndexSet const&)
7: #2  hprun: void initialize_block_sparse_matrix<2, dealii::PETScWrappers::MPI::BlockSparseMatrix, 2>(dealii::PETScWrappers::MPI::BlockSparseMatrix&, dealii::DoFHandler<2, 2> const&, dealii::AffineConstraints<double> const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::Table<2, dealii::DoFTools::Coupling> const&)
7: #3  hprun: Stokes::Problem<2, PETSc, 2>::setup_system()
7: #4  hprun: Stokes::Problem<2, PETSc, 2>::run()
7: #5  hprun: main
7: --------------------------------------------------------
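
The two `has_ghost_elements()` assertions above (hprun_poisson_petsc and hprun_poisson_trilinos) both come from writing into a ghosted vector. The usual pattern is to write into a non-ghosted vector first and only then assign the result to the ghosted one. The following is a toy model of that pattern using only the standard library; `ToyVector`, `toy_distribute` and `solve_then_ghost` are hypothetical names, and this is not the code that was changed in hpbox.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a parallel vector. A vector flagged as
// ghosted is treated as read-only, as in the assertion message above.
struct ToyVector
{
  std::vector<double> values;
  bool                has_ghost_elements;

  void set(const std::size_t i, const double v)
  {
    if (has_ghost_elements)
      throw std::logic_error("ghosted vectors are read-only");
    values.at(i) = v;
  }
};

// Hypothetical constraint step that writes into its argument:
// it pins the first entry to zero, like a homogeneous boundary value.
inline void toy_distribute(ToyVector &v)
{
  if (!v.values.empty())
    v.set(0, 0.0);
}

// Pattern: write into a non-ghosted vector, then copy into the ghosted one.
inline void solve_then_ghost(ToyVector &ghosted)
{
  ToyVector owned{std::vector<double>(ghosted.values.size(), 1.0), false};
  toy_distribute(owned);         // allowed: 'owned' has no ghost elements
  ghosted.values = owned.values; // models assignment to the ghosted vector
}
```

Calling `toy_distribute` directly on a vector with `has_ghost_elements == true` throws, which mirrors the failing tests, while `solve_then_ghost` leaves the ghosted vector with the constrained values.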

Remember: Start testing early...

marcfehling commented 1 year ago

git bisect will help to find the faulty commit for the poisson_trilinos case (and poisson_petsc as well).

Why does stokes_trilinos pass but stokes_petsc fail?

marcfehling commented 1 year ago

git bisect reports:

6d9bf74f794689fdf0e5838759f12c3d38a8b919 is the first bad commit
commit 6d9bf74f794689fdf0e5838759f12c3d38a8b919
Author: Marc Fehling <mafehling.git@gmail.com>
Date:   Tue Nov 15 15:37:13 2022 -0700

    Use update ghost values after operator=.

:040000 040000 de855d033f10d6a16eb76244f6e7110b2c4f3a03 30dbfd1e7d516345847056c6ab3cfee880e5c963 M  source

marcfehling commented 1 year ago

The original issues have been fixed in fix_bugs.

A new issue has come up that triggers this assertion only intermittently:

hprun_stokes_dealiitrilinos
===========================
6: --------------------------------------------------------
6: An error occurred in line <1850> of file </raid/fehling/dealii/include/deal.II/lac/la_parallel_vector.templates.h> in function
6:     dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::real_type dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::norm_sqr_local() const [with Number = double; MemorySpace = dealii::MemorySpace::Host; dealii::LinearAlgebra::distributed::Vector<Number, MemorySpace>::real_type = double]
6: The violated condition was: 
6:     dealii::numbers::is_finite(sum)
6: Additional information: 
6:     In a significant number of places, deal.II checks that some
6:     intermediate value is a finite number (as opposed to plus or minus
6:     infinity, or NaN/Not a Number). In the current function, we
6:     encountered a number that is not finite (its value is (inf,0) and
6:     therefore violates the current assertion).
6:     
6:     This may be due to the fact that some operation in this function
6:     created such a value, or because one of the arguments you passed to
6:     the function already had this value from some previous operation. In
6:     the latter case, this function only triggered the error but may not
6:     actually be responsible for the computation of the number that is not
6:     finite.
6:     
6:     There are two common cases where this situation happens. First, your
6:     code (or something in deal.II) divides by zero in a place where this
6:     should not happen. Or, you are trying to solve a linear system with an
6:     unsuitable solver (such as an indefinite or non-symmetric linear
6:     system using a Conjugate Gradient solver); such attempts oftentimes
6:     yield an operation somewhere that tries to divide by zero or take the
6:     square root of a negative value.
6:     
6:     In any case, when trying to find the source of the error, recall that
6:     the location where you are getting this error is simply the first
6:     place in the program where there is a check that a number (e.g., an
6:     element of a solution vector) is in fact finite, but that the actual
6:     error that computed the number may have happened far earlier. To find
6:     this location, you may want to add checks for finiteness in places of
6:     your program visited before the place where this error is produced.
6:     One way to check for finiteness is to use the 'AssertIsFinite' macro.
6: 
6: Stacktrace:
6: -----------
6: #0  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::norm_sqr_local() const
6: #1  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::norm_sqr() const
6: #2  /raid/fehling/bin/dealii-9.5.0-pre/lib/libdeal_II.g.so.9.5.0-pre: dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::l2_norm() const
6: #3  hprun: dealii::internal::SolverCG::IterationWorkerBase<dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::TrilinosWrappers::SparseMatrix, dealii::TrilinosWrappers::PreconditionJacobi>::startup(dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> const&)
6: #4  hprun: void dealii::SolverCG<dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> >::solve<dealii::TrilinosWrappers::SparseMatrix, dealii::TrilinosWrappers::PreconditionJacobi>(dealii::TrilinosWrappers::SparseMatrix const&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> const&, dealii::TrilinosWrappers::PreconditionJacobi const&)
6: #5  hprun: LinearSolvers::BlockSchurPreconditioner<dealiiTrilinos>::vmult(dealii::LinearAlgebra::distributed::BlockVector<double>&, dealii::LinearAlgebra::distributed::BlockVector<double> const&) const
6: #6  hprun: void dealii::SolverFGMRES<dealii::LinearAlgebra::distributed::BlockVector<double> >::solve<dealii::TrilinosWrappers::BlockSparseMatrix, LinearSolvers::BlockSchurPreconditioner<dealiiTrilinos> >(dealii::TrilinosWrappers::BlockSparseMatrix const&, dealii::LinearAlgebra::distributed::BlockVector<double>&, dealii::LinearAlgebra::distributed::BlockVector<double> const&, LinearSolvers::BlockSchurPreconditioner<dealiiTrilinos> const&)
6: #7  hprun: Stokes::Problem<2, dealiiTrilinos, 2>::solve()
6: #8  hprun: Stokes::Problem<2, dealiiTrilinos, 2>::run()
6: #9  hprun: main
6: --------------------------------------------------------

marcfehling commented 1 year ago

Fixed by explicit initialization similar to https://github.com/geodynamics/aspect/pull/4973.
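
To illustrate why explicit initialization can matter here, below is a minimal standard-library-only sketch. It assumes that the intermittent failure came from an inner solve starting from a vector whose entries were never set; that reading is an assumption, and the sketch does not reproduce the actual change in hpbox or in the linked pull request. `norm_sqr`, `is_finite_norm` and `toy_inner_solve` are hypothetical helpers, and an explicit infinity stands in for garbage memory, since reading truly uninitialized values is undefined behavior.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Squared l2 norm of a vector.
inline double norm_sqr(const std::vector<double> &v)
{
  double sum = 0.0;
  for (const double x : v)
    sum += x * x;
  return sum;
}

// Finiteness check in the spirit of the is_finite(sum) assertion in the log.
inline bool is_finite_norm(const std::vector<double> &v)
{
  return std::isfinite(norm_sqr(v));
}

// Toy inner solve for a diagonal system D x = b. It applies one
// Jacobi-style correction and uses the incoming x as the initial guess,
// so whatever x contains on entry influences the result.
inline void toy_inner_solve(const std::vector<double> &diag,
                            std::vector<double>       &x,
                            const std::vector<double> &b)
{
  for (std::size_t i = 0; i < x.size(); ++i)
    x[i] += (b[i] - diag[i] * x[i]) / diag[i];
}
```

Starting from `std::vector<double> x(n, 0.0)` the toy solve returns finite values, whereas a starting vector containing `std::numeric_limits<double>::infinity()` yields a non-finite norm, which is the kind of value the assertion in `norm_sqr_local()` rejects.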