idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org

Assertion `i*j < _val.size()' failed when coupling displaced and undisplaced variables with nonzero BC #9659

Closed: tophmatthews closed this 4 years ago

tophmatthews commented 7 years ago

Description of the enhancement or error report

When running a tensor mechanics problem with debug, the following assertion fails:

Output:

```
Time Step 0, time = 0
dt = 0

Time Step 1, time = 1
dt = 1
 0 Nonlinear |R| = 1.414214e-03
Assertion `i*j < _val.size()' failed.
i*j = 0
_val.size() = 0
sh: line 1: 56583 Trace/BPT trap: 5 gdb -p 56581 -batch -ex bt -ex detach 2> /dev/null > temp_print_trace.njF0L8
Stack frames: 32
0: 0 libmesh_dbg.0.dylib 0x000000010e4d93af libMesh::print_trace(std::__1::basic_ostream >&) + 2287
1: 1 libmesh_dbg.0.dylib 0x000000010e4c81ca libMesh::MacroFunctions::report_error(char const*, int, char const*, char const*) + 634
2: 2 libsolid_mechanics-dbg.0.dylib 0x0000000109fab9f8 libMesh::DenseMatrix::operator()(unsigned int, unsigned int) + 1752
3: 3 libmoose-dbg.0.dylib 0x000000010c84b3fa Kernel::computeOffDiagJacobian(unsigned int) + 426
4: 4 libtensor_mechanics-dbg.0.dylib 0x000000010b07c3d3 StressDivergenceTensors::computeOffDiagJacobian(unsigned int) + 259
5: 5 libmoose-dbg.0.dylib 0x000000010c00d86a ComputeFullJacobianThread::computeJacobian() + 1434
6: 6 libmoose-dbg.0.dylib 0x000000010c024b4b ComputeJacobianThread::onElement(libMesh::Elem const*) + 347
7: 7 libmoose-dbg.0.dylib 0x000000010bec9889 ThreadedElementLoopBase >::operator()(libMesh::StoredRange const&, bool) + 665
8: 8 libmoose-dbg.0.dylib 0x000000010c552f3b void libMesh::Threads::parallel_reduce, ComputeFullJacobianThread>(libMesh::StoredRange const&, ComputeFullJacobianThread&) + 155
9: 9 libmoose-dbg.0.dylib 0x000000010c54fca5 NonlinearSystemBase::computeJacobianInternal(libMesh::SparseMatrix&, Moose::KernelType) + 1813
10: 10 libmoose-dbg.0.dylib 0x000000010c55498a NonlinearSystemBase::computeJacobian(libMesh::SparseMatrix&, Moose::KernelType) + 106
11: 11 libmoose-dbg.0.dylib 0x000000010c0dd745 FEProblemBase::computeJacobian(libMesh::NumericVector const&, libMesh::SparseMatrix&, Moose::KernelType) + 1717
12: 12 libmoose-dbg.0.dylib 0x000000010c0dd07e FEProblemBase::computeJacobian(libMesh::NonlinearImplicitSystem&, libMesh::NumericVector const&, libMesh::SparseMatrix&) + 78
13: 13 libmoose-dbg.0.dylib 0x000000010c521c0b Moose::compute_jacobian(libMesh::NumericVector const&, libMesh::SparseMatrix&, libMesh::NonlinearImplicitSystem&) + 299
14: 14 libmesh_dbg.0.dylib 0x000000010faa9eb3 __libmesh_petsc_snes_jacobian + 6531
15: 15 libpetsc.3.7.dylib 0x00000001150bca7e SNESComputeJacobian + 926
16: 16 libpetsc.3.7.dylib 0x00000001150f00cc SNESSolve_NEWTONLS + 1356
17: 17 libpetsc.3.7.dylib 0x00000001150c118a SNESSolve + 858
18: 18 libmesh_dbg.0.dylib 0x000000010faa50bc libMesh::PetscNonlinearSolver::solve(libMesh::SparseMatrix&, libMesh::NumericVector&, libMesh::NumericVector&, double, unsigned int) + 2652
19: 19 libmesh_dbg.0.dylib 0x000000010fbc10b0 libMesh::NonlinearImplicitSystem::solve() + 1264
20: 20 libmoose-dbg.0.dylib 0x000000010cdae87e TimeIntegrator::solve() + 46
21: 21 libmoose-dbg.0.dylib 0x000000010c52391c NonlinearSystem::solve() + 956
22: 22 libmoose-dbg.0.dylib 0x000000010c0d7f28 FEProblemBase::solve() + 136
23: 23 libmoose-dbg.0.dylib 0x000000010cdcaec4 TimeStepper::step() + 36
24: 24 libmoose-dbg.0.dylib 0x000000010c722f92 Transient::solveStep(double) + 962
25: 25 libmoose-dbg.0.dylib 0x000000010c722b8d Transient::takeStep(double) + 253
26: 26 libmoose-dbg.0.dylib 0x000000010c722823 Transient::execute() + 211
27: 27 libmoose-dbg.0.dylib 0x000000010c49e19a MooseApp::executeExecutioner() + 170
28: 28 libmoose-dbg.0.dylib 0x000000010c4a085d MooseApp::run() + 157
29: 29 bison-dbg 0x0000000108fd6150 main + 592
30: 30 libdyld.dylib 0x00007fff81fe65ad start + 1
31: 31 ??? 0x0000000000000003 0x0 + 3
[0] /Users/topher/projects/moose/scripts/../libmesh/installed/include/libmesh/dense_matrix.h, line 837, compiled nodate at notime
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1 : system msg for write_line failure : Bad file descriptor
```

It seems to happen when the DenseMatrix is formed for the off-diagonal Jacobian, and only when a non-zero BC is present. It also requires an additional variable beyond the displacement variables, and strain = FINITE.

Rationale for the enhancement or information for reproducing the error

The following input file can be used to reproduce the error with tensor-mechanics-dbg.

Input File:

```
[GlobalParams]
  displacements = 'disp_x disp_y'
[]

[Mesh]
  type = GeneratedMesh
  dim = 2
[]

[Variables]
  [./temp]
  [../]
[]

[Modules]
  [./TensorMechanics]
    [./Master]
      [./all]
        add_variables = true
        strain = FINITE
      [../]
    [../]
  [../]
[]

[Kernels]
  [./dt]
    type = TimeDerivative
    variable = temp
  [../]
[]

[BCs]
  [./no_x]
    type = DirichletBC
    variable = disp_x
    boundary = left
    value = 1.0e-3
  [../]
[]

[Materials]
  [./stress]
    type = ComputeFiniteStrainElasticStress
  [../]
  [./elasticity_tensor]
    type = ComputeIsotropicElasticityTensor
    youngs_modulus = 1e11
    poissons_ratio = 0.3
  [../]
[]

[Preconditioning]
  [./SMP]
    type = SMP
    full = true
  [../]
[]

[Executioner]
  type = Transient
[]
```

Identified impact

Fixing this will allow more complicated input files to be run in debug mode.

dschwen commented 7 years ago

I'm looking at this right now. Notes on what I've found so far:

Looks like an off-diagonal Jacobian is computed "unexpectedly", i.e. during setup MOOSE does not know the variables are coupled, but it computes the off-diagonal Jacobian anyway...

dschwen commented 7 years ago

This is not a bug in the TensorMechanics action. If I unfold the action using the extremely useful DumpProblem object (#8876, ping @permcody), I get:

Input File:

```
[GlobalParams]
  displacements = 'disp_x disp_y'
[]

[Mesh]
  type = GeneratedMesh
  dim = 2
[]

[Variables]
  [./temp]
  [../]
  [./disp_x]
  [../]
  [./disp_y]
  [../]
[]

[Kernels]
  [./TM_all0]
    type = StressDivergenceTensors
    component = 0
    use_displaced_mesh = true
    variable = disp_x
  [../]
  [./TM_all1]
    type = StressDivergenceTensors
    component = 1
    use_displaced_mesh = true
    variable = disp_y
  [../]
  [./dt]
    type = TimeDerivative
    variable = temp
  [../]
[]

[BCs]
  [./no_x]
    type = DirichletBC
    variable = disp_x
    boundary = left
    value = 1.0e-3
  [../]
[]

[Materials]
  [./stress]
    type = ComputeFiniteStrainElasticStress
  [../]
  [./elasticity_tensor]
    type = ComputeIsotropicElasticityTensor
    youngs_modulus = 1e11
    poissons_ratio = 0.3
  [../]
  [./all_strain]
    type = ComputeFiniteStrain
  [../]
[]

[Preconditioning]
  [./SMP]
    type = SMP
    full = true
  [../]
[]

[Executioner]
  type = Transient
[]
```

This fails with the same error.

dschwen commented 7 years ago

The issue seems to be coupling a variable that has kernels only on the displaced mesh to a variable that has kernels only on the undisplaced mesh.

Minimal example without tensor mechanics:

```
[GlobalParams]
  displacements = 'u'
[]

[Mesh]
  type = GeneratedMesh
  dim = 1
[]

[Variables]
  [./u]
  [../]
  [./v]
  [../]
[]

[Kernels]
  [./u]
    type = Diffusion
    use_displaced_mesh = true
    variable = u
  [../]
  [./v]
    type = Diffusion
    use_displaced_mesh = false
    variable = v
  [../]
[]

[BCs]
  [./no_x]
    type = DirichletBC
    variable = u
    boundary = left
    value = 1.0e-3
  [../]
[]

[Preconditioning]
  [./SMP]
    type = SMP
    full = true
  [../]
[]

[Executioner]
  type = Transient
[]
```

tophmatthews commented 6 years ago

This came up again. Should this really be a restriction? ping @dschwen @permcody

dschwen commented 6 years ago

Yeah, bummer, this issue just got forgotten. Let me 🏓 @andrsd, who should know all about the allocation of the Jacobian blocks. Maybe he has some fresh ideas.

andrsd commented 6 years ago

So, you guys have a PDE where the volumetric terms are on the displaced mesh and its BCs are on the undisplaced one? Or is this just a simplification of something more complex?

BTW: just by looking at the input file, this should work. I have a suspicion where the problem might be...

dschwen commented 6 years ago

@andrsd, I don't exactly remember, but I thought this required coupling u and v (where u is on the displaced mesh and v is on the undisplaced mesh), and then adding a non-zero BC (no idea whether the BC is on the displaced or undisplaced mesh, though).

andrsd commented 6 years ago

The minimal example that breaks has variable v on the undisplaced mesh. It is not coupled anywhere as far as I can see.

I suspect the real problem is that a kernel is on the displaced mesh (those would be q-point values) plus there is a nodal BC (those would be nodal values) on the undisplaced mesh, and we do not flip the right flag to compute one of those... It is just my working theory for now...

dschwen commented 6 years ago

D'oh, you're right, there is no coupling. I guess you just need variables on both meshes.

tophmatthews commented 6 years ago

Cool, looks like you guys are tracking it down? I just don't want this to get lost again :)

andrsd commented 6 years ago

So, here is the problem (mostly for me so that I do not forget):

  1. SMP full = true has to be active, so that we are computing all off-diagonal blocks.

  2. Then we resize the Jacobian blocks in Assembly like so:

       jacobianBlock(vi, vj).resize(ivar.dofIndices().size(), jvar.dofIndices().size());

     Now, u is on the displaced mesh and v is not. So in the undisplaced instance for variable v, we resize the block to (2, 0).

  3. In Kernel::computeOffDiagJacobian for the v variable we get:

       _test.size() == 2
       _phi.size() == 2

     and then we write into the previously sized (2, 0) block, which has the wrong dimensions (see the sketch after this list).

tophmatthews commented 6 years ago

Looks like this is close, I'll be glad to see this resolved after all this time!

tophmatthews commented 6 years ago

I guess my last comment was confusing for some... this isn't closed, but it sounds close, since @andrsd has an idea of what the problem is.

tophmatthews commented 6 years ago

@andrsd @dschwen Okay, it looks like this was solved by something else! Maybe the simplified input file given here should be implemented as a test to ensure it doesn't get broken in the future?

tophmatthews commented 6 years ago

Looks like this has been solved by something else.

tophmatthews commented 6 years ago

UGGG!!!! This problem still exists. Here is another condensed input file that reproduces the error. As far as I can tell, no line can be removed without clearing the error. This is causing problems with some large BISON runs I'm trying to run through debug and is hindering development. Any ideas @dschwen @andrsd ?

Input File:

```
[GlobalParams]
  displacements = 'disp_x disp_y'
  order = SECOND
[]

[Mesh]
  type = GeneratedMesh
  dim = 2
  nx = 10
  ny = 10
  second_order = 2
[]

[Variables]
  [./temp]
    initial_condition = 298
  [../]
  [./diff]
  [../]
[]

[Modules/TensorMechanics/Master]
  [./tm]
    add_variables = true
  [../]
[]

[Kernels]
  [./temp_dt]
    type = TimeDerivative
    variable = temp
  [../]
  [./diff_dt]
    type = TimeDerivative
    variable = diff
  [../]
[]

[ThermalContact]
  [./thermal_contact]
    type = GapHeatTransfer
    variable = temp
    master = left
    slave = right
    quadrature = true
  [../]
[]

[Materials]
  [./clad_elasticity_tensor]
    type = ComputeIsotropicElasticityTensor
    youngs_modulus = 1
    poissons_ratio = 0.3
  [../]
  [./clad_stress]
    type = ComputeLinearElasticStress
  [../]
[]

[Preconditioning]
  [./SMP]
    type = SMP
    full = true
  [../]
[]

[Executioner]
  type = Transient
[]
```
tophmatthews commented 6 years ago

This latest input has to be run with combined-dbg

naveen-pr commented 6 years ago

Any new thoughts on this issue? I've been trying to run some test cases with the peridynamics module and am running into the same error in devel mode.

tophmatthews commented 6 years ago

ping @dschwen @andrsd ???

dschwen commented 6 years ago

I ran the input with the latest moose/libmesh combined-dbg and it errors out with

Solve failed and timestep already at or below dtmin, cannot continue!

but not with the assertion you mentioned.

Argh, never mind, the one in your comment from May 31st still triggers the bug.

dschwen commented 6 years ago

Ok, some preliminary observations:

TaggingInterface::prepareMatrixTag sets up the _local_ke matrix size. For the temp/diff coupling case only, this sets up the matrix as 9x0 a few times; the vast majority of calls set it up to the correct 9x9 size. Scaling with mesh size indicates that only the size in the IntegratedBC is screwed up (the kernels get a correctly sized _local_ke). It is always the matrix dimension corresponding to the diff variable that is messed up.

This can be traced further up to Assembly::prepareJacobianBlock, where dofIndices().size() for that variable is occasionally 0.

The problem also seems to vanish when using only first-order elements (setting order 1 on temp or diff alone does not make the problem go away).

tophmatthews commented 5 years ago

@dschwen @andrsd This popped up again and is slowing development...any way we can get this through?! What can I do to help?

dschwen commented 5 years ago

I'm on travel this week

tophmatthews commented 5 years ago

Whew...I think this was fixed along the way...

permcody commented 5 years ago

Really? It's not referenced. I hope we have a test case; if not, adding one now would be great!

tophmatthews commented 5 years ago

There are three different inputs in this issue that should prove it.

tophmatthews commented 5 years ago

Nope! Still a problem!!!!! I closed this prematurely somehow.

I keep coming around to this when I have to use dbg on thermo-mechanical problems. Once I get pulled away to something else and I stop pushing this, I forget, @dschwen forgets, @andrsd forgets, everyone else forgets, and it gets consistently lost.

This was so close at one point, with @andrsd identifying the problem, but it was never solved.

I'll open a PR with a test that clearly fails so this hopefully gets more visibility, and this damn thing gets closed finally!

permcody commented 5 years ago

Gonna rename this to "Topher's bug" - Seems like it's the bane of your existence.

tophmatthews commented 5 years ago

Glad someone could sense my frustration... :)