idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

Mesh refinement with NodeElem(s) in mesh causes crash #28538

Closed by joshuahansel 1 month ago

joshuahansel commented 1 month ago

Bug Description

In THM, NodeElems will be used for 0D junctions, but when you try to uniformly refine the mesh, you get an error. See below.

Steps to Reproduce

Run the test (with https://github.com/idaholab/moose/pull/28514)

modules/thermal_hydraulics/test/tests/misc/uniform_refine/test

which uses the CLI args -r 1 to uniformly refine. This causes the following error:

*** ERROR ***
/Users/hansje/projects/sockeye/moose/modules/thermal_hydraulics/test/tests/misc/uniform_refine/test.i:41.1:
The following error occurred in the Problem 'THM:problem' of type THMProblem.

The following parallel-communicated exception was detected during INITIAL evaluation:
A libMesh::LogicError was raised during FEProblemBase::computeUserObjectsInternal
No index 18446744073709551615 in ghosted vector.
Vector contains [0,50)
And empty ghost array.

Stack frames: 20
0: 0   libmesh_dbg.0.dylib                 0x000000012d71d8a4 libMesh::print_trace(std::__1::basic_ostream<char, std::__1::char_traits<char>>&) + 644
1: 1   libmesh_dbg.0.dylib                 0x000000012d712ba7 libMesh::MacroFunctions::report_error(char const*, int, char const*, char const*, std::__1::basic_ostream<char, std::__1::char_traits<char>>&) + 215
2: 2   libmesh_dbg.0.dylib                 0x000000012e59f784 libMesh::PetscVector<double>::map_global_to_local_index(unsigned long long) const + 1716
3: 3   libmesh_dbg.0.dylib                 0x000000012e5c580b libMesh::PetscVector<double>::get(std::__1::vector<unsigned long long, std::__1::allocator<unsigned long long>> const&, double*) const + 107
4: 4   libmoose-dbg.0.dylib                0x000000011ce523ac MooseVariableDataBase<double>::fetchDoFValues() + 3580
5: 5   libmoose-dbg.0.dylib                0x000000011ce60d06 MooseVariableData<double>::computeValues() + 86
6: 6   libmoose-dbg.0.dylib                0x000000011ce04422 MooseVariableFE<double>::computeElemValues() + 66
7: 7   libmoose-dbg.0.dylib                0x000000011d2d6162 SystemBase::reinitElem(libMesh::Elem const*, unsigned int) + 354
8: 8   libmoose-dbg.0.dylib                0x000000011c22ced0 FEProblemBase::reinitElem(libMesh::Elem const*, unsigned int) + 144
9: 9   libthermal_hydraulics-dbg.0.dylib   0x000000010ec2e640 ADVolumeJunctionBaseUserObject::initialize() + 752
10: 10  libmoose-dbg.0.dylib                0x000000011c49396d FEProblemBase::computeUserObjectsInternal(MooseEnumItem const&, Moose::AuxGroup const&, TheWarehouse::QueryCache<>&) + 7117
11: 11  libmoose-dbg.0.dylib                0x000000011c25d8ba FEProblemBase::computeUserObjects(MooseEnumItem const&, Moose::AuxGroup const&) + 218
12: 12  libmoose-dbg.0.dylib                0x000000011c28c098 FEProblemBase::execute(MooseEnumItem const&) + 424
13: 13  libmoose-dbg.0.dylib                0x000000011c26ce29 FEProblemBase::initialSetup() + 60649
14: 14  libmoose-dbg.0.dylib                0x000000011b3fbc7c Transient::init() + 92
15: 15  libmoose-dbg.0.dylib                0x000000011d8f194f MooseApp::executeExecutioner() + 3759
16: 16  libmoose-dbg.0.dylib                0x000000011d8e96e3 MooseApp::run() + 10611
17: 17  thermal_hydraulics-dbg              0x000000010bc7ebdf void Moose::main<ThermalHydraulicsTestApp>(int, char**) + 127
18: 18  thermal_hydraulics-dbg              0x000000010bc7eb52 main + 34
19: 19  dyld                                0x00007ff8032cf345 start + 1909
[0] ./include/libmesh/petsc_vector.h, line 1086, compiled Aug 16 2024 at 12:33:56

Because this did not occur during residual evaluation, there is no way to handle this, so the solution is aborting.

Impact

Inability to use the mesh refinement feature in many THM simulations (and possibly in other MOOSE simulations in the future if NodeElem sees wider use).

[Optional] Diagnostics

No response

joshuahansel commented 1 month ago

@roystgnr What do you think should happen if you try to uniformly refine a mesh that has a NodeElem in it? I would want that element to not be refined of course 😄 If you don't have a good idea off the top of your head, I can dig. Looks like some nonsense ID like 18446744073709551615 (that's 2^64 - 1, i.e. an invalid-ID sentinel) shows up in this case.

roystgnr commented 1 month ago

I'd want the NodeElem to have a single child NodeElem, so we don't get any new DoFs but we do still have a uniform elem->level() for all active elements, which would be especially valuable if the NodeElem has an interior_parent() and the user has a level mismatch rule active for that.

I might be persuaded that "just don't refine the NodeElem" is a better solution, but obviously "crash" is not.

I notice that our zillion INSTANTIATE_ELEMTEST(PRISM20) unit test creations don't include any INSTANTIATE_ELEMTEST(NODEELEM). In hindsight that was probably a mistake ... let me rectify that and see whether the refinement tests catch a bug here ... as well as how many other bugs we catch alongside it.

roystgnr commented 1 month ago

as well as how many other bugs we catch alongside it.

Fun fact: when you fail enough tests in CppUnit, the ".E.E.E.E.E.E.E.E.E.E.E.E..." looks like your computer is screaming in agony.

roystgnr commented 1 month ago

Well, that's not going to help. After I fix the unit test itself, there are still 4 failures left, none from test_refinement or test_double_refinement. We're going to need to boil down the actual failure into a test case.

joshuahansel commented 1 month ago

Thanks for looking into it! I think it's not unlikely that this is due to my implementation in THM, and I can see where it might happen.

When I first add the NodeElem to the mesh, I save its ID, which then gets passed to my side user object (which loops over the boundaries of the pipes connected to the junction) as an input parameter. I need to use variable values from this element, so I do this:

    // Look up the junction element from the ID saved when the NodeElem was added
    const Elem * junction_elem = _mesh.elemPtr(_junction_elem_id);
    // Prepare and reinitialize variables on that element for this thread
    _fe_problem.setCurrentSubdomainID(junction_elem, _tid);
    _fe_problem.prepare(junction_elem, _tid);
    _fe_problem.reinitElem(junction_elem, _tid);

and then get my variable values.

So now since the mesh has been refined, should I be reinitializing a different element? You mentioned that it should be creating a child element even though it's just a NodeElem, so maybe I just need to update to the child element ID?

roystgnr commented 1 month ago

Did you disable renumbering? If not then don't expect either ID to definitely remain the same through refinement. You could save the Elem * (if you're on a ReplicatedMesh) but then you'd need to update it to be the child of the original after refinement.

What type (FEType) of variable(s) do you have on the NodeElem?

joshuahansel commented 1 month ago

We did disable renumbering in THM. As for replicated vs. distributed, we usually use replicated but should probably keep distributed working too. I'm using 1st-order Lagrange for that NodeElem.

roystgnr commented 1 month ago

should probably keep distributed working too

Make sure you're running test sweeps over MPI processor counts. There are so many corner cases you can get into with different partitionings that it's not uncommon for a new distributed-mesh bug to show up on e.g. 6 ranks despite working fine on 1 through 5.

I'm using 1st-order Lagrange for that NodeElem.

This makes things a tiny bit more robust - the DoFs will be stored on the Node rather than on the Elem (as they would be in a Monomial FE family), so if you accidentally evaluate something on the parent NodeElem rather than the child, it'll still give you the right value.