idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1
1.72k stars 1.04k forks source link

Mesh input runs in opt mode, but hits assertion in devel mode #24474

Closed olinwc closed 1 year ago

olinwc commented 1 year ago

Bug Description

The attached mesh input file runs without errors in opt and (appears) to generate a valid exodus file which can be viewed in Paraview, but when running in devel mode assertions are hit. This on my local Mac Pro.

*** ERROR ***
Assertion `it->second == node' failed.
it->second = 0x60000f12e640
node = 0x60000f127b40

Stack frames: 15
0: 0   libmesh_devel.0.dylib               0x000000011cd76eec libMesh::print_trace(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) + 1148
1: 1   libmesh_devel.0.dylib               0x000000011cd73683 libMesh::MacroFunctions::report_error(char const*, int, char const*, char const*, std::__1::basic_ostream<char, std::__1::char_traits<char> >&) + 179
2: 2   libmesh_devel.0.dylib               0x000000011d70b839 libMesh::Poly2TriTriangulator::triangulate_current_points() + 3625
3: 3   libmesh_devel.0.dylib               0x000000011d717a98 libMesh::Poly2TriTriangulator::triangulate() + 136
4: 4   libmoose-devel.0.dylib              0x000000011a89d7ec XYDelaunayGenerator::generate() + 2380
5: 5   libmoose-devel.0.dylib              0x000000011a95e291 MeshGenerator::generateInternal() + 1313
6: 6   libmoose-devel.0.dylib              0x000000011b759d5a MeshGeneratorSystem::executeMeshGenerators() + 1658
7: 7   libmoose-devel.0.dylib              0x000000011b0f3013 Action::timedAct() + 83
8: 8   libmoose-devel.0.dylib              0x000000011b0f29c9 ActionWarehouse::executeActionsWithAction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 185
9: 9   libmoose-devel.0.dylib              0x000000011b12b248 ActionWarehouse::executeAllActions() + 264
10: 10  libmoose-devel.0.dylib              0x000000011b7c8525 MooseApp::runInputFile() + 117
11: 11  libmoose-devel.0.dylib              0x000000011b7c2870 MooseApp::run() + 912
12: 12  libgriffin-devel.0.dylib            0x000000010f9cc6dd GriffinApp::run() + 4637
13: 13  griffin-devel                       0x000000010f234bdc main + 140
14: 14  dyld                                0x000000011120352e start + 462
[0] ../src/mesh/poly2tri_triangulator.C, line 396, compiled May 16 2023 at 01:06:12

Steps to Reproduce

Run the attached input file with --mesh-only and try to run in opt and devel mode

Impact

I haven't tried actually using this mesh yet in a problem to see what happens, but the error in devel mode does not inspire confidence. Also have been hitting some other issues with the mesh sometimes hanging, but I can't debug because it won't run in devel mode.

Input File gHPMR_2d_core_v29.txt

lindsayad commented 1 year ago

@roystgnr is this something you might be able to look into?

olinwc commented 1 year ago

Here's the output when running ./reactor-devel -i ../../../gHPMR_2d_core_v29.i --mesh-only

Setting Up.

*** ERROR ***
Assertion `it->second == node' failed.
it->second = 0x60000cb8b120
node = 0x60000cbbc660

Stack frames: 14
0: 0   libmesh_devel.0.dylib               0x0000000110549eec libMesh::print_trace(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) + 1148
1: 1   libmesh_devel.0.dylib               0x0000000110546683 libMesh::MacroFunctions::report_error(char const*, int, char const*, char const*, std::__1::basic_ostream<char, std::__1::char_traits<char> >&) + 179
2: 2   libmesh_devel.0.dylib               0x0000000110ede839 libMesh::Poly2TriTriangulator::triangulate_current_points() + 3625
3: 3   libmesh_devel.0.dylib               0x0000000110eeaa98 libMesh::Poly2TriTriangulator::triangulate() + 136
4: 4   libmoose-devel.0.dylib              0x000000010e0707ec XYDelaunayGenerator::generate() + 2380
5: 5   libmoose-devel.0.dylib              0x000000010e131291 MeshGenerator::generateInternal() + 1313
6: 6   libmoose-devel.0.dylib              0x000000010ef2cd5a MeshGeneratorSystem::ex                                                                              [ 10.47 s] [  119 MB]
ecuteMeshGenerators() + 1658
7: 7   libmoose-devel.0.dylib              0x000000010e8c6013 Action::timedAct() + 83
8: 8   libmoose-devel.0.dylib              0x000000010e8c59c9 ActionWarehouse::executeActionsWithAction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 185
9: 9   libmoose-devel.0.dylib              0x000000010e8fe248 ActionWarehouse::executeAllActions() + 264
10: 10  libmoose-devel.0.dylib              0x000000010ef9b525 MooseApp::runInputFile() + 117
11: 11  libmoose-devel.0.dylib              0x000000010ef95870 MooseApp::run() + 912
12: 12  reactor-devel                       0x000000010a450bcb main + 139
13: 13  dyld                                0x000000010b31552e start + 462
[0] ../src/mesh/poly2tri_triangulator.C, line 396, compiled May 16 2023 at 01:06:12

Stack frames: 6
0: 0   libmesh_devel.0.dylib               0x0000000110549eec libMesh::print_trace(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) + 1148
1: 1   libmoose-devel.0.dylib              0x000000010efcdb30 moose::internal::mooseErrorRaw(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 784
2: 2   libmoose-devel.0.dylib              0x000000010e6eb217 void mooseError<char const*>(char const*&&) + 247
3: 3   libmoose-devel.0.dylib              0x000000010ef96a26 MooseApp::run() + 5446
4: 4   reactor-devel                       0x000000010a450bcb main + 139
5: 5   dyld                                0x000000010b31552e start + 462
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
roystgnr commented 1 year ago

@roystgnr is this something you might be able to look into?

If I can reproduce it, at least. I'll try that now.

olinwc commented 1 year ago

Here's the first file I started hitting the failure with. On line 393 I have a pattern which runs with no assertion hit and then on line 408 the specified pattern fails when I change a single assembly in PatternedHexMeshGenerator

gHPMR_2d_core_v18.txt

roystgnr commented 1 year ago

Score! I can reproduce this with v29 ... though not with v18 for some reason.

olinwc commented 1 year ago

This is the MOOSE commit I hit it with

commit e0b3d67f8489c7e79339b532606fe884756545a1 (HEAD)
Merge: 42342ed08d 395ab6da47
Author: moosetest <bounces@inl.gov>
Date:   Mon May 22 22:33:58 2023 -0600

    Merge commit '395ab6da47fe9a63aab358a9ca122e2fd6dd943d'
roystgnr commented 1 year ago

Seeing if I can pare this down as much as possible ... removing control drums 5,6,2,3 gets me the same error ... removing 1,4 too gets me a different assertion failure.

roystgnr commented 1 year ago

Hey, what's going on with y1 vs y2 in outer_core_ring? (I'm like 99.99% sure this is unrelated to the failure; I'm just confused by it)

olinwc commented 1 year ago

I saw somebody make a circle with ParsedCurveGenerator once doing that so I just copied them to make a circle

olinwc commented 1 year ago

I used this example, which was close to a circle, and figured I could do the same thing to make a circle. Listing 2 in Example Syntax for https://mooseframework.inl.gov/source/meshgenerators/ParsedCurveGenerator.html

[pcg]
  type = ParsedCurveGenerator
  x_formula = '1.0*cos(t)'
  y_formula = 'y1:=1.0*sin(t);
                 y2:=1.5*sin(t);
                 if(t<pi,y1,y2)'
  section_bounding_t_values = '0.0 ${fparse pi} ${fparse 2.0*pi}'
  nums_segments = '10 10'
  constant_names = 'pi'
  constant_expressions = '${fparse pi}'
  is_closed_loop = true
[]
roystgnr commented 1 year ago

Thanks for the explanation! They have that extra trickiness because of the jump from 1.0 to 1.5; you could have just gone down to y_formula = '${outer_core_radius}*sin(t)' since you don't have a jump. No shame, though; "find someone else who did it right and tweak their thing" is definitely the right way to get started.

olinwc commented 1 year ago

Running v18 on Sawtooth in devel mode I also don't hit the assertion.

roystgnr commented 1 year ago

Do you still hit it on Sawtooth with v29?

olinwc commented 1 year ago

Just tested it and I do

olinwc commented 1 year ago

Let me test some more of my intermediate versions on Sawtooth and see what hits it there

olinwc commented 1 year ago

V21 runs fine

olinwc commented 1 year ago

V25 fails

olinwc commented 1 year ago

V23 fails

olinwc commented 1 year ago

And V22 was just a small control rod test, which passes

olinwc commented 1 year ago

V21, which passes gHPMR_2d_core_v21.txt

V23, which fails gHPMR_2d_core_v23.txt

olinwc commented 1 year ago

Unfortunately, the difference between V21 and V23 is adding the control drums, which it looks like you've already determined is the issue

roystgnr commented 1 year ago

Nope. I've gotten rid of the control drums entirely and managed to retain the failure.

roystgnr commented 1 year ago

The same failure, even; it cropped up again as I pared down the problem further.

olinwc commented 1 year ago

Well this is interesting, I modified V21 to shrink the outer ring because I remember having problems with shrinking that ring, and I get a different assertion

gHPMR_2d_core_v21a.txt

Setting Up                                                                               [  8.61 s] [  120 MB]

*** ERROR ***
Assertion `new_first != new_second' failed.
new_first = 2367
new_second = 2367

Stack frames: 16
0: libMesh::print_trace(std::ostream&)
1: libMesh::MacroFunctions::report_error(char const*, int, char const*, char const*, std::ostream&)
2: /scratch/calvolin/sawtooth/griffin/moose/scripts/../libmesh/installed/lib/libmesh_devel.so.0(+0xe83476) [0x7fffe980b476]
3: libMesh::Poly2TriTriangulator::insert_refinement_points()
4: libMesh::Poly2TriTriangulator::triangulate()
5: XYDelaunayGenerator::generate()
6: MeshGenerator::generateInternal()
7: MeshGeneratorSystem::executeMeshGenerators()
8: Action::timedAct()
9: ActionWarehouse::executeActionsWithAction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
10: ActionWarehouse::executeAllActions()
11: MooseApp::runInputFile()
12: MooseApp::run()
13: main
14: __libc_start_main
15: ./reactor-devel() [0x402945]
[0] ../src/mesh/poly2tri_triangulator.C, line 957, compiled May 24 2023 at 16:30:35

Stack frames: 7
0: libMesh::print_trace(std::ostream&)
1: moose::internal::mooseErrorRaw(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
2: void mooseError<char const*>(char const*&&)
3: /scratch/calvolin/sawtooth/griffin/moose/framework/libmoose-devel.so.0(+0xb6caee) [0x7fffeb58faee]
4: main
5: __libc_start_main
6: ./reactor-devel() [0x402945]
[sawtooth1:mpi_rank_0][MPIDI_CH3_Abort] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0: No such file or directory (2)
roystgnr commented 1 year ago

Yeah, that's the other I've been able to hit.

roystgnr commented 1 year ago

I'm well out of time for tonight. I've got a failure case down to 70 lines and a much smaller mesh. roy_v20.txt

I don't actually understand some of the things I can't remove, though. What do num_sectors_per_side and num_sectors_per_side_meta do? From https://mooseframework.inl.gov/source/meshgenerators/HexagonConcentricCircleAdaptiveBoundaryMeshGenerator.html I see that the former will be overridden, but I still have to provide it?

roystgnr commented 1 year ago

Well, even simpler would have been even better, but this is looking comprehensible enough in the debugger.

At the simplest failure I can trigger, I'm seeing triangles with inverted orientations, which would definitely explain why refining those triangulations isn't working ... but even if inputs aren't being consistent about boundary orientation, the triangulator code is supposed to be detecting and correcting any incompatibility. Still trying to figure out why it's not doing so here.

olinwc commented 1 year ago

@roystgnr Regarding num_sectors_per_side_meta that's needed to add the metadata for "irregular" hex cells added in PatternedHexMeshGenerator. Members of the reactor team can explain exactly why that metadata is required better than me.

I don't think I specify any sides to adapt HexagonConcentricCircleAdaptiveBoundaryMeshGenerator to so num_sectors_per_side still defines the initial number of sectors in the hexagons, but I end up deleting this interior and remeshing/refining the boundary with XYDG, which is why that sector definition doesn't actually "matter" in the final mesh.

I guess if I defined a really large number of sectors per side, like 100, it would be more refined than what the XYDG refinement generates.

roystgnr commented 1 year ago

I was triggering inverted elements in cases where I forced sliver elements (by disabling outer boundary refinement but still requiring desired_area=0.93); without that, I can still get an error but it's now "Opposing point on constrained edge" from inside poly2tri.

On the bright side, I've ruled out a whole bunch of issues, and I'm down to 29 lines on the failing test case. roy_v22.txt

roystgnr commented 1 year ago

And now I'm down to 7 elements on the unrefined failure case; getting rid of the angular offset in inner_core_ring, I can bring that down to 3 sides and outer_core_ring down to 4, and I still see (sufficient amounts of) refinement trying to create sliver elements. This is also clearly narrowing down the bug to "definitely in my code", so there's no longer any risk of me coming back ranting about how many days I wasted on an input that couldn't be meshed without FPU-defying sliver elements; the slivers here are definitely an artifact of something wrong in the code, not the input file.

lindsayad commented 1 year ago

I would definitely have felt bad (and taken responsibility) if that was the case

roystgnr commented 1 year ago

Eh, even if it had been a bad input file, it would have been the sort of bad input file that XYDelaunay is supposed to be sanity checking. But we have a known bug in the sanity checking; I was kind of hoping that we'd slipped past another of those, not a bug in the actual triangulation code.

roystgnr commented 1 year ago

This is definitely some problem with sliver elements. I can even trigger a failure with no holes, if I just disable boundary refinement and push interior refinement hard enough. I'd noticed that in the past, but I'd thought it was happening with triangle aspect ratios like 10^6, not aspect ratios like 10. And I'd never managed to trigger it with the early stages of a triangulation refinement before, but the combination of small boundary/hole segments and large domains seems to be able to do so.

Not sure I'll have it fixed today, though. Feeling sick, might have to knock off early.

roystgnr commented 1 year ago

The failure case I've boiled down to is making more and more sense. It's not that the aspect ratios are ~10 and then it fails, the aspect ratios are ~10 when we're pushed hard enough to hit the failure case: we see an element in need of refinement, we do the Standard Delaunay Thing and try to add its circumcenter as a new node, but the circumcenter falls outside an boundary with refinement disabled so as a fallback we add the centroid of that boundary element instead ... and that works well enough once or twice, but at some point refining the boundary element's Delaunay cavity no longer naturally leads to refinement of the desired element, we loop until we have crazy 10^6 scale slivered elements, and either poly2tri or libMesh or both fail via FP precision issues.

This might be as simple as having a safer fallback, then. Fingers crossed.

roystgnr commented 1 year ago

Not that simple. Safer fallback fixed my simplest failure case completely, but isn't sufficient for the v29 reactor mesh. I'll "bisect" again and keep going from here, though.

roystgnr commented 1 year ago

roy_v20.txt still fails with the new fallback; roy_v22.txt succeeds now.

roystgnr commented 1 year ago

I may be also creating sliver elements inadvertently when circumcenters fall just inside a boundary? ... which is easy enough to do when you combine interesting regular geometries with slight perturbations. If that's the case then I need an extra test. Look at each boundary of a new cavity element, and if the area coordinate of a new Steiner point with respect to an unrefinable boundary is too small, increase it? (and if the area coordinate of a new Steiner point with respect to a refinable boundary is too small, snap it to the boundary?)

I'm about done for the day though so I'll have to take a crack next week.

olinwc commented 1 year ago

Thanks for taking the time to look into this @roystgnr ! Tag @jortensi since they are interested in this issue also

Here is the latest version of the core I'm using, I coarsened the area meshed by XYDG to see how low I could get the number of elements. It still fails in devel in case you want to test this with whatever changes you are making. I will be out on PL all of next week and will be back in office on 6/5 gHPMR_2d_core_v33.txt

roystgnr commented 1 year ago

And ... I already had that extra test (though computed via cosines rather than area coordinates).

Once that red herring was out of the way I found some too-high-floating-point-tolerance issues, and tightening up those gets some more of my test cases working, but roy_v20.txt is still failing.

roystgnr commented 1 year ago

What the hell. Poly2Tri occasionally gives us a non-Delaunay triangulation?

roystgnr commented 1 year ago

And if I manually fix non-Delaunay triangulations, then OH MY GOD IT WORKS. Including on gHPMR_2d_core_v29.txt.

Not sure how long this'll take to clean up (I'll get everything into a "roys_junk202305" branch right away lest I regress it again while refactoring...) and downstream (just fixing up the input deck sanity checking seems like it'll require a Sockeye test suite fix...), but I'll get started now.

olinwc commented 1 year ago

@roystgnr just wanted to let you know I am back at work if you need anything from me, really appreciate all the time you've spent working on fixing this!

roystgnr commented 1 year ago

Could you try out libMesh/libmesh#3572 on more of your own cases? It'll be a while before it's downstream (probably next week) and I'd hate to get the libMesh submodule updated only to discover that there were still further fixes necessary.

olinwc commented 1 year ago

Will be resolved in MOOSE once https://github.com/idaholab/moose/pull/24735 is merged.