Can you post the stack trace to this issue? Also, I forget, does one need a complex-enabled PETSc to run this example successfully in --enable-complex mode?
It's definitely possible there are bugs in the commit you mentioned, but I don't see how they would be related to Real/complex... the patch doesn't really do anything significant with Reals or Numbers.
Also -- the referenced commit doesn't have anything to do with mixed-dimension libmesh meshes, only with handling BCs specified on lower-dimensional elements in gmsh files. The lower-dimensional elements themselves don't make it into the final Mesh.
--enable-complex is required to run the whole example, but I can strip out the relevant ifdefs and reproduce the same failure with little more than mesh.read("horn.msh");
The assertion failure is the verify(mesh.n_elem()) at mesh_communication.C:716, and the rest of the stack is just unstructured_mesh.C:769 and reduced_basis_ex7.C.
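For reference, the stripped-down reproducer looks roughly like this (a sketch assuming a standard libMesh driver, not the exact code I ran):

```c++
#include "libmesh/libmesh.h"
#include "libmesh/mesh.h"

int main (int argc, char ** argv)
{
  libMesh::LibMeshInit init (argc, argv);

  // With --enable-parmesh the Mesh typedef is ParallelMesh, and just
  // reading the gmsh file on 2+ processors hits the verify(mesh.n_elem())
  // assertion during the broadcast.
  libMesh::Mesh mesh (init.comm());
  mesh.read ("horn.msh");

  return 0;
}
```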
Running "gmsh -2 horn.geo" gives me a much smaller horn.msh file (albeit one with roughly the same number of elements?) but the new file exactly reproduces the original error so I'm not too concerned about being unable to exactly reproduce the original mesh.
Hmm.. are you running in parallel? How many procs?
Originally 12, but I get the same failure with 2. 1 works, naturally. Checking things out in the debugger, the number of elements that proc 1 thinks it has is indeed only 90% or so of the number of elements proc 0 should have sent. It's as if some of the elements ended up with duplicate DofObject ids and so just overwrote each other on the receive end.
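In case it's useful, this is roughly the per-processor check I was doing in the debugger, written out as code (names here are just illustrative):

```c++
#include <cstddef>
#include <iterator>

#include "libmesh/libmesh_common.h"
#include "libmesh/mesh_base.h"

// Compare how many element objects this processor actually stores with
// what the mesh claims globally after the read/broadcast.
void print_elem_counts (const libMesh::MeshBase & mesh)
{
  const std::size_t n_stored =
    std::distance (mesh.elements_begin(), mesh.elements_end());

  libMesh::out << "proc " << mesh.processor_id()
               << ": " << n_stored << " element objects stored, "
               << "mesh.n_elem() = " << mesh.n_elem() << std::endl;
}
```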
Oh, I'm running with --enable-parmesh, too. Haven't tried to duplicate with SerialMesh yet, although obviously I should...
Haha, wow. OK, so... it fails in parallel with --enable-complex --enable-parmesh. Sure you don't want to turn on infinite elements support too? ;-)
The gmsh reader definitely was not tested extensively in this configuration...
I'm finishing building complex/SerialMesh now, will let you know what I find...
complex doesn't seem to matter (except that it hid the initial regression, since the other adjoints_ex3 complex regression was cropping up before we ever got to reduced_basis), but ParallelMesh does. Changing Mesh to SerialMesh fixes things for me, and you'll probably need to change Mesh to ParallelMesh to trigger the failure.
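In example terms the swap is just this (a sketch; the example itself uses the Mesh typedef, which --enable-parmesh maps to ParallelMesh):

```c++
#include <memory>

#include "libmesh/parallel_mesh.h"
#include "libmesh/serial_mesh.h"
#include "libmesh/unstructured_mesh.h"

// Choose the mesh type explicitly instead of relying on the Mesh typedef.
std::unique_ptr<libMesh::UnstructuredMesh>
build_mesh (const libMesh::Parallel::Communicator & comm,
            bool use_parallel_mesh)
{
  if (use_parallel_mesh)
    // This path triggers the gmsh reader failure on 2+ processors.
    return std::unique_ptr<libMesh::UnstructuredMesh>
      (new libMesh::ParallelMesh (comm));

  // This path runs the example cleanly for me.
  return std::unique_ptr<libMesh::UnstructuredMesh>
    (new libMesh::SerialMesh (comm));
}
```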
Yes, I haven't reproduced the ParallelMesh error yet, but I can confirm that the code runs fine on 2 processors with SerialMesh in debug mode with --enable-complex.
So the reader is almost certainly doing something that doesn't work with ParallelMesh. My first thought was adding Points/Elems without IDs, but that doesn't seem to be the case. Is the reader being called only on processor 0 or on all processors?
For GMsh we call the read on all processors, but under the hood that means that processor 0 does all the work and then broadcasts the complete mesh to the rest.
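i.e. roughly this flow (a sketch of the pattern, not the actual GmshIO/UnstructuredMesh code):

```c++
#include <string>

#include "libmesh/gmsh_io.h"
#include "libmesh/mesh_communication.h"
#include "libmesh/unstructured_mesh.h"

// Every rank calls this, but only rank 0 actually parses the .msh file;
// the resulting mesh is then broadcast so all ranks end up with a copy.
void read_gmsh_and_broadcast (libMesh::UnstructuredMesh & mesh,
                              const std::string & filename)
{
  if (mesh.processor_id() == 0)
    libMesh::GmshIO (mesh).read (filename);

  libMesh::MeshCommunication().broadcast (mesh);
}
```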
Removing both of the "Physical Line" declarations in horn.geo and regenerating seems to fix the error?
But with only Physical Line 1 and Physical Surface 1 (and their dependencies) I get the same error, this time with only 427 elements.
Now, how to tell Gmsh that this domain can be meshed with only three triangles and a line...
IIRC, "Physical Line/Surface" are the things that add the lower-dimensional elements with boundary information... so if removing them makes the bug go away, the issue must be something with the BoundaryInfo object in ParallelMesh?
Bumping up the desired element size and getting rid of the mid-side node gets Gmsh down to "5 vertices 12 elements", which I assume means 4 triangles, 8 edges, and a mid-box vertex.
I still get the failure here, and now we have processor 0 with 5 elements and processor 1 with 4 elements. It looks like the lower-dimensional edge might be getting added as more than just a boundary element?
They definitely do get added in the initial pass of the reader, then subsequently deleted. So perhaps the delete step is not working with ParallelMesh, and then only the high-dimensional elements get broadcast, resulting in the mismatch?
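Something along these lines, that is (an illustrative sketch of that delete pass, not the actual reader code):

```c++
#include <vector>

#include "libmesh/elem.h"
#include "libmesh/mesh_base.h"

// After the boundary ids have been recorded, drop every element whose
// dimension is below the mesh dimension so only the "real" elements
// remain to be broadcast.
void delete_lower_dimensional_elems (libMesh::MeshBase & mesh,
                                     unsigned int mesh_dim)
{
  std::vector<libMesh::Elem *> to_delete;

  libMesh::MeshBase::element_iterator       el  = mesh.elements_begin();
  const libMesh::MeshBase::element_iterator end = mesh.elements_end();
  for (; el != end; ++el)
    if ((*el)->dim() < mesh_dim)
      to_delete.push_back (*el);

  // Delete in a second pass so we don't invalidate the iterators; with
  // ParallelMesh the remaining ids also have to stay consistent across
  // processors, which is where a mismatch could sneak in.
  for (std::size_t i = 0; i < to_delete.size(); ++i)
    mesh.delete_elem (to_delete[i]);
}
```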
Hmm... I think I'm on the trail now. Thanks for the help; I'll try and get a PR ready after lunch.
Fixed by #471
Sorry, I was travelling, just saw this now. I did add this example, and it looks like you guys fixed the GMSH issue here already, thanks.
All fixed, yeah. And the bug turned out to be wholly my fault, so no worries about not getting sucked into the vortex of fixing it; thanks for adding enough test coverage that we noticed it.
Building with METHODS=dbg --enable-complex, I find that reduced_basis_ex7 succeeds when using cf8fe174f3dfccc2dda54007aa6ba6076366c507 but fails when using 5260d07dc8bd02935f5f4ad43c13994522bece42
This is a "handle mixed-dimension meshes" commit from @jwpeterson, and so there's a small chance this is relevant to #433. The actual failure is in MeshCommunication::broadcast(), and so there's a large chance that the actual error is something in broadcast_packed_range() which merely was triggered by John's new code.
If anyone is familiar with GMsh/.geo/.msh (is this your example, @dknez?), I'd love help scaling down horn.msh to ease debugging.