benkirk closed this issue 10 years ago
Also 49c4f4f - I ran into that while investigating Derek's new patch, but it seems to be an older problem.
I'm still trying to get a handle on that weird ParallelMesh+adjoints_ex3 failure, though; sorry it's taking so long.
Thanks - I would have missed that, thinking it was dependent on Derek’s new stuff.
-Ben
I've managed to at least trigger a lower level of failure now. I can get adjoints_ex3 to generate a Nemesis restart file which, although it looks fine in my Paraview 4.0.1, triggers assertion failures when I load it into libMesh. Can anyone else take a look at http://users.ices.utexas.edu/~roystgnr/badmesh/ ?
OK, so simply loading this on 4 processors freaks things out?
I'll see if I can do anything there...
Just UnstructuredMesh::read() on four processors in dbg or devel mode should trigger the problem. One processor sees a global_node_idx of -1; another processor somehow gets a num_elems_global of 24 instead of 22.
Thanks!
Has this mesh been refined? If so, I'm not sure what to expect from Nemesis...
This is the mesh that gets written out after one adaptive refinement step. Your screenshot looks like what I'd expect. Have you tried loading it in a libMesh app yet?
Not yet, but what I meant is that the libMesh Nemesis reader would not understand adapted meshes, so I'm not surprised that it could cause problems. Still, I'll run it through and see what I can find.
Yeah, the error I see is
*** Warning, This code is untested, experimental, or likely to see future API changes: ../src/mesh/nemesis_io_helper.C, line 63, compiled Feb 10 2014 at 15:44:30 ***
Assertion 'global_node_idx < to_uint(nemhelper->num_nodes_global)' failed.
global_node_idx = 4294967295
to_uint(nemhelper->num_nodes_global) = 131
[0] ../src/mesh/nemesis_io.C, line 431, compiled Feb 10 2014 at 15:44:30
Assertion 'sum_internal_elems+sum_border_elems == nemhelper->num_elems_global' failed.
sum_internal_elems+sum_border_elems = 22
nemhelper->num_elems_global = 24
[3] ../src/mesh/nemesis_io.C, line 726, compiled Feb 10 2014 at 15:44:30
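A note on the first assertion above: the reported global_node_idx = 4294967295 is what -1 becomes when converted to a 32-bit unsigned integer, which matches the "global_node_idx of -1" described earlier in the thread. The following is a minimal standalone sketch of that conversion, not libMesh code. `to_uint_sketch` and `node_index_in_range` are hypothetical names, and a 32-bit unsigned index type is an assumption; libMesh's real `to_uint` may behave differently (for example, by asserting on negative input).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the to_uint() named in the assertion text.
// Assumption: indices are stored as 32-bit unsigned integers.
inline std::uint32_t to_uint_sketch(int value)
{
  // Conversion to an unsigned type is modular, so -1 maps to 2^32 - 1.
  return static_cast<std::uint32_t>(value);
}

// The comparison made by the first assertion, using plain integers.
inline bool node_index_in_range(int global_node_idx, int num_nodes_global)
{
  return to_uint_sketch(global_node_idx) < to_uint_sketch(num_nodes_global);
}
```

Calling `node_index_in_range(-1, 131)` returns false, mirroring the failed check, while any index from 0 through 130 passes.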
Wait, our Nemesis reader doesn't understand adapted meshes? I didn't know that. We should probably toss a libmesh_not_implemented() in there somewhere appropriate.
Our Exodus reader handles adapted meshes, right?
Anyway, thanks for the help; sorry this was a red herring. I'll get back to the regression myself.
Neither Exodus nor Nemesis can read adapted meshes. There is no such thing in Exodus land. There is no way to store a "family" or "tree of elements" in Exodus/Nemesis at all.
When we write out adapted Exodus/Nemesis we're just writing the active elements....
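To illustrate what "just writing the active elements" loses, here is a toy sketch. `ToyElem`, `make_toy_mesh`, and `active_ids` are hypothetical names invented for this example and are not libMesh's `Elem` API; the point is only that a flat list of leaves carries no record of the parents they came from.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy element for illustration only; this is NOT libMesh's Elem class.
struct ToyElem
{
  int id;
  std::vector<std::unique_ptr<ToyElem>> children; // empty => active (leaf)

  bool active() const { return children.empty(); }
};

// Collect ids of active (leaf) elements, depth first. Ancestors are
// skipped, which mimics writing only the active elements to a file.
void collect_active(const ToyElem & elem, std::vector<int> & out)
{
  if (elem.active())
    {
      out.push_back(elem.id);
      return;
    }
  for (const auto & child : elem.children)
    collect_active(*child, out);
}

// Two coarse elements; element 0 is refined once into children 2..5,
// element 1 is left unrefined.
std::vector<std::unique_ptr<ToyElem>> make_toy_mesh()
{
  std::vector<std::unique_ptr<ToyElem>> coarse;
  coarse.push_back(std::make_unique<ToyElem>());
  coarse.push_back(std::make_unique<ToyElem>());
  coarse[0]->id = 0;
  coarse[1]->id = 1;
  for (int i = 0; i < 4; ++i)
    {
      coarse[0]->children.push_back(std::make_unique<ToyElem>());
      coarse[0]->children.back()->id = 2 + i;
    }
  return coarse;
}

// Flatten the whole toy mesh to the list a writer would emit.
std::vector<int> active_ids(const std::vector<std::unique_ptr<ToyElem>> & coarse)
{
  std::vector<int> out;
  for (const auto & elem : coarse)
    collect_active(*elem, out);
  return out;
}
```

In this toy mesh the flattened list contains elements 2, 3, 4, 5 and 1. Element 0 never appears, so a reader has no way to tell that 2 through 5 were once siblings under a common parent.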
Gah. This puts a crimp in my "let's just make our next restart file format be Exodus with some HDF5 extensions" dream.
Yeah, I've had some thoughts about augmenting those formats with an index representation of the tree...
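The thread does not say what such an index representation would look like. One possible reading, sketched here purely as an assumption, is a per-element array of parent ids stored alongside the mesh, from which the refinement tree can be rebuilt. `ParentIndex`, `level_of`, and `children_of` are hypothetical names and not part of libMesh, Exodus, or Nemesis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index representation: entry i holds the id of element i's
// parent, or -1 if element i is a coarse (level-0) element.
using ParentIndex = std::vector<int>;

// Refinement level of an element = number of ancestors above it.
int level_of(const ParentIndex & parent, int elem)
{
  int level = 0;
  while (parent[static_cast<std::size_t>(elem)] != -1)
    {
      elem = parent[static_cast<std::size_t>(elem)];
      ++level;
    }
  return level;
}

// Invert the parent array into per-element child lists.
std::vector<std::vector<int>> children_of(const ParentIndex & parent)
{
  std::vector<std::vector<int>> kids(parent.size());
  for (std::size_t i = 0; i < parent.size(); ++i)
    if (parent[i] != -1)
      kids[static_cast<std::size_t>(parent[i])].push_back(static_cast<int>(i));
  return kids;
}
```

For example, the array `{-1, -1, 0, 0, 0, 0}` describes two coarse elements where element 0 has four children (2 through 5). A flat array like this could in principle be stored as an extra integer field, though whether that suits Exodus or Nemesis in practice is not established in this thread.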
So I'm a little confused about what bug you are seeing at this point.
I'm getting a convergence failure when running adjoints_ex3 with ParallelMesh on 4 processors.
If you want to take a crack at replicating and figuring it out, I certainly wouldn't mind, but at this point I'd be fine seeing 0.9.3-final released just as soon as Derek's global_foo() is backported.
I wasn’t thinking we need to backport that for 0.9.3 as the default communicator is still active - am I off base on that?
Basically I want to make it possible to write "forwards-compatible" software against a release as early as possible. Same argument for why Paul and I backported the DiffContext/FEMContext accessors.
A noble goal indeed... That pull request looks pretty good to me.
If you want to do the pull and backport, I think Derek's stuff is ready.
I've narrowed down the adjoints_ex3 problem slightly - somehow we're missing a single DoF constraint in the ParallelMesh case.
Now I've found the adjoints_ex3 problem and I'm testing a fix.
This is probably worth delaying the release for; it's a real corner case, but it could affect any app with ParallelMesh, mixed finite elements, hanging nodes, and enough bad luck.
Yikes. Sounds like a good thing to delay a release for. How involved is the fix?
John
I'm totally fine holding off for this. I'll cherry-pick Derek's recent PR later tomorrow or this weekend - I'm about to get off an airplane and head up to Steamboat...
I'll do the last cherry-picking; hopefully then we can run it through regression tests tomorrow morning and do the release tomorrow afternoon. Enjoy your trip.
Thanks, I especially want to ensure that things work as expected both with and without the default communicator option.
We're passing all the GRIN-S and internal tests, and John tells me we're passing all the MOOSE tests. Ready to release when you are.
Beautiful, let's update the NEWS to describe the bugfix you found and Derek's new global functions, then we're good!
@roystgnr, could you help me confirm the commits that need to be merged into v0.9.3?
eeb26206190a96070c32dafed79e1907a2220d3e b1b891f37e6f50fd28dee4ab5818c0f66d1c5dcc
Is that it?