libMesh / libmesh

libMesh github repository
http://libmesh.github.io
GNU Lesser General Public License v2.1

v0.9.3-final #197

Closed benkirk closed 10 years ago

benkirk commented 10 years ago

@roystgnr, could you help me confirm the commits that need to be merged into v0.9.3?

eeb26206190a96070c32dafed79e1907a2220d3e b1b891f37e6f50fd28dee4ab5818c0f66d1c5dcc

Is that it?

roystgnr commented 10 years ago

Also 49c4f4f - I ran into that while investigating Derek's new patch, but it seems to be an older problem.

I'm still trying to get a handle on that weird ParallelMesh+adjoints_ex3 failure, though; sorry it's taking so long.

benkirk commented 10 years ago

Thanks - I would have missed that; thinking it was dependent on Derek’s new stuff.

-Ben

On Feb 4, 2014, at 11:46 AM, roystgnr notifications@github.com wrote:

Also 49c4f4f - I ran into that while investigating Derek's new patch, but it seems to be an older problem.

I'm still trying to get a handle on that weird ParallelMesh+adjoints_ex3 failure, though; sorry it's taking so long.


roystgnr commented 10 years ago

I've managed to at least trigger a lower level of failure now. I can get adjoints_ex3 to generate a Nemesis restart file which, although it looks fine in my Paraview 4.0.1, triggers assertion failures when I load it into libMesh. Can anyone else take a look at http://users.ices.utexas.edu/~roystgnr/badmesh/ ?

benkirk commented 10 years ago

OK, so simply loading this on 4 processors freaks things out?

I'll see if I can do anything there...

roystgnr commented 10 years ago

Just UnstructuredMesh::read() on four processors in dbg or devel mode should trigger the problem. One processor sees a global_node_idx of -1; another processor somehow gets a num_elems_global of 24 instead of 22.

Thanks!

benkirk commented 10 years ago

Has this mesh been refined? If so, I'm not sure what to expect from Nemesis...

[Screenshot: ParaView 4.1.0 64-bit]

roystgnr commented 10 years ago

This is the mesh that gets written out after one adaptive refinement step. Your screenshot looks like what I'd expect. Have you tried loading it in a libMesh app yet?

benkirk commented 10 years ago

Not yet, but what I meant is that the libMesh nemesis reader would not understand adapted meshes so I'm not surprised that could cause problems. Still, I'll run it through and see what I can find.

On Feb 10, 2014, at 4:10 PM, "roystgnr" notifications@github.com wrote:

This is the mesh that gets written out after one adaptive refinement step. Your screenshot looks like what I'd expect. Have you tried loading it in a libMesh app yet?


benkirk commented 10 years ago

Yeah, the error I see is

*** Warning, This code is untested, experimental, or likely to see future API changes: ../src/mesh/nemesis_io_helper.C, line 63, compiled Feb 10 2014 at 15:44:30 ***
Assertion 'global_node_idx < to_uint(nemhelper->num_nodes_global)' failed.
global_node_idx = 4294967295
to_uint(nemhelper->num_nodes_global) = 131
[0] ../src/mesh/nemesis_io.C, line 431, compiled Feb 10 2014 at 15:44:30
Assertion 'sum_internal_elems+sum_border_elems == nemhelper->num_elems_global' failed.
sum_internal_elems+sum_border_elems = 22
nemhelper->num_elems_global = 24
[3] ../src/mesh/nemesis_io.C, line 726, compiled Feb 10 2014 at 15:44:30
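
A note on the first assertion: 4294967295 is the value -1 takes when converted to a 32-bit unsigned integer, which would match the global_node_idx of -1 reported earlier in the thread. The following is a minimal sketch of that wraparound, assuming a 32-bit unsigned index type; the names invalid_index, as_unsigned, and index_in_range are hypothetical and are not libMesh code.

```cpp
#include <cstdint>

// Hypothetical "not found" sentinel held in a signed index (an assumption).
constexpr std::int32_t invalid_index = -1;

// Converting a negative signed value to an unsigned type wraps modulo 2^32.
constexpr std::uint32_t as_unsigned(std::int32_t idx)
{
  return static_cast<std::uint32_t>(idx);
}

// Same shape as the failing bound check: the index must be below the node count.
constexpr bool index_in_range(std::int32_t idx, std::uint32_t num_nodes_global)
{
  return as_unsigned(idx) < num_nodes_global;
}

// -1 becomes the largest 32-bit unsigned value, so it fails any realistic bound.
static_assert(as_unsigned(invalid_index) == 4294967295u, "wraps to 2^32 - 1");
static_assert(!index_in_range(invalid_index, 131u), "sentinel is out of range");
```

With the node count of 131 from the log, any valid index from 0 to 130 passes the check, while the wrapped sentinel does not.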

roystgnr commented 10 years ago

Wait, our Nemesis reader doesn't understand adapted meshes? I didn't know that. We should probably toss a libmesh_not_implemented() in there somewhere appropriate.

Our Exodus reader handles adapted meshes, right?

roystgnr commented 10 years ago

Anyway, thanks for the help; sorry this was a red herring. I'll get back to the regression myself.

friedmud commented 10 years ago

Neither Exodus nor Nemesis can read adapted meshes. There is no such thing in Exodus land. There is no way to store a "family" or "tree of elements" in Exodus/Nemesis at all.

When we write out adapted Exodus/Nemesis we're just writing the active elements....
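
To illustrate the point about writing only the active elements, here is a minimal sketch of a refinement tree being flattened to its leaves. The Elem struct and both functions are hypothetical stand-ins for illustration, not libMesh's Elem class or its writers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element record (not libMesh's Elem class).
struct Elem
{
  int id;
  std::vector<std::size_t> children; // indices into the element array; empty for a leaf
};

// Active elements are the leaves of the refinement tree.
std::vector<int> active_ids(const std::vector<Elem> & elems)
{
  std::vector<int> out;
  for (const Elem & e : elems)
    if (e.children.empty())
      out.push_back(e.id);
  return out;
}

// Example tree: one coarse element (id 0) refined once into four children.
std::vector<Elem> refined_once()
{
  return { {0, {1, 2, 3, 4}}, {1, {}}, {2, {}}, {3, {}}, {4, {}} };
}
```

Calling active_ids(refined_once()) yields only ids 1 through 4. The coarse parent and the parent/child links are gone from the flattened list, so a reader given just that list has nothing from which to rebuild the tree.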

roystgnr commented 10 years ago

Gah. This puts a crimp in my "let's just make our next restart file format be Exodus with some HDF5 extensions" dream.

benkirk commented 10 years ago

Yeah, I've had some thoughts about augmenting those formats with an index representation of the tree...
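
One way an "index representation of the tree" could look, purely as an assumption and not a description of any planned libMesh format, is a flat array giving each element's parent index. The sketch below rebuilds child lists from such an array; no_parent and children_from_parents are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sentinel marking a coarse (level-0) element with no parent.
constexpr int no_parent = -1;

// Rebuild child lists from a flat parent-index array: parent[i] is the
// index of element i's parent, or no_parent for a coarse element.
std::vector<std::vector<std::size_t>>
children_from_parents(const std::vector<int> & parent)
{
  std::vector<std::vector<std::size_t>> children(parent.size());
  for (std::size_t i = 0; i < parent.size(); ++i)
    if (parent[i] != no_parent)
      children[static_cast<std::size_t>(parent[i])].push_back(i);
  return children;
}
```

An array like this could in principle travel alongside a flat element list as one extra integer field per element, though a real format would also need the inactive ancestors themselves to be stored.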

benkirk commented 10 years ago

So I'm a little confused about what bug you are seeing at this point.

  1. Is there anything I can do at this point to help?
  2. Should we wait on 0.9.3-final or move on with this bug outstanding?

roystgnr commented 10 years ago

I'm getting a convergence failure when running adjoints_ex3 with ParallelMesh on 4 processors.

If you want to take a crack at replicating and figuring it out, I certainly wouldn't mind, but at this point I'd be fine seeing 0.9.3-final released just as soon as Derek's global_foo() is backported.

benkirk commented 10 years ago

I wasn’t thinking we need to backport that for 0.9.3 as the default communicator is still active - am I off base on that?

roystgnr commented 10 years ago

Basically I want to make it possible to write "forwards-compatible" software against a release as early as possible. Same argument for why Paul and I backported the DiffContext/FEMContext accessors.

benkirk commented 10 years ago

A noble goal indeed... That pull request looks pretty good to me.

roystgnr commented 10 years ago

If you want to do the pull and backport, I think Derek's stuff is ready.

I've narrowed down the adjoints_ex3 problem slightly - somehow we're missing a single DoF constraint in the ParallelMesh case.

roystgnr commented 10 years ago

Now I've found the adjoints_ex3 problem and I'm testing a fix.

roystgnr commented 10 years ago

This is probably worth delaying the release for; it's a real corner case, but it could affect any app with ParallelMesh, mixed finite elements, hanging nodes, and enough bad luck.
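
For readers unfamiliar with hanging-node constraints, here is a minimal sketch of what one such constraint does. It assumes the simplest case, linear Lagrange elements, where a hanging node at an edge midpoint takes the average of the two edge endpoints. The thread's bug involved mixed finite elements, which this does not reproduce, and the Constraint struct and enforce function are hypothetical, not libMesh's DofMap interface.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical constraint row: u[constrained] = sum(weight * u[dof]).
struct Constraint
{
  std::size_t constrained;
  std::vector<std::pair<std::size_t, double>> terms; // (dof index, weight)
};

// Overwrite each constrained entry with its weighted combination of other entries.
void enforce(const std::vector<Constraint> & rows, std::vector<double> & u)
{
  for (const Constraint & row : rows)
    {
      double value = 0.0;
      for (const auto & term : row.terms)
        value += term.second * u[term.first];
      u[row.constrained] = value;
    }
}
```

If the row for a hanging DoF is missing, that entry keeps whatever value the solver assigned, and the solution is no longer continuous across the coarse/fine interface.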

jwpeterson commented 10 years ago

On Thu, Feb 13, 2014 at 11:52 AM, roystgnr notifications@github.com wrote:

This is probably worth delaying the release for; it's a real corner case, but it could affect any app with ParallelMesh, mixed finite elements, hanging nodes, and enough bad luck.

Yikes. Sounds like a good thing to delay a release for. How involved is the fix?

John

benkirk commented 10 years ago

I'm totally fine holding off for this, I'll cherry-pick Derek's recent PR later tomorrow or this weekend - I'm about to get off an airplane and head up to Steamboat...

On Feb 13, 2014, at 11:54 AM, "jwpeterson" notifications@github.com wrote:

On Thu, Feb 13, 2014 at 11:52 AM, roystgnr notifications@github.com wrote:

This is probably worth delaying the release for; it's a real corner case, but it could affect any app with ParallelMesh, mixed finite elements, hanging nodes, and enough bad luck.

Yikes. Sounds like a good thing to delay a release for. How involved is the fix?

John


roystgnr commented 10 years ago

I'll do the last cherry-picking; hopefully then we can run it through regression tests tomorrow morning and do the release tomorrow afternoon. Enjoy your trip.

benkirk commented 10 years ago

Thanks, I especially want to ensure that both with and without default communicator options work as expected.

On Feb 13, 2014, at 1:02 PM, "roystgnr" notifications@github.com wrote:

I'll do the last cherry-picking; hopefully then we can run it through regression tests tomorrow morning and do the release tomorrow afternoon. Enjoy your trip.


roystgnr commented 10 years ago

We're passing all the GRIN-S and internal tests, and John tells me we're passing all the MOOSE tests. Ready to release when you are.

benkirk commented 10 years ago

Beautiful, let's update the NEWS to describe the bugfix you found and Derek's new global functions, then we're good!

On Feb 14, 2014, at 3:40 PM, "roystgnr" notifications@github.com wrote:

We're passing all the GRIN-S and internal tests, and John tells me we're passing all the MOOSE tests. Ready to release when you are.
