Synchronous mesh operations across meshes

lindsayad commented 4 years ago

Copying most of this from idaholab/moose#15660:

Reason

In MOOSE, we do a whole lot of gymnastics to keep reference-displaced mesh and equation-system objects in sync, and the whole process is quite brittle, as @fdkong is discovering. For example, take this fairly straightforward case: we are using a DistributedMesh and then we perform Exodus output. Guess what happens? We serialize the reference mesh, while the displaced mesh stays distributed. Now one can argue that after serialization and Exodus output the reference mesh should be re-distributed, and that's a valid argument, but this just goes to show how easy it is for the objects to get out of sync.

Design

With @friedmud, @fdkong, and @roystgnr we've brainstormed a few ideas. Below are some. We can substitute ArbitraryNumber for Two (and add all the corresponding abstraction) if we think of a good reason.

TwoMeshMesh. This has the drawback that we often perform refinement with single Elem objects, so we'd have to hook back to the Mesh somehow in order to keep the other Elem in the pair synced.
TwoElemElem
Node inherits from MultiPoint which composites two Point objects. Anytime an operation is done on a Node you are essentially doing an operation on both the reference and displaced node.
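
To make the third option concrete, here is a minimal sketch of the compositing idea. Everything in it is hypothetical: Point3, MultiPoint, and SyncedNode are illustrative stand-ins rather than libMesh's Point or Node, and the split between "operations that hit both copies" and "operations that only move the displaced copy" is an assumption about how the design might be fleshed out.

```cpp
#include <cassert>

// Hypothetical stand-in for a spatial point; not libMesh's Point.
struct Point3
{
  Point3(double x_ = 0.0, double y_ = 0.0, double z_ = 0.0) : x(x_), y(y_), z(z_) {}
  double x, y, z;
};

// Composites the reference and displaced positions of one logical node.
class MultiPoint
{
public:
  explicit MultiPoint(const Point3 & p) : _reference(p), _displaced(p) {}

  // A mesh-level move (assumed example: smoothing) is applied to both copies.
  void translate(const Point3 & delta)
  {
    _reference.x += delta.x; _reference.y += delta.y; _reference.z += delta.z;
    _displaced.x += delta.x; _displaced.y += delta.y; _displaced.z += delta.z;
  }

  // A solution-driven displacement only changes the displaced copy.
  void set_displacement(const Point3 & u)
  {
    _displaced = Point3(_reference.x + u.x, _reference.y + u.y, _reference.z + u.z);
  }

  const Point3 & reference() const { return _reference; }
  const Point3 & displaced() const { return _displaced; }

private:
  Point3 _reference;
  Point3 _displaced;
};

// The node owns bookkeeping (id, processor id) exactly once, so the
// reference and displaced views cannot disagree about it.
class SyncedNode : public MultiPoint
{
public:
  SyncedNode(const Point3 & p, unsigned int id) : MultiPoint(p), _id(id), _pid(0) {}

  unsigned int id() const { return _id; }
  unsigned int processor_id() const { return _pid; }
  void set_processor_id(unsigned int pid) { _pid = pid; }

private:
  unsigned int _id;
  unsigned int _pid;
};
```

The point of the sketch is that there is only one processor id and one node id per logical node, so partitioning or renumbering cannot leave the two states out of sync.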

Impact

Eliminate the need for users to manually keep reference-displaced Mesh and EquationSystem objects in sync, which can be quite error-prone. If we were able to implement the proposed design, a user could perform adaptivity, or do any other operation, once with a single object and not worry about any manual synchronization.

jwpeterson commented 4 years ago

Is the idea that TwoMeshMesh inherits from Distributed/ReplicatedMesh and performs the same actions (with some exceptions) on both underlying meshes? For instance, if you write out the TwoMeshMesh, you'd only want to write out the "reference" mesh but not the displaced one (or vice-versa).

Seems also related to the Observer design pattern maybe but that is primarily for the case where you don't want/need to know all the other objects that are "subscribed" to your changes.

often perform refinement with single Elem objects

Hmm... refinement with no MeshRefinement object involved? That would definitely be tricky to detect, since Elems don't know about the Mesh they belong to...

lindsayad commented 4 years ago

Is the idea that TwoMeshMesh inherits from Distributed/ReplicatedMesh and performs the same actions (with some exceptions) on both underlying meshes? For instance, if you write out the TwoMeshMesh, you'd only want to write out the "reference" mesh but not the displaced one (or vice-versa).

I think the idea for TwoMeshMesh (which we decided was probably the least likely option to pursue) was to inherit from MeshBase and then hold pointers to two MeshBases. Then a call to TwoMeshMesh::method_x would be routed to mesh_base1.method_x() and mesh_base2.method_x().
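
Roughly, the forwarding would look like the sketch below. This is not libMesh code: MeshLike is a hypothetical, drastically reduced stand-in for MeshBase, ToyMesh is a toy implementation used only to make the example self-contained, and the method names are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical minimal interface standing in for MeshBase.
class MeshLike
{
public:
  virtual ~MeshLike() = default;
  virtual void refine_uniformly(unsigned int levels) = 0;
  virtual void repartition(unsigned int n_parts) = 0;
  virtual std::size_t n_elem() const = 0;
};

// Toy 1D mesh: each uniform refinement level doubles the element count.
class ToyMesh : public MeshLike
{
public:
  explicit ToyMesh(std::size_t n) : _n_elem(n), _n_parts(1) {}
  void refine_uniformly(unsigned int levels) override { _n_elem <<= levels; }
  void repartition(unsigned int n_parts) override { _n_parts = n_parts; }
  std::size_t n_elem() const override { return _n_elem; }
  unsigned int n_parts() const { return _n_parts; }

private:
  std::size_t _n_elem;
  unsigned int _n_parts;
};

// Holds two meshes and routes every mutating call to both of them.
class TwoMeshMesh : public MeshLike
{
public:
  TwoMeshMesh(std::unique_ptr<MeshLike> reference, std::unique_ptr<MeshLike> displaced)
    : _reference(std::move(reference)), _displaced(std::move(displaced))
  {
  }

  void refine_uniformly(unsigned int levels) override
  {
    _reference->refine_uniformly(levels);
    _displaced->refine_uniformly(levels);
  }

  void repartition(unsigned int n_parts) override
  {
    _reference->repartition(n_parts);
    _displaced->repartition(n_parts);
  }

  // Queries only need one answer; the assert documents the sync invariant.
  std::size_t n_elem() const override
  {
    assert(_reference->n_elem() == _displaced->n_elem());
    return _reference->n_elem();
  }

  const MeshLike & reference() const { return *_reference; }
  const MeshLike & displaced() const { return *_displaced; }

private:
  std::unique_ptr<MeshLike> _reference;
  std::unique_ptr<MeshLike> _displaced;
};
```

Operations that should only touch one mesh (writing output, for instance) would be the exceptions and would go through reference() or displaced() directly. Note that the sketch only works because every operation enters through the mesh, which is exactly what breaks down when refinement is driven from individual Elem objects.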

Hmm... refinement with no MeshRefinement object involved? That would definitely be tricky to detect, since Elems don't know about the Mesh they belong to...

This is kind of why TwoMeshMesh is our least favorite option.

roystgnr commented 4 years ago

Something from @friedmud that should go here:

sub-app meshes staying in sync with master app meshes

Remember when we were trying to find an example of why you would want more than two coordinate systems? I have it now: multiapps.

So the "TwoMeshMesh" idea would be "MultiMeshMesh", "TwoElemElem" would be "MultiElemElem", etc.

roystgnr commented 4 years ago

We're not going to go with a "one mesh, but you call reposition(displacement_vector) to change its state" idea, because of the ease of leaving something in a bad state in one subroutine and silently corrupting data in another.

@friedmud and I both hit on the idea of constructing displaced Elem objects on the fly as required, similarly to how we deal with Side objects now. The main problem with that is that MOOSE uses a lot of full Mesh methods (finding bounding boxes of the mesh or of subdomains, constructing a MeshFunction, doing ray-tracing on a displaced shape, etc). Operations that iterate over element/node ranges or that query elements/nodes by id would work fine, but operations that rely on neighbor_ptr() to get from Elem to Elem would be out of luck.

We could have a ring of next_shadowing_object() (or displaced, linked, tandem) links on each Elem and Node (and the Mesh itself) to make it easy to sync operations on one object to operations on each linked object.
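
As a rough illustration of the ring idea, the sketch below uses a circular singly linked list. RingLinked is a hypothetical base class (nothing like it exists on Elem, Node, or MeshBase today), and the processor id is just an assumed example of a field that has to stay in sync.

```cpp
#include <cassert>
#include <utility>

// Hypothetical mix-in: each object points at the next object that shadows it,
// and the last one points back at the first, forming a ring.
class RingLinked
{
public:
  RingLinked() : processor_id(0), _next(this) {}
  RingLinked(const RingLinked &) = delete;
  RingLinked & operator=(const RingLinked &) = delete;

  // Remove this object from its ring so no dangling link is left behind.
  ~RingLinked()
  {
    RingLinked * prev = this;
    while (prev->_next != this)
      prev = prev->_next;
    prev->_next = _next;
  }

  // Merge the ring containing other into this ring.
  // Precondition: the two objects are currently in different rings.
  void link(RingLinked & other) { std::swap(_next, other._next); }

  RingLinked * next_shadowing_object() const { return _next; }

  // Apply f to this object and then to every other object in the ring.
  template <typename F>
  void for_each_in_ring(F && f)
  {
    RingLinked * obj = this;
    do
    {
      f(*obj);
      obj = obj->_next;
    } while (obj != this);
  }

  unsigned int processor_id;

private:
  RingLinked * _next;
};

// Example of a synced operation: setting the processor id on any one object
// propagates to all of its linked counterparts.
inline void set_processor_id_synced(RingLinked & obj, unsigned int pid)
{
  obj.for_each_in_ring([pid](RingLinked & o) { o.processor_id = pid; });
}
```

A ring has the nice property that it does not matter which object you start from, and it generalizes from two meshes to any number of them at the cost of one pointer per object.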

roystgnr commented 4 years ago

What are our "mesh operations" that need to be kept perfectly synced:

Partitioning
AMR changes - which would ideally be able to go both ways, for use with subapps that want to do AMR but then need their refinement reflected in the master app.
Ghosting

Nobody can think of anything else ATM; if we do we should post it here.

lindsayad commented 4 years ago

I actually have some time to work on this, probably starting tomorrow. I have some mortar displaced-mesh tests that currently fail in distributed mesh mode, and I'm fairly confident it's because our ProxyRelationshipManager system in MOOSE is hitting a snag. I should be able to give a final diagnosis tomorrow. But this would be a situation where, if this ticket were solved, we wouldn't need a ProxyRelationshipManager system. @roystgnr were you serious about wanting to do a point release before anyone even started working on this?

roystgnr commented 4 years ago

Yeah, that was no joke. This is going to cost memory (unless we make it disableable at configure time, which might be a good long-term goal but adds too much complication to want to do in the first pass), it's going to cost at least a little CPU (with the same caveats), and it's going to involve digging into a bunch of subsystems and potentially breaking one for some corner case or another.

I think we'll want to merge #2693 (and I assume #2704) before we branch, but hopefully that's all? I wish my next swath of Nemesis_IO changes was ready for a PR first too, but that won't be this week; maybe early next at best. Pinging @jwpeterson to see if he has anything else to go in first, but I doubt he'll disagree that it's a good time for 1.6.0, even aside from "push a release out before risky changes to master", it's been nearly a year since 1.5.0.

My gnawing concern with the next_shadowing_object() design is that it feels like overkill for those three categories of mesh operations - we do all of those in a few fairly self-contained places, all of which have a MeshBase at hand, so if those really were the only things worth syncing then I'd say let's just keep the connections between synced meshes at the MeshBase level, not inserted into every DofObject. But though it's not explicitly on the list above, I'm guessing that @friedmud might have aspirations towards also using the next_shadowing_object() stuff extensively in application code, either because it might be faster than an extra level of indirection or because he wants to use those links in code where you have access to an Elem/Node but not to the MeshBase?

lindsayad commented 4 years ago

Without shadowing at the Elem level, how do you simplify the synchronization of ghosting for example? It seems like you would be relying on having ghosting functors that act identically on the linked meshes, or have some proxy that tries to reproduce identical ghosting based on equivalent element ids or unique ids (this is what we’re trying to do now) from mesh to mesh. I guess with perfectly synchronous mesh level operations, the latter technique should be safe....


fdkong commented 4 years ago

Without shadowing at the Elem level, how do you simplify the synchronization of ghosting for example?

We need to sync at the node level as well. For example, I would like node PIDs to be consistent across meshes.

lindsayad commented 4 years ago

By extension, we also want to be adding the ability to synchronize DofMap operations


jwpeterson commented 4 years ago

Pinging @jwpeterson to see if he has anything else to go in first, but I doubt he'll disagree that it's a good time for 1.6.0, even aside from "push a release out before risky changes to master", it's been nearly a year since 1.5.0.

Yeah, I agree it would be good to make a 1.6.0 release soon. I don't have anything major that I want to merge before a new release, and I agree that it would be good to merge at least #2693 and #2697 before starting a release branch.

roystgnr commented 4 years ago

Without shadowing at the Elem level, how do you simplify the synchronization of ghosting for example?

Every function doing ghosting has (or at least could be easily given) a MeshBase to work with. And once you have a mesh, wherever you'd want elem->shadowed(), you could use mesh->shadowed()->elem_ptr(elem->id()). It'd be slightly slower (I really need to improve the DistributedMesh lookup one of these days...) but not bloating struct sizes would make everything else slightly faster so I'm not sure a code which worked that way would be slower overall.
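
A sketch of what that mesh-level lookup could look like follows. The types are hypothetical toys (ToyElem and LinkedMesh are not libMesh classes), the hash-map storage is only there to keep the example self-contained, and shadowed() is the placeholder name from this thread rather than an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical element: note that it carries no link to its counterpart.
struct ToyElem
{
  explicit ToyElem(std::size_t id_) : id(id_), processor_id(0) {}
  std::size_t id;
  unsigned int processor_id;
};

// Hypothetical mesh that knows which other mesh shadows it.
class LinkedMesh
{
public:
  LinkedMesh() : _shadowed(nullptr) {}

  ToyElem & add_elem(std::size_t id) { return _elems.emplace(id, ToyElem(id)).first->second; }

  // Returns nullptr when the id is not present on this mesh.
  ToyElem * elem_ptr(std::size_t id)
  {
    auto it = _elems.find(id);
    return it == _elems.end() ? nullptr : &it->second;
  }

  void set_shadowed(LinkedMesh & other) { _shadowed = &other; }
  LinkedMesh * shadowed() const { return _shadowed; }

private:
  std::unordered_map<std::size_t, ToyElem> _elems;
  LinkedMesh * _shadowed;
};

// Stand-in for elem->shadowed(): go through the mesh and look up by id, which
// relies on element ids being identical across the linked meshes.
inline ToyElem * shadowed_elem(const LinkedMesh & mesh, const ToyElem & elem)
{
  LinkedMesh * other = mesh.shadowed();
  return other ? other->elem_ptr(elem.id) : nullptr;
}
```

The trade-off is one id lookup per access in exchange for no extra pointer on every element and node, and the scheme only holds up as long as numbering is kept identical across the linked meshes.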

roystgnr commented 4 years ago

By extension, we also want to be adding the ability to synchronize DofMap operations

Which ones, and how? Does every variable on every system on the primary mesh have to have a corresponding variable on a corresponding system on every secondary mesh? Isn't that counter to the multiapp use case?

lindsayad commented 4 years ago

You wouldn’t want it in the multiapp case, but you may for the ref/displaced. Eg an algebraic ghosting functor for the displaced mesh may indicate you need your vectors ghosted while you may not even have an algebraic ghosting functor for the reference (I’m thinking about mortar mechanical contact). I’m going to have to double check our assembly process for that to be sure though.


fdkong commented 4 years ago

You wouldn’t want it in the multiapp case, but you may for the ref/displaced. Eg an algebraic ghosting functor for the displaced mesh may indicate you need your vectors ghosted while you may not even have an algebraic ghosting functor for the reference (I’m thinking about mortar mechanical contact). I’m going to have to double check our assembly process for that to be sure though.

Let me put it this way. In general, @roystgnr is right that we do not have a one-to-one variable map between one mesh and another mesh. But if I had a single variable A on mesh A, and a single variable B on mesh B, then I would expect the numeric vectors allocated on mesh A and mesh B to be identical. Here we assume that variables A and B are finite element variables of the same order.

roystgnr commented 4 years ago

If by "identical" you meant in the same sense that a parallel and a ghosted version of the same vector could be "identical", then that would be guaranteed just by keeping the node and elem numbering identical.

The trick is going to be keeping the level of algebraic ghosting identical. Even a MeshBase::shadowed() connection wouldn't be enough to let us do that! A single mesh can have multiple Systems attached, each of which has a different set of variables and algebraic ghosting functors. We'll need a System::shadowed() too if we want to be able to let users specify that; there's just too many ways that trying to automatically infer it could go wrong.

lindsayad commented 4 years ago

The trick is going to be keeping the level of algebraic ghosting identical. Even a MeshBase::shadowed() connection wouldn't be enough to let us do that! A single mesh can have multiple Systems attached, each of which has a different set of variables and algebraic ghosting functors. We'll need a System::shadowed() too if we want to be able to let users specify that; there's just too many ways that trying to automatically infer it could go wrong.

A DofMap::shadowed() wouldn't be sufficient? It seems like that's where we handle the algebraic effects of mesh refinement and where we handle the algebraic ghosting.

roystgnr commented 4 years ago

Oh, yeah DofMap::shadowed() would certainly be enough. My point is just that MeshBase::shadowed() (boy, we ought to think of a better API name before this inadequate placeholder wins out by default...) alone wouldn't be enough.

jwpeterson commented 4 years ago

(boy, we ought to think of a better API name before this inadequate placeholder wins out by default...)

next_shadowing_object() (or displaced, linked, tandem)

My favorite of these would be "linked" FWIW.

lindsayad commented 4 years ago

My favorite of these would be "linked" FWIW.

I also like that best, although we do already have a lot of links terminology in elem.h

fdkong commented 2 years ago

AMR changes - which would ideally be able to go both ways, for use with subapps that want to do AMR but then need their refinement reflected in the master app.

This is what @dewenyushu is looking for.
