Geometry- or coordinate-based partitioning, denoted by C under Desired features, requires the coordinates of the mesh nodes or cells (centroids). Currently, we use cell centroids. This requires reading in the node coordinates before partitioning. However, node reordering assigns elements and nodes to PEs that differ from the ones used when the nodes were originally read in.
Options/questions:
Which one is more efficient at large scales? Can this be benchmarked somehow before a decision is made? Should both be possible? Which one would lead to simpler (and thus more maintainable) code?
Uniform and non-uniform initial refinement could also be useful applied together, i.e., one or more levels of uniform refinement followed by a non-uniform refinement step based on errors in the initial conditions specified.
PE-boundary node matching for single-level refinement: For matching newly added nodes on PE boundaries, most likely both the coordinate- and the node-based matching approaches (see Parallel implementation) can work. Node-based matching associates a node ID with an edge for a single refinement step (3x64 bit), while coordinate-based matching associates a node ID with three coordinates (4x64 bit) for a single refinement step, i.e., halving a single edge on the boundary.
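To make the size comparison concrete, here is a minimal sketch of the two possible key types (hypothetical typedefs, not existing Inciter code):

```cpp
#include <array>
#include <cstdint>
#include <map>

// Node-based matching: a PE-boundary edge, given by the global IDs of its two
// end-points (stored sorted so all PEs build the same key), maps to the ID of
// the single node added at its mid-point: 2+1 = 3 x 64 bits per added node.
using Edge = std::array< std::uint64_t, 2 >;
using EdgeNodeMap = std::map< Edge, std::uint64_t >;

// Coordinate-based matching: the three coordinates of the added node map to
// its ID: 3+1 = 4 x 64 bits per added node, and the key comparison must be
// tolerance-aware instead of exact integer comparison.
using Coord = std::array< double, 3 >;
using CoordNodeMap = std::map< Coord, std::uint64_t >; // needs a fuzzy comparator in practice
```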
PE-boundary node matching for multiple-level refinement: Newly added node IDs in the case of multiple levels of refinement on a single PE-boundary edge (let's call these "edge-nodes" for short) could be represented by the edge (given by its original end-point node IDs) and the associated multiple newly added node IDs. This appears to be enough to match multiple newly added nodes on a single edge. However, in the case of multiple levels of refinement, there are also newly added nodes on PE-boundary faces (not only on edges; let's call these "face-nodes"). What is the best way (e.g., using what data structure) to communicate these newly added nodes on PE-boundary faces?
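One possible, purely illustrative, set of data structures for the multiple-level case, which only makes the open question above concrete (hypothetical names, not existing Inciter code):

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <vector>

// Edge-nodes: an original PE-boundary edge (two original end-point IDs, sorted)
// maps to all node IDs added on it, ordered by a convention both PEs follow,
// e.g., by parametric position along the edge.
using Edge = std::array< std::uint64_t, 2 >;
using EdgeNodesMap = std::map< Edge, std::vector< std::uint64_t > >;

// Face-nodes: one option is to key the nodes added in the interior of an
// original PE-boundary face by the sorted triple of the face's original corner
// IDs, again with an agreed-upon ordering of the added nodes (e.g., by
// refinement level, then by parent-edge IDs). Whether this is sufficient is
// exactly the open question above.
using Face = std::array< std::uint64_t, 3 >;
using FaceNodesMap = std::map< Face, std::vector< std::uint64_t > >;
```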
Options/questions:
Obviously, these questions regarding the boundary communication also apply to refinement (and coarsening) during time stepping.
Note that while continuous Galerkin (DiagCG, MatCG) discretization needs only the PE/chare-boundary information at boundary nodes, discontinuous Galerkin (or any finite volume) discretization needs the node, face, and volume element information adjacent to chare/PE-boundaries as well.
Is it worth considering reading the mesh graph and the node coordinates separately for Exodus files? How much difference does this make at large scales? Is it significant? How does this affect our strategy for a setup with optional initial mesh refinement?
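For reference, a minimal sketch of what reading the graph and the coordinates separately looks like with the Exodus II C API; a single tetrahedron block with block id 1 is assumed and error checking is omitted:

```cpp
#include <vector>
#include <exodusII.h>

void read_graph_then_coords_later( const char* file ) {
  int cpu_ws = sizeof(double), io_ws = 0;
  float version;
  int exoid = ex_open( file, EX_READ, &cpu_ws, &io_ws, &version );

  auto nelem = ex_inquire_int( exoid, EX_INQ_ELEM );
  auto npoin = ex_inquire_int( exoid, EX_INQ_NODES );

  // read only the element connectivity (4 node IDs per tetrahedron)
  std::vector< int > conn( static_cast< std::size_t >( nelem ) * 4 );
  ex_get_conn( exoid, EX_ELEM_BLOCK, 1, conn.data(), nullptr, nullptr );

  // ... partition, refine, reorder using only the graph ...

  // read the node coordinates only when they are actually needed
  std::vector< double > x( npoin ), y( npoin ), z( npoin );
  ex_get_coord( exoid, x.data(), y.data(), z.data() );

  ex_close( exoid );
}
```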
Regarding when node coordinates are needed, denote:
B: needs node coordinates for partitioning
R: needs node coordinates for refinement
A: needs node coordinates only after reordering
(In any case, we definitely need node coordinates after reordering.)
Using the notation introduced under Desired features we have:
CSU: B
CSN: B, R
CMU: B
CMN: B, R
GSU: A
GSN: R
GMU: A
GMN: R
Note that the points where node coordinates are needed for B and R are very close to each other in the code, in Partitioner's constructor, and these are the coordinates for the same list of global node IDs on a given PE (versus a different list after reordering). Therefore, these two can be considered practically the same, so B and R could also be considered the same.
If we consider B and R the same, out of the 8 possible combinations listed above, only 2 are different from the viewpoint of when they require node coordinates: GSU and GMU. Independent of the decision whether the coordinates for the other 6 cases will be communicated or read again after reordering, GSU and GMU may lead to a faster setup compared to the other 6, especially for large meshes and on a large number of PEs. Note that we do not yet have graph-based mesh partitioners hooked up from Zoltan2.
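To summarize the mapping above in code, a minimal sketch (hypothetical names, not existing Inciter code):

```cpp
// C/G: coordinate- or graph-based partitioning; U/N: uniform or non-uniform
// refinement. The single/multiple-level choice (S/M) does not influence when
// coordinates are first needed.
enum class Partitioning { Coord /*C*/, Graph /*G*/ };
enum class Refinement   { Uniform /*U*/, NonUniform /*N*/ };

// True: node coordinates are needed already in Partitioner's constructor,
// i.e., cases B (partitioning) and/or R (refinement). False: case A, they are
// needed only after reordering (GSU and GMU).
bool coords_needed_before_reordering( Partitioning p, Refinement r ) {
  return p == Partitioning::Coord || r == Refinement::NonUniform;
}
```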
Another question to answer here is how Omega_h's osh format (see #211) will play into this picture. For example, is it possible to read only the graph of an osh mesh, without the node coordinates, and read the coordinates only later, as is currently possible with Exodus meshes? If reading the graph and the node coordinates separately is not possible from osh, how much effort would it take to make it possible?
Below we denote the number of elements by nelem and the number of nodes in a mesh by npoin. Then we have

nelem = 5.5 * npoin

(see Ch. 10 of Lohner, Applied CFD Techniques). With the above, reading a mesh of nelem elements and npoin nodes costs approximately:

element graph (connectivity): nelem * 4 * 8 bytes = 5.5 * 4 * 8 * npoin = 176 * npoin bytes (4 node IDs per tetrahedron, 8-byte IDs)
node coordinates: npoin * 3 * 8 bytes = 24 * npoin bytes (3 coordinates per node, 8-byte reals)

The above means that reading the mesh element graph (connectivity) costs approximately 7.3x more than reading the node coordinates. Exodus files, however, store the element IDs as 4-byte and not 8-byte integers. This changes the above picture to:

element graph (connectivity): nelem * 4 * 4 bytes = 5.5 * 4 * 4 * npoin = 88 * npoin bytes

This means that reading the connectivity still costs about 3.7x more than reading the node coordinates. Is it even worth the code complexity of delaying reading the node coordinates until after the reordering step when only 2 out of 8 combinations can benefit and the benefit is only approximately 25% of the total cost of reading the mesh (graph + coordinates)?
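For a quick sanity check of these numbers, a small sketch; the 1e9-node mesh is just an illustrative value:

```cpp
#include <cstdio>

int main() {
  const double npoin = 1.0e9;             // number of nodes (example)
  const double nelem = 5.5 * npoin;       // Lohner's tet-mesh estimate
  const double graph64 = nelem * 4 * 8;   // 4 node IDs/tet, 8-byte IDs
  const double graph32 = nelem * 4 * 4;   // 4 node IDs/tet, 4-byte IDs (Exodus)
  const double coords  = npoin * 3 * 8;   // 3 coordinates/node, 8-byte reals
  std::printf( "graph/coords, 64-bit IDs: %.1f\n", graph64 / coords );   // ~7.3
  std::printf( "graph/coords, 32-bit IDs: %.1f\n", graph32 / coords );   // ~3.7
  std::printf( "coords share of total read, 32-bit IDs: %.2f\n",
               coords / ( graph32 + coords ) );                          // ~0.21
  return 0;
}
```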
Goals, requirements
The goal is to hook up the existing AMR object under Inciter/AMR/ in inciter's Partitioner in a way that optional mesh refinement happens before initial mesh partitioning, so that reordering (and everything else) follows on the optionally refined and optimally distributed mesh. The current initial mesh refinement is very limited and has a number of drawbacks.
All these limitations should and will be eliminated by hooking up the AMR object, which enables multiple levels of refinement and non-uniform, e.g., initial-conditions-based, refinement before mesh partitioning. This will be a lot more useful and will also produce better-balanced partitions.
Desired features
There are several desired features due to the various combinations of how the initial mesh refinement could be used: graph- or coordinate-based partitioning (G or C), a single or multiple levels of initial refinement (S or M), and uniform or non-uniform refinement (U or N), of which all combinations are useful and so should work. Obviously, all of this should work in parallel. Depending on the combination, potentially different code paths should be implemented that are optimal for the given path.
Example use cases
GxU (where x is either S or M) does not require reading the mesh node coordinates of the original mesh from file for any of the steps (refinement, partitioning, reordering, etc.) in most of Partitioner, only at the end when workers (e.g., DiagCG, DG) are created, which take the coordinates of the nodes of the mesh chunk they will work on. This means that reading of the (original) node coordinates should only be done right before passing them to the workers and according to the reordered mesh nodes. This will not require communicating node coordinates due to reordering.

GxN requires node coordinates for (non-uniform) refinement, because we need to compute errors based on the initial conditions, which require gradients and thus coordinates, but not for partitioning.

CxU (where x is either S or M) requires node coordinates for (coordinate-based) partitioning but not for (uniform) refinement.

CxN needs node coordinates for both partitioning and refinement.
Parallel implementation
The distributed-memory parallel implementation will require some way of ensuring that the nodes newly added by the refinement step are unique along boundaries shared among potentially multiple PEs.
Note that uniqueness of node IDs is not necessarily required for geometric (coordinate-based) partitioning, as we pass cell centroids to Zoltan, which thus does not even require node IDs. However, globally unique node IDs are required for pretty much everything else we do in Partitioner and later. Also, graph-based partitioning, which takes node IDs, requires uniqueness.
To ensure globally unique node IDs, a communication step will be required after refinement is done by the AMR object on the single mesh chunk held by a PE. To make sure the same (newly added) physical node gets the same globally unique ID on all PEs, some kind of "matching" needs to be implemented that uniquely identifies the physical node. One way of doing this matching is checking node coordinates (up to some floating-point precision): coordinate-based matching. Another way is to do the matching based on node IDs, by associating newly added nodes with existing edges (each given by its two end-point node IDs): node-based matching. Node-based matching appears preferable to the coordinate-based one because it potentially involves less data: an edge, given by 2 nodes, associated with a single node ID (3x64 bit) vs. three node coordinates associated with a single node ID (4x64 bit). Single-level refinement could definitely be done with both node- and coordinate-based matching; however, it is less clear how node-based matching would work for multiple-level refinement.
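As a purely illustrative sketch (hypothetical names, not existing Partitioner code) of how node-based matching could yield globally unique IDs for a single refinement step: each PE keys its new boundary nodes by the parent edge, the PEs sharing an edge exchange these keys, and a deterministic ownership rule, e.g., the lowest PE sharing the edge assigns the ID from an offset obtained by a prefix sum over the PEs' owned new-node counts, guarantees that all PEs end up with the same ID for the same physical node:

```cpp
#include <array>
#include <cstdint>
#include <map>

using Edge = std::array< std::uint64_t, 2 >;   // sorted original end-point IDs

// owner: for each split PE-boundary edge, the lowest PE rank sharing it (from
// the existing node-communication maps). offset: this PE's start of the
// globally unique ID range, e.g., obtained from a prefix sum (reduction) over
// the number of new nodes owned per PE.
std::map< Edge, std::uint64_t >
assign_new_ids( const std::map< Edge, int >& owner, int mype,
                std::uint64_t offset )
{
  std::map< Edge, std::uint64_t > newid;
  std::uint64_t next = offset;
  for (const auto& [edge, pe] : owner)
    if (pe == mype) newid[ edge ] = next++;   // owner assigns the unique ID
  // non-owned edges: the ID is received from the owning PE in a subsequent
  // communication step and inserted into the same map
  return newid;
}
```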