Geometry- or coordinate-based partitioning, denoted by C under Desired features, requires the coordinates of the mesh nodes or cells (centroids). Currently, we use cell centroids. This requires reading in the node coordinates before partitioning. However, node reordering assigns elements and nodes to PEs that differ from the ones used when the nodes were originally read in.
Options/questions:
Which one is more efficient at large scales? Can this be benchmarked somehow before a decision is made? Should both be possible? Which one would lead to simpler (and thus more maintainable) code?
Uniform and non-uniform initial refinement could also be useful applied together, i.e., one or more levels of uniform refinement followed by a non-uniform refinement step based on errors in the initial conditions specified.
PE-boundary node matching for single-level refinement: For matching newly added nodes on PE boundaries, most likely both the coordinate- and the node-based matching approaches (see Parallel implementation) can work. Node-based matching associates a node ID with an edge for a single refinement step (3x64 bit), while coordinate-based matching associates a node ID with three coordinates (4x64 bit) for a single refinement step, i.e., halving a single edge on the boundary.
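To make the size comparison concrete, here is a minimal sketch of the two possible key types (hypothetical typedefs, not existing Inciter code):

```cpp
#include <array>
#include <cstdint>
#include <map>

// Node-based matching: a PE-boundary edge, given by the global IDs of its two
// end-points (stored sorted so all PEs build the same key), maps to the ID of
// the single node added at its mid-point: 2+1 = 3 x 64 bits per added node.
using Edge = std::array< std::uint64_t, 2 >;
using EdgeNodeMap = std::map< Edge, std::uint64_t >;

// Coordinate-based matching: the three coordinates of the added node map to
// its ID: 3+1 = 4 x 64 bits per added node, and the key comparison must be
// tolerance-aware instead of exact integer comparison.
using Coord = std::array< double, 3 >;
using CoordNodeMap = std::map< Coord, std::uint64_t >; // needs a fuzzy comparator in practice
```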
PE-boundary node matching for multiple-level refinement: Newly added node IDs in the case of multiple levels of refinement on a single PE-boundary edge (let's call these "edge-nodes" for short) could be represented by the edge (given by its original end-point node IDs) and the associated multiple newly added node IDs. This appears to be enough to match multiple newly added nodes on a single edge. However, in the case of multiple levels of refinement, there are also newly added nodes on PE-boundary faces (not only on edges; let's call these "face-nodes"). What is the best way (e.g., using what data structure) to communicate these newly added nodes on PE-boundary faces?
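One possible, purely illustrative, set of data structures for the multiple-level case, which only makes the open question above concrete (hypothetical names, not existing Inciter code):

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <vector>

// Edge-nodes: an original PE-boundary edge (two original end-point IDs, sorted)
// maps to all node IDs added on it, ordered by a convention both PEs follow,
// e.g., by parametric position along the edge.
using Edge = std::array< std::uint64_t, 2 >;
using EdgeNodesMap = std::map< Edge, std::vector< std::uint64_t > >;

// Face-nodes: one option is to key the nodes added in the interior of an
// original PE-boundary face by the sorted triple of the face's original corner
// IDs, again with an agreed-upon ordering of the added nodes (e.g., by
// refinement level, then by parent-edge IDs). Whether this is sufficient is
// exactly the open question above.
using Face = std::array< std::uint64_t, 3 >;
using FaceNodesMap = std::map< Face, std::vector< std::uint64_t > >;
```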
Options/questions:
Obviously, these questions regarding the boundary communication also apply to refinement (and coarsening) during time stepping.
Note that while continuous Galerkin (DiagCG, MatCG) discretization needs only the PE/chare-boundary information at boundary nodes, discontinuous Galerkin (or any finite volume) discretization needs the node, face, and volume element information adjacent to chare/PE-boundaries as well.
Is it worth considering reading the mesh graph and the node coordinates separately for Exodus files? How much difference does this make at large scales? Is it significant? How does this affect our strategy for a setup with optional initial mesh refinement?
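For reference, a minimal sketch of what reading the graph and the coordinates separately looks like with the Exodus II C API; a single tetrahedron block with block id 1 is assumed and error checking is omitted:

```cpp
#include <vector>
#include <exodusII.h>

void read_graph_then_coords_later( const char* file ) {
  int cpu_ws = sizeof(double), io_ws = 0;
  float version;
  int exoid = ex_open( file, EX_READ, &cpu_ws, &io_ws, &version );

  auto nelem = ex_inquire_int( exoid, EX_INQ_ELEM );
  auto npoin = ex_inquire_int( exoid, EX_INQ_NODES );

  // read only the element connectivity (4 node IDs per tetrahedron)
  std::vector< int > conn( static_cast< std::size_t >( nelem ) * 4 );
  ex_get_conn( exoid, EX_ELEM_BLOCK, 1, conn.data(), nullptr, nullptr );

  // ... partition, refine, reorder using only the graph ...

  // read the node coordinates only when they are actually needed
  std::vector< double > x( npoin ), y( npoin ), z( npoin );
  ex_get_coord( exoid, x.data(), y.data(), z.data() );

  ex_close( exoid );
}
```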
Regarding when node coordinates are needed, denote:
B: needs node coordinates for partitioning
R: needs node coordinates for refinement
A: needs node coordinates only after reordering
(In any case, we definitely need node coordinates after reordering.)
Using the notation introduced under Desired features we have:
CSU: B
CSN: B, R
CMU: B
CMN: B, R
GSU: A
GSN: R
GMU: A
GMN: R
Note that the points where node coordinates are needed for B and R are very close to each other in the code, in Partitioner's constructor, and these are the coordinates for the same list of global node IDs on a given PE (versus a different list after reordering). Therefore, these two can be considered practically the same, so B and R could also be considered the same.
If we consider B and R the same, out of the 8 possible combinations listed above, only 2 are different from the viewpoint of when they require node coordinates: GSU and GMU. Independent of the decision whether the coordinates for the other 6 cases will be communicated or read again after reordering, GSU and GMU may lead to a faster setup compared to the other 6, especially for large meshes and on a large number of PEs. Note that we do not yet have graph-based mesh partitioners hooked up from Zoltan2.
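To summarize the mapping above in code, a minimal sketch (hypothetical names, not existing Inciter code):

```cpp
// C/G: coordinate- or graph-based partitioning; U/N: uniform or non-uniform
// refinement. The single/multiple-level choice (S/M) does not influence when
// coordinates are first needed.
enum class Partitioning { Coord /*C*/, Graph /*G*/ };
enum class Refinement   { Uniform /*U*/, NonUniform /*N*/ };

// True: node coordinates are needed already in Partitioner's constructor,
// i.e., cases B (partitioning) and/or R (refinement). False: case A, they are
// needed only after reordering (GSU and GMU).
bool coords_needed_before_reordering( Partitioning p, Refinement r ) {
  return p == Partitioning::Coord || r == Refinement::NonUniform;
}
```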
Another question to answer here is how Omega_h's osh format (see #211) will play into this picture. For example, is it possible to read only the graph of an osh mesh, without the node coordinates, and read the coordinates only later, as is currently possible with Exodus meshes? If reading the graph and the node coordinates separately is not possible from osh, how much effort would it take to make it possible?
Below we denote the number of elements by nelem and the number of nodes in a mesh by npoin. Then we have

nelem = 5.5 * npoin

(see Ch. 10 of Lohner, Applied CFD Techniques). With the above, reading a mesh of nelem elements and npoin nodes costs approximately:

element graph (connectivity): nelem * 4 * 8 bytes = 5.5 * 4 * 8 * npoin = 176 * npoin bytes (4 node IDs per tetrahedron, 8-byte IDs)
node coordinates: npoin * 3 * 8 bytes = 24 * npoin bytes (3 coordinates per node, 8-byte reals)

The above means that reading the mesh element graph (connectivity) costs approximately 7.3x more than reading the node coordinates. Exodus files, however, store the element IDs as 4-byte and not 8-byte integers. This changes the above picture to:

element graph (connectivity): nelem * 4 * 4 bytes = 5.5 * 4 * 4 * npoin = 88 * npoin bytes

This means that reading the connectivity still costs about 3.7x more than reading the node coordinates. Is it even worth the code complexity of delaying reading the node coordinates until after the reordering step when only 2 out of 8 combinations can benefit and the benefit is only approximately 25% of the total cost of reading the mesh (graph + coordinates)?
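For a quick sanity check of these numbers, a small sketch; the 1e9-node mesh is just an illustrative value:

```cpp
#include <cstdio>

int main() {
  const double npoin = 1.0e9;             // number of nodes (example)
  const double nelem = 5.5 * npoin;       // Lohner's tet-mesh estimate
  const double graph64 = nelem * 4 * 8;   // 4 node IDs/tet, 8-byte IDs
  const double graph32 = nelem * 4 * 4;   // 4 node IDs/tet, 4-byte IDs (Exodus)
  const double coords  = npoin * 3 * 8;   // 3 coordinates/node, 8-byte reals
  std::printf( "graph/coords, 64-bit IDs: %.1f\n", graph64 / coords );   // ~7.3
  std::printf( "graph/coords, 32-bit IDs: %.1f\n", graph32 / coords );   // ~3.7
  std::printf( "coords share of total read, 32-bit IDs: %.2f\n",
               coords / ( graph32 + coords ) );                          // ~0.21
  return 0;
}
```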
Goals, requirements
The goal is to hook up the existing AMR object under Inciter/AMR/ in inciter's Partitioner in a way that optional mesh refinement happens before initial mesh partitioning, so that reordering (and everything else) follows on the optionally refined and optimally distributed mesh. The current initial mesh refinement is very limited and has a number of drawbacks.
All these limitations should and will be eliminated by hooking up the AMR object, which enables multiple levels of refinement and non-uniform, e.g., initial-conditions-based, refinement before mesh partitioning. This will be a lot more useful and will also produce better-balanced partitions.
Desired features
There are several desired features due to the various combinations of how the initial mesh refinement could be used: graph- or coordinate-based partitioning (G or C), a single or multiple levels of initial refinement (S or M), and uniform or non-uniform refinement (U or N), of which all combinations are useful and so should work. Obviously, all of this should work in parallel. Depending on the combination, potentially different code paths should be implemented that are optimal for the given path.
Example use cases
GxU (where x is either S or M) does not require reading the mesh node coordinates of the original mesh from file for any of the steps (refinement, partitioning, reordering, etc.) in most of Partitioner, only at the end when workers (e.g., DiagCG, DG) are created, which take the coordinates of the nodes of the mesh chunk they will work on. This means that reading of the (original) node coordinates should only be done right before passing them to the workers and according to the reordered mesh nodes. This will not require communicating node coordinates due to reordering.

GxN requires node coordinates for (non-uniform) refinement, because we need to compute errors based on the initial conditions, which require gradients and thus coordinates, but not for partitioning.

CxU (where x is either S or M) requires node coordinates for (coordinate-based) partitioning but not for (uniform) refinement.

CxN needs node coordinates for both partitioning and refinement.
Parallel implementation
The distributed-memory parallel implementation will require some way of ensuring that the nodes newly added by the refinement step are unique along boundaries shared among potentially multiple PEs.
Note that uniqueness of node IDs is not necessarily required for geometric (coordinate-based) partitioning, as we pass cell centroids to Zoltan, which thus does not even require node IDs. However, globally unique node IDs are required for pretty much everything else we do in Partitioner and later. Also, graph-based partitioning, which takes node IDs, requires uniqueness.
To ensure globally unique node IDs, a communication step will be required after refinement is done by the AMR object on the single mesh chunk held by a PE. To make sure the same (newly added) physical node gets the same globally unique ID on all PEs, some kind of "matching" needs to be implemented that uniquely identifies the physical node. One way of doing this matching is checking node coordinates (up to some floating-point precision): coordinate-based matching. Another way is to do the matching based on node IDs, by associating newly added nodes with existing edges (each given by its two end-point node IDs): node-based matching. Node-based matching appears preferable to the coordinate-based one because it potentially involves less data: an edge, given by 2 nodes, associated with a single node ID (3x64 bit) vs. three node coordinates associated with a single node ID (4x64 bit). Single-level refinement could definitely be done with both node- and coordinate-based matching; however, it is less clear how node-based matching would work for multiple-level refinement.
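As a purely illustrative sketch (hypothetical names, not existing Partitioner code) of how node-based matching could yield globally unique IDs for a single refinement step: each PE keys its new boundary nodes by the parent edge, the PEs sharing an edge exchange these keys, and a deterministic ownership rule, e.g., the lowest PE sharing the edge assigns the ID from an offset obtained by a prefix sum over the PEs' owned new-node counts, guarantees that all PEs end up with the same ID for the same physical node:

```cpp
#include <array>
#include <cstdint>
#include <map>

using Edge = std::array< std::uint64_t, 2 >;   // sorted original end-point IDs

// owner: for each split PE-boundary edge, the lowest PE rank sharing it (from
// the existing node-communication maps). offset: this PE's start of the
// globally unique ID range, e.g., obtained from a prefix sum (reduction) over
// the number of new nodes owned per PE.
std::map< Edge, std::uint64_t >
assign_new_ids( const std::map< Edge, int >& owner, int mype,
                std::uint64_t offset )
{
  std::map< Edge, std::uint64_t > newid;
  std::uint64_t next = offset;
  for (const auto& [edge, pe] : owner)
    if (pe == mype) newid[ edge ] = next++;   // owner assigns the unique ID
  // non-owned edges: the ID is received from the owning PE in a subsequent
  // communication step and inserted into the same map
  return newid;
}
```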