quinoacomputing / quinoa

Adaptive computational fluid dynamics
https://quinoacomputing.github.io

Hook up AMR object for initial mesh refinement #220

Closed jbakosi closed 6 years ago

jbakosi commented 6 years ago

Goals, requirements

The goal is to hook up the existing AMR object under Inciter/AMR/ in inciter's Partitioner so that optional mesh refinement happens before the initial mesh partitioning, and thus the reordering (and everything else that follows) operates on the optionally refined and optimally distributed mesh.

The current initial mesh refinement is very limited: it supports only a single level of uniform refinement and is not performed before mesh partitioning.

These limitations should and will be eliminated by hooking up the AMR object, which enables multiple levels of refinement, non-uniform (e.g., initial-conditions-based) refinement, and refinement before mesh partitioning. This will be a lot more useful and will also produce better-balanced partitions.

Desired features

There are several desired features, arising from the various combinations in which the initial mesh refinement could be used:

  1. with a graph-based or with a coordinate-based partitioner (G or C),
  2. with single or with multiple levels of initial refinement (S or M),
  3. uniform and/or non-uniform refinement (U or N),

of which all combinations are useful and therefore should work. Obviously, all of this should work in parallel. Depending on the combination, potentially different code paths should be implemented, each optimal for the given combination.
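For concreteness, a minimal sketch of how this 2x2x2 configuration space could be represented in code; the enum and struct names below are hypothetical and not part of the existing code base:

```cpp
// Hypothetical sketch only: one way to represent the combinations above.
enum class Partitioner { Graph, Coord };          // G or C
enum class Levels      { Single, Multiple };      // S or M
enum class Refinement  { Uniform, NonUniform };   // U or N

struct InitialRefinementConfig {
  Partitioner partitioner;
  Levels      levels;
  Refinement  type;
};

// Example: "GMN" = graph-based partitioner, multiple levels, non-uniform:
// InitialRefinementConfig gmn{ Partitioner::Graph, Levels::Multiple,
//                              Refinement::NonUniform };
```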

Example use cases

Parallel implementation

A distributed-memory parallel implementation will require some way of ensuring that the nodes newly added by the refinement step are unique along boundaries shared among potentially multiple PEs.

Note that uniqueness of node IDs is not necessarily required for geometric (coordinate-based) partitioning, since we pass cell centroids to Zoltan, which therefore does not even need node IDs. However, globally unique node IDs are required for pretty much everything else we do in Partitioner and later, and graph-based partitioning, which takes node IDs, requires uniqueness as well.
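As a side note, a minimal sketch of what passing cell centroids to a geometric partitioner amounts to, assuming a tetrahedron connectivity array with 4 node IDs per element and separate x/y/z coordinate arrays; the function and argument names are illustrative only:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Illustration only: tetrahedron centroids computed as the average of the
// four node coordinates. A coordinate-based (geometric) partitioner needs
// only these centroids, not globally unique node IDs.
std::vector< std::array<double,3> >
centroids( const std::vector<std::size_t>& inpoel,           // 4 node IDs per tet
           const std::array<std::vector<double>,3>& coord )  // x, y, z per node
{
  std::vector< std::array<double,3> > cent( inpoel.size()/4 );  // zero-initialized
  for (std::size_t e=0; e<cent.size(); ++e)
    for (std::size_t n=0; n<4; ++n) {
      auto p = inpoel[ e*4 + n ];
      cent[e][0] += coord[0][p] / 4.0;
      cent[e][1] += coord[1][p] / 4.0;
      cent[e][2] += coord[2][p] / 4.0;
    }
  return cent;
}
```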

To ensure globally unique node IDs, a communication step will be required after refinement is done by the AMR object on the single mesh chunk held by a PE. To make sure the same (newly added) physical node gets the same globally unique ID on all PEs, some kind of "matching" needs to be implemented that uniquely identifies the physical node.

One way of doing this is to compare node coordinates (up to some floating-point precision): coordinate-based matching. Another way is to do the matching based on node IDs, by associating each newly added node with the existing edge it refines (given by its two end-point node IDs): node-based matching. Node-based matching appears preferable to coordinate-based matching because it potentially involves less data: an edge, given by 2 node IDs, associated with a single new node ID (3x64 bit), versus three node coordinates associated with a single new node ID (4x64 bit). Single-level refinement could definitely be done with either node- or coordinate-based matching; however, it is less clear how node-based matching would work for multiple-level refinement.
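A minimal sketch of the node-based matching idea, assuming each newly added node bisects an existing edge; the names and the plain std::map below only illustrate the data involved and are not the actual implementation:

```cpp
#include <cstdint>
#include <map>
#include <utility>

// An edge is identified by its two end-point global node IDs, stored in
// sorted order so that every PE sharing the edge builds the same key.
using Edge = std::pair< std::uint64_t, std::uint64_t >;

inline Edge edge( std::uint64_t a, std::uint64_t b ) {
  return a < b ? Edge{ a, b } : Edge{ b, a };
}

// After local refinement each PE records, for every refined (bisected) edge,
// the global ID it assigned to the new mid-edge node. PEs then exchange the
// entries for edges on shared boundaries and agree on a single ID per edge,
// e.g., by letting the lowest-numbered PE that shares the edge win.
using EdgeNodeMap = std::map< Edge, std::uint64_t >;

// Data per new node: 2 end-point IDs + 1 new ID (3 x 64 bit) for node-based
// matching, versus 3 coordinates + 1 new ID (4 x 64 bit) for coordinate-based
// matching.
```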

jbakosi commented 6 years ago

Design details, questions

Separate reading of mesh graph and node coordinates

Is it worth considering reading the mesh graph and the node coordinates separately for Exodus files? How much difference does this make at large scales? Is it significant? How does this affect our strategy for a setup with optional initial mesh refinement?

Legend

- B: needs node coordinates for partitioning
- R: needs node coordinates for refinement
- A: needs node coordinates only after reordering

(In any case we definitely need node coordinates after reordering.)

All combinations with initial mesh refinement

Using the notations introduced above under Desired features we have:

| Combination | Needs node coordinates |
| --- | --- |
| CSU | B |
| CSN | B, R |
| CMU | B |
| CMN | B, R |
| GSU | A |
| GSN | R |
| GMU | A |
| GMN | R |

Note that the needs for node coordinates in B and R arise very close to each other in the code, in Partitioner's constructor, and they concern the coordinates of the same list of global node IDs on a given PE (versus a different list after reordering). Therefore these two are practically the same, so B and R could be treated as one case.

If we consider B and R the same, then out of the 8 possible combinations listed above only 2 differ from the viewpoint of when they require node coordinates: GSU and GMU. Independent of whether the coordinates for the other 6 cases will be communicated or read again after reordering, GSU and GMU may lead to a faster setup compared to the other 6, especially for large meshes and on a large number of PEs. Note that we do not yet have graph-based mesh partitioners hooked up from Zoltan2.
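To make the decision logic explicit, a hedged sketch encoding the table above; the function is illustrative only and does not exist in the code base:

```cpp
#include <string>

// Illustration only: when are node coordinates first needed, given the
// partitioner type ('G' or 'C') and the refinement type ('U' or 'N')?
// The number of refinement levels (S/M) does not change the answer.
//   "B" - needed already for (coordinate-based) partitioning
//   "R" - needed for (non-uniform) refinement
//   "A" - needed only after reordering
std::string coordNeed( char partitioner, char refinement ) {
  std::string need;
  if (partitioner == 'C') need += 'B';  // geometric partitioning needs coordinates
  if (refinement == 'N') need += 'R';   // non-uniform refinement needs coordinates
  if (need.empty()) need = "A";         // only GSU and GMU can defer the read
  return need;
}

// coordNeed('C','N') == "BR", coordNeed('G','U') == "A", coordNeed('G','N') == "R"
```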

Another question to answer here is how Omega_h's osh format (see #211) will fit into this picture. For example, is it possible to read only the graph of an osh mesh and defer reading the node coordinates until later, as is currently possible with Exodus meshes? If reading the graph and the node coordinates separately is not possible from osh, how much effort would it take to make it possible?

Back-of-an-envelope estimation of the cost of reading the node coordinates

Below we denote the number of elements of a mesh by nelem and the number of nodes by npoin.

With 8-byte integers and 8-byte reals, a rough estimate shows that reading the mesh element graph (connectivity) costs approximately 7.3x more than reading the node coordinates. Exodus files, however, store the element IDs as 4-byte, not 8-byte, integers, which changes the picture somewhat: reading the connectivity then still costs approximately 3.7x more than reading the node coordinates.

Is it even worth the code complexity of delaying the read of the node coordinates until after the reordering step, when only 2 out of the 8 combinations can benefit and the benefit is only approximately 25% of the total cost of reading the mesh (graph + coordinates)?
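A hedged reconstruction of the arithmetic behind the 7.3x and 3.7x figures, assuming a tetrahedron mesh with nelem ≈ 5.5 npoin (a typical element-to-node ratio for tet meshes, and the value consistent with the quoted numbers), 4 node IDs per element, 3 real coordinates per node, and 8-byte reals:

```latex
\begin{align*}
  \text{connectivity, 8-byte IDs:} \quad
    & 4 \times 8 \times \text{nelem} = 32\,\text{nelem}
      \approx 176\,\text{npoin} \ \text{bytes} \\
  \text{coordinates, 8-byte reals:} \quad
    & 3 \times 8 \times \text{npoin} = 24\,\text{npoin} \ \text{bytes} \\
  \text{ratio:} \quad
    & 176 / 24 \approx 7.3 \\[1ex]
  \text{connectivity, 4-byte IDs:} \quad
    & 4 \times 4 \times \text{nelem} = 16\,\text{nelem}
      \approx 88\,\text{npoin} \ \text{bytes} \\
  \text{ratio:} \quad
    & 88 / 24 \approx 3.7
\end{align*}
```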

jbakosi commented 6 years ago

Done, see https://github.com/quinoacomputing/quinoa/pull/262.