pmgbergen / porepy

Python Simulation Tool for Fractured and Deformable Porous Media
GNU General Public License v3.0

Benchmarks for measuring assembly time #1216

Open keileg opened 3 months ago

keileg commented 3 months ago

Define and implement a few benchmark models that can be used to measure (improvements in) model assembly time.

Suggested criteria:

  1. There should only be a few (say, 3-4) benchmarks.
  2. The total setup time should be limited to facilitate frequent testing.
  3. There should be a span in geometric complexity (2d and 3d, up to dozens of fractures).
  4. Cover 2-3 sets of physics (e.g., fluid flow, HM, possibly THM).
  5. Setting up the models should not be a major effort. Reuse of setups from tests or similar should be feasible.

Other considerations to be made:

  1. Should we place the code in a separate repository? My instinct says yes.
  2. Will we base timing on logging, line_profiler, or an external tool like https://github.com/plasma-umass/scalene?
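If we go with the logging option, a minimal sketch of what the timing could look like is given below. The decorator is illustrative only; `assemble` stands in for whatever model method we end up benchmarking.

```python
import logging
import time
from functools import wraps

logger = logging.getLogger("benchmarks")


def log_timing(func):
    """Log the wall-clock time of each call using time.perf_counter."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        logger.info("%s took %.3f s", func.__name__, time.perf_counter() - start)
        return result

    return wrapper


@log_timing
def assemble(model):
    # Placeholder for the actual assembly call of a benchmark model.
    return model.equation_system.assemble()
```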

EDIT: While there are nuances in how best to measure the various aspects of multiphysics, it seems clear that we want a benchmark mainly dedicated to geometric complexity, keeping the physics simple (that is, mass balance only). The specification of this first step is roughly as follows (some critical thinking and interpretation should be applied):

  1. Test cases: It is natural to use setups from the 2d and 3d benchmarks for flow, though it is not a goal to cover all the cases. Some of the setups are already available, see the tutorial flow_benchmarks, and more may be available through fracture_sets.py. Note that there are non-trivial aspects of the geometry and boundary conditions for some of the tutorials; EK can give more information (and point to gmsh files that partly resolve these issues).
  2. For each geometry, there should be some flexibility in terms of mesh resolution, and possibly other parameters, but this should not be overdone (a sketch of one possible parametrization follows this list).
  3. (Partly) decoupled from the specific setups is where to put the code, how to structure it to allow for reuse and to fit with there being other benchmarks (I don't know exactly what this means) etc. The expectation is that this should be kept in mind but not optimized prematurely.
  4. It is also of interest to consider solutions for tracking of performance, including monitoring over time. Again, this is something not to be overengineered at this early stage.
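To make points 1 and 2 concrete: the parametrization could be as simple as a list of case specifications looped over by a runner. All names below are hypothetical.

```python
# Hypothetical case specifications: each pairs a fracture geometry (point 1)
# with a few fixed mesh resolutions (point 2).
CASES = [
    {"geometry": "2d_benchmark_case_3", "cell_sizes": [0.1, 0.05, 0.025]},
    {"geometry": "2d_benchmark_case_4", "cell_sizes": [0.1, 0.05]},
    {"geometry": "3d_benchmark_case_2", "cell_sizes": [0.2, 0.1]},
]


def run_all(run_single_case):
    """Run every combination of geometry and resolution.

    run_single_case is the (yet to be written) function that builds and
    times one benchmark model.
    """
    for case in CASES:
        for cell_size in case["cell_sizes"]:
            run_single_case(case["geometry"], cell_size)
```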
keileg commented 3 months ago

A perhaps better approach to the design is to identify the parts of the framework that should be covered by the benchmarks and use this to define the cases. My suggestion would be:

  1. Operations connected to the mixed-dimensional (md) grid structure: Subdomain and mortar projections, and matrices from discretizations of elemental differential operators (e.g., the divergence and Darcy's law). We may want to cover scaling with the number of subdomains, grid size, and possibly spatial dimension.
  2. Constitutive laws related to fracture deformation, in particular the often deeply nested structure resulting from the more complex laws.
  3. Constitutive laws related to compositional multiphase transport.

These can be implemented as follows:

  1. A single phase flow model, with the following parametrization:
    • Number of subdomains: The third and fourth fracture networks from the 2d benchmark, and the second (structured) case from the 3d benchmark.
    • Grid resolution: A few fixed grid parameters for each grid.
    • Discretization: Possibly vary between Tpfa and Mpfa, but mainly use Tpfa (see the sketch after this list).
  2. A (T)HM model with a rich set of constitutive laws enabled, including shear dilation, diff-tpfa, etc. Variations:
    • Possibly 2d or 3d, with only a few fractures in each.
    • Possibly some variations in grid resolution.
  3. Constitutive laws related to multiphase compositional transport. Rich set of constitutive laws, though it is not clear to me what this entails right now.
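For item 1, a rough sketch of what a single run could look like, assuming the current model framework (the names `SinglePhaseFlow`, `meshing_arguments`, `prepare_simulation`, and `equation_system.assemble` follow my reading of the API and may need adjustment):

```python
import time

from porepy.models.fluid_mass_balance import SinglePhaseFlow


class FlowBenchmark(SinglePhaseFlow):
    """Single-phase flow; the fracture geometry is to be supplied by a mixin."""

    def meshing_arguments(self) -> dict:
        # One of a few fixed resolutions per geometry, cf. the list above.
        return {"cell_size": 0.1}


model = FlowBenchmark({})
model.prepare_simulation()  # Builds the md-grid and discretizes.

start = time.perf_counter()
model.equation_system.assemble()  # The operation to be timed.
print(f"Assembly: {time.perf_counter() - start:.3f} s")
```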
keileg commented 3 months ago

Additional thoughts regarding setup etc. (partly notes to self):

  1. We should put the run scripts in a separate repository.
  2. Performance improvements over time can be tracked by storing key data in local files, with relevant plotting or analysis functionality in the benchmark repo. This of course assumes that the hardware etc. stays fixed, but that is up to the person doing benchmarking.
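A minimal sketch of the local-file tracking in point 2; the file location and entry keys are placeholders:

```python
import json
import time
from pathlib import Path

RESULTS_FILE = Path("results/assembly_times.json")  # Placeholder location.


def record_result(case_name: str, elapsed: float) -> None:
    """Append one timing entry; the history accumulates across runs."""
    history = json.loads(RESULTS_FILE.read_text()) if RESULTS_FILE.exists() else []
    history.append({"case": case_name, "seconds": elapsed, "timestamp": time.time()})
    RESULTS_FILE.parent.mkdir(parents=True, exist_ok=True)
    RESULTS_FILE.write_text(json.dumps(history, indent=2))
```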
IvarStefansson commented 2 months ago

What are the advantages of having a separate repo?

keileg commented 2 months ago

> What are the advantages of having a separate repo?

Cleanliness. But I see we can achieve the same with a somewhat carefully structured application setup.

keileg commented 2 months ago

Additional thoughts after discussion in person:

The next step is to set up a full example of a benchmark, likely along the lines of implementation step 1 above.

Yuriyzabegaev commented 4 weeks ago

@pschultzendorff and I could not make Scalene work as expected. Instead, we found a solution based on cProfile and SnakeViz: https://kirillstrelkov.medium.com/easy-python-profiling-a70cbf699295

The solution provides the following UI for CPU profiling: [screenshot of the SnakeViz interface]
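For reference, the core of that workflow is roughly the following two commands (`run_benchmarks.py` stands in for whatever entry point we end up with):

```
python -m cProfile -o profile.out run_benchmarks.py
snakeviz profile.out
```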

Yuriyzabegaev commented 3 weeks ago

An even better approach that we found: [screenshot of the vizviewer interface]

```
pip install viztracer
viztracer --min_duration 10ms --ignore_c_function --ignore_frozen --max_stack_depth 20 run_benchmarks.py
vizviewer --port 9002 result.json
```
pschultzendorff commented 3 weeks ago

To expand on Yury's approach: In vizviewer, the standard view displays all method calls in hierarchical order as they occur during runtime. To find the methods with the longest overall runtime, select a timeframe by clicking and dragging in the timeline. This reveals two new views below: a table listing methods sortable by summed, maximum, minimum, or average runtime (the table uses a different measure, likely CPU cycles), and a graph showing the hierarchical method calls ordered by total runtime.

pschultzendorff commented 3 weeks ago

TODO

keileg commented 3 weeks ago

Nice work, I'm looking forward to seeing where this is going and to taking it into use in the maintenance work planned for the coming weeks/months.

Looking at the original specification (under EDIT), it seems the first three points are well under way. These are also the points that are most useful for benchmarking during maintenance, so prioritizing them makes sense. When we have reached a satisfactory stage on those, my thinking right now (it may change) is to have a look at the fourth item (systematic benchmarking over time) and see if something simple and useful can be done there as well.

It seems clear, though, that the full issue must be addressed in stages, so let's try to keep in mind that at some point we should put this to rest for a while and get back to it after having gained some experience by using the functionality.

keileg commented 2 weeks ago

An additional fracture network is available here.

pschultzendorff commented 2 weeks ago

An argument against performance tracking with GitHub Actions: We have no guarantee that GitHub consistently provides the same hardware, nor can we find out which resources it uses. In fact, this blog post shows that the CPU time of simple benchmarks can vary by a factor of 3.

Together with Yury's remarks on the difficulty of coding such an action, this makes me think it's best if we focus on a local cron job. That should be rather straightforward; we just have to decide where to save the results and on which machine to run the job.
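For completeness, the cron job could be as simple as the following crontab entry; the schedule and paths are placeholders:

```
# Nightly at 02:00: run the benchmark suite and append the output to a log.
0 2 * * * cd /path/to/benchmark-repo && python run_benchmarks.py >> results/run.log 2>&1
```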