Open keileg opened 3 months ago
A perhaps better design approach is to identify the parts of the framework that should be covered by the benchmarks and use this to define the cases. My suggestion would be:
These can be implemented as follows:
Additional thoughts regarding setup etc. (partly notes to self):
What is the advantage(s) of having a separate repo?
> What is the advantage(s) of having a separate repo?
Cleanliness. But I see we can achieve the same with a somewhat carefully structured application setup.
Additional thoughts after discussion in person:
The next step is to set up a full example of a benchmark, likely along the lines of implementation step 1 above.
@pschultzendorff and I could not make Scalene work as expected. Instead, we found a solution based on cProfile and SnakeViz: https://kirillstrelkov.medium.com/easy-python-profiling-a70cbf699295
The solution provides the following UI for CPU profiling:
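For reference, a minimal sketch of that workflow applied to our benchmarks; the `run_benchmarks` module and its `main()` entry point are assumptions and should be adapted to the actual driver script:

```python
# Minimal sketch of the cProfile + SnakeViz workflow from the linked article.
# `run_benchmarks.main` is an assumed entry point; adapt to the actual driver.
import cProfile
import pstats

import run_benchmarks  # hypothetical benchmark driver

# Write the raw profile to disk.
cProfile.run("run_benchmarks.main()", filename="benchmarks.prof")

# Quick text summary: the 20 most expensive calls by cumulative time.
pstats.Stats("benchmarks.prof").sort_stats("cumulative").print_stats(20)

# Interactive view in the browser:
#   snakeviz benchmarks.prof
```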
An even better approach that we found:
```
pip install viztracer
viztracer --min_duration 10ms --ignore_c_function --ignore_frozen --max_stack_depth 20 run_benchmarks.py
vizviewer --port 9002 result.json
```
To expand on Yury's approach: in vizviewer, the standard view displays all method calls in hierarchical order as they occur during runtime. To find the methods with the longest overall runtime, select a timeframe by clicking and dragging in the timeline. This reveals two new views below: a table listing methods sortable by summed, max, min, or average runtime (this uses a different measure, likely CPU cycles), and a graph showing hierarchical method calls ordered by total runtime.
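If one only wants to trace a specific part of a benchmark (e.g. the assembly stage), VizTracer can also be used inline from Python instead of via the CLI. A minimal sketch, where `assemble_benchmark_model` is a placeholder for the code to be traced:

```python
# Inline use of VizTracer, tracing only the code inside the with-block.
# `assemble_benchmark_model` is a hypothetical stand-in for the part of the
# benchmark we want to inspect.
from viztracer import VizTracer

def assemble_benchmark_model():
    ...  # placeholder for the assembly code to be profiled

with VizTracer(output_file="result.json", max_stack_depth=20):
    assemble_benchmark_model()

# Inspect as before:
#   vizviewer --port 9002 result.json
```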
TODO
Nice work, I'm looking forward to seeing where this is going and to taking it into use in the maintenance work planned for the coming weeks/months.
Looking at the original specification (under EDIT), it seems the first three points are well under way. These are also the points that are most useful for benchmarking during maintenance, thus prioritizing this makes sense. When we have reached a satisfactory stage on those, my thinking right now (may change) is to have a look at the fourth item (systematic benchmarking over time) and see if something simple and useful can be done there as well.
It seems clear, though, that the full issue must be addressed in stages, so let's try to keep in mind that at some point we should put this to rest for a while and get back to it after having gained some experience by using the functionality.
An argument against performance tracking with GitHub Actions: we have no guarantee that GitHub consistently employs the same hardware, nor can we find out which resources it employs. In fact, this blog post shows that the CPU time of simple benchmarks can vary by a factor of 3.
Together with Yury's remarks on the difficulty of coding such an action, this makes me think it is best to focus on a local cron job. This should be rather straightforward; we just have to decide where to save the results and on which machine to run the job.
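To make the local-cron-job idea concrete, here is a rough sketch; the script name, benchmark registry, and output location are all assumptions, not decisions:

```python
"""Sketch of a locally scheduled benchmark run (all names are assumptions).

The script could be triggered by a crontab entry such as
    0 2 * * * /usr/bin/python3 /path/to/nightly_benchmarks.py
and appends one timestamped row per benchmark to a CSV file.
"""
import csv
import datetime
import time

# Hypothetical benchmark registry: name -> callable that sets up/assembles a model.
from run_benchmarks import BENCHMARKS  # assumption, not an existing module

RESULTS_FILE = "benchmark_history.csv"

def main() -> None:
    timestamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(RESULTS_FILE, "a", newline="") as f:
        writer = csv.writer(f)
        for name, benchmark in BENCHMARKS.items():
            t0 = time.perf_counter()
            benchmark()
            elapsed = time.perf_counter() - t0
            # One row per run: when, which benchmark, wall-clock seconds.
            writer.writerow([timestamp, name, f"{elapsed:.3f}"])

if __name__ == "__main__":
    main()
```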
Define and implement a few benchmark models that can be used to measure (improvements in) model assembly time.
Suggested criteria:
Other considerations to be made:
`line_profiler` or an external tool like https://github.com/plasma-umass/scalene?

EDIT: While there are nuances in how best to measure various aspects related to multiphysics, it seems clear we want a benchmark mainly dedicated to geometric complexity, keeping the physics simple (that is, mass balance only). The specification of this first step is roughly as follows (some critical thinking and interpretation should be applied):
`flow_benchmarks`, and more may be available through `fracture_sets.py`. Note that there are non-trivial aspects of the geometry and boundary conditions for some of the tutorials; EK can give more information (and point to gmsh files that partly resolve these issues).
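As a starting point for implementation, a rough sketch of what one such geometry-focused case could look like; all class and function names below are assumptions that would need to be mapped onto the actual model framework and the networks from `flow_benchmarks`/`fracture_sets.py`:

```python
# Sketch of a benchmark case emphasizing geometric complexity with simple
# physics (mass balance only). All imported names are assumptions.
import time

from benchmark_geometries import fractured_domain  # hypothetical geometry factory
from benchmark_models import MassBalanceBenchmark  # hypothetical model class

def time_setup(cell_size: float) -> float:
    """Return the wall-clock time spent on meshing, discretization, and assembly."""
    model = MassBalanceBenchmark(
        fracture_network=fractured_domain(),
        cell_size=cell_size,
    )
    t0 = time.perf_counter()
    model.prepare_simulation()  # assumed to cover grid construction and assembly
    return time.perf_counter() - t0

if __name__ == "__main__":
    # Refining the mesh probes how setup/assembly time scales with problem size.
    for h in (0.5, 0.25, 0.125):
        print(f"cell_size={h}: {time_setup(h):.2f} s")
```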