Closed: rocky closed this issue 2 years ago
From what I have understood, each function would need to be benchmarked manually.

```python
functions_list = ["N", "Plot3D", "Times", ...]
times_list = [100, 10, 200, ...]
```
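The two parallel lists above could be folded into a small `timeit`-based harness. A minimal sketch (the table entries and stand-in callables are hypothetical, not Mathics APIs; a real harness would evaluate Mathics expressions instead of lambdas):

```python
import timeit

# Hypothetical benchmark table: each built-in gets its own iteration
# count, since costs differ by orders of magnitude.  The lambdas are
# stand-ins for evaluating the corresponding Mathics built-in.
BENCHMARKS = {
    "Plus":  (lambda: 2 + 3, 100_000),
    "Times": (lambda: 2 * 3, 100_000),
}

def run_benchmarks(benchmarks):
    """Return {name: seconds per iteration} for each entry."""
    results = {}
    for name, (fn, iterations) in benchmarks.items():
        total = timeit.timeit(fn, number=iterations)
        results[name] = total / iterations
    return results

results = run_benchmarks(BENCHMARKS)
```

Keeping the iteration count next to each entry, rather than in a separate parallel list, avoids the two lists drifting out of sync.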
Yes, but I suspect it would be good to form groups of built-in functions that are roughly comparable, for example the calculator-like functions `Plus`, `Minus`, and `Times`, so that we get a sense of the relative cost of things that are roughly comparable. This can guide us, and users, when implementing something, to understand the cost of the primitive that is getting used.
So, as a side thing, we may publish a table of the relative costs of various built-in functions.
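If per-call timings are collected, such a table could simply normalize everything to the cheapest primitive. A small sketch (the timings below are made-up illustrative numbers, not measurements):

```python
def relative_costs(timings):
    """Given {name: seconds per call}, return {name: cost relative
    to the fastest entry}, cheapest first."""
    fastest = min(timings.values())
    return {name: t / fastest
            for name, t in sorted(timings.items(), key=lambda kv: kv[1])}

# Hypothetical per-call timings, for illustration only:
timings = {"Plus": 1e-6, "Times": 2e-6, "Plot3D": 5e-2}
for name, cost in relative_costs(timings).items():
    # With these numbers, Plot3D comes out tens of thousands of
    # times more expensive than Plus.
    print(f"{name:8s} {cost:10.1f}x")
```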
Note that although it is well known that in machine code multiplication is much slower than addition, from the Mathics standpoint we want to group `Plus` and `Times` together, separate from functions like `N` and `Plot3D`. The focus is on the difference in speed between `Plus` and `Times`, and between those as a group and `Plot3D`, which I suspect will be at least an order of magnitude slower.
I also suspect one can get a rough sense of which functions are comparable, or should be grouped together, by looking at the Wolfram Mathematica "Guide" sections, such as https://reference.wolfram.com/language/guide/ArithmeticFunctions.html
OK, first question: what are we trying to compare? The master branch against previous versions? The master branch against certain branches with PRs? Then, is there a way to have these versions "precompiled", to avoid consuming CI time on the installation step?
Once we have this, the first thing I would like to benchmark is the time and memory that it requires to:

* apply different kinds of replacement rules
* do pattern matching
* assign large lists
* make recursive function calls
* convert between Mathics expressions and SymPy, back and forth
These aspects impose the largest performance handicap on the core. Specific functions (like `Plus` and `Times`, or `Integrate`) depend strongly on that, more than on the ideal complexity of the low-level task (two cycles for adding, four for multiplying).
> OK, first question: what are we trying to compare? The master branch against previous versions? The master branch against certain branches with PRs?
Yes - the master branch versus PRs, especially the ones that purport to make things faster.
And in those benchmarks, it is okay to add more benchmarks to highlight the specific kinds of improvements. The benchmarks, however, may be scrutinized on their own merits for validity. For example, suppose I add a custom rule for 7 * 8 = 42. I may get some sort of improvement here, but it is unlikely to come up in practice.
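One way to scrutinize a benchmark like the custom-rule example is Amdahl's law: a local speedup only matters in proportion to the fraction of total time it touches. A quick sketch (the fractions used are made-up illustrative numbers):

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: overall speedup when `fraction` of total run
    time is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Suppose the custom rule makes its one case 10x faster, but that
# case accounts for only 0.1% of total evaluation time: the overall
# win is negligible.
print(overall_speedup(0.001, 10.0))

# By contrast, speeding up something that is 50% of the run time
# by 2x gives a meaningful overall improvement.
print(overall_speedup(0.5, 2.0))
```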
Insight is also an important aspect of benchmarking, so any benchmark that gives insight is also valuable.
> Then, is there a way to have these versions "precompiled", to avoid consuming the CI time in the installation step?
I've looked into this and the short answer is no. (This is not a problem specific to Mathics; it is something that others have looked at for Python in general, and no one has found a magic bullet.)
For now, the simplest thing for you to do here is run using pyston 2.3.
At some later date we'll address the long startup-time loading problem. But that is a separate problem.
> Once we have this, the first thing I would like to benchmark is the time and memory that it requires to:
>
> * apply different kinds of replacement rules
> * do pattern matching
> * assign large lists
> * make recursive function calls
> * convert between Mathics expressions and SymPy, back and forth
Great - create all and any benchmarks you want.
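For capturing both time and memory of a benchmark case, the standard library already suffices. A rough sketch using `time.perf_counter` and `tracemalloc`, with pure-Python stand-ins for two of the workloads listed above (the workload functions are made up for illustration; real benchmarks would run Mathics expressions):

```python
import time
import tracemalloc

def measure(fn):
    """Return (elapsed_seconds, peak_bytes) for one call of fn."""
    tracemalloc.start()
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak

# Stand-in workloads for "assignment of large lists" and
# "recursive function calls":
def large_list_assignment():
    return list(range(100_000))

def recursive_calls(n=500):
    return 1 if n == 0 else recursive_calls(n - 1)

for fn in (large_list_assignment, recursive_calls):
    seconds, peak = measure(fn)
```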
> These aspects impose the largest performance handicap on the core. Specific functions (like `Plus` and `Times`, or `Integrate`) depend strongly on that, more than on the ideal complexity of the low-level task (two cycles for adding, four for multiplying).
(The difference between arbitrary-precision addition and multiplication in an MMA-like system is not a factor of two but more like a low-order polynomial, but this is missing the point. The point is to be able to understand the cost of things, which includes the different kinds of calls.)
There are many areas to attack the problem. If you want to follow one path and someone else wants to follow another, that's okay.
One of the facets I am exploring is avoiding pattern matching in function calls altogether by caching the method that was found on the last call. Initially I am contemplating this only for things inside the Python code, but depending on how that goes it could be expanded to other things.
Before and as we improve performance, it is crucial to have a way to measure what is going on, how bad things are, and where we need to improve most.
Key to this is having an understanding of the cost of each of the thousand or so primitives.
At first I thought: hey, we could hook this into the doctests we already have, since those iterate over each primitive. (Note that the doctest portion of Mathics core is slated to get put in its own repository.)
But this is a bad idea, because the focus of doctests is documentation examples, not tests. And benchmark tests are not generic tests either. A big mistake we've seen over and over is trying to cram everything into a single piece of code or code base.
So instead, I propose an entirely new repository for this. The benchmark tests could operate at several levels.
For example, I understand MMA uses a program called `Benchmarking.m`.

But to start out with, at the lowest level, we should benchmark individual built-in functions for each of the kinds of signatures and patterns that each has.
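A hypothetical sketch of what per-signature benchmark data might look like, pairing each sample expression with its own iteration count (the expressions, counts, and layout are illustrative assumptions, not an agreed format):

```python
# For each built-in, a list of (expression, iterations) pairs, one
# per signature/pattern: high counts for cheap primitives like
# Times, low counts for compound, slow ones like Plot3D.
BENCHMARK_CASES = {
    "Times": [
        ("2 * 3", 100_000),       # machine integers
        ("2.0 * 3.0", 100_000),   # machine reals
        ("a * b * c", 50_000),    # symbolic
    ],
    "Plot3D": [
        ("Plot3D[Sin[x y], {x, 0, 3}, {y, 0, 3}]", 5),
    ],
}
```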
Some built-in functions are very slow, like some of the plotting routines. So as part of the test data (the expression to check) we probably also want to store the number of times to run the expression. For built-in functions like `Times`, this benchmark iteration count will be high, while for operations like `Plot3D` (which in fact is probably calling a number of other built-ins) this number will be low. In fact, it might be useful to separate the really primitive built-in functions, e.g. `Times`, from the complex, compound (and slow) ones, e.g. `Plot3D`.

Where we save data from runs is a whole issue in itself.
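As one possible starting point for the data-saving question, runs could be appended to a JSON file stamped with the time and the commit they were measured on, so later tooling can graph performance over time. A sketch (the file layout and function name are assumptions, not an agreed format):

```python
import json
import time

def save_run(path, results, git_hash="unknown"):
    """Append one benchmark run (a {name: seconds} dict) to a JSON
    file, stamped with the wall-clock time and the commit hash."""
    try:
        with open(path) as f:
            runs = json.load(f)
    except FileNotFoundError:
        runs = []
    runs.append({
        "timestamp": time.time(),
        "git_hash": git_hash,
        "results": results,
    })
    with open(path, "w") as f:
        json.dump(runs, f, indent=2)

# Hypothetical usage after a benchmark run:
# save_run("benchmark-runs.json", results, git_hash="abc1234")
```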
Also, having programs to graph the performance over time is a topic for discussion in itself.