Closed haru-44 closed 1 week ago
:white_check_mark:
Hi, I am the SymPy bot. I'm here to help you write a release notes entry. Please read the guide on how to write release notes.
Benchmark results from GitHub Actions
Lower numbers are good, higher numbers are bad. A ratio less than 1
means a speed up and greater than 1 means a slowdown. Green lines
beginning with +
are slowdowns (the PR is slower then master or
master is slower than the previous release). Red lines beginning
with -
are speedups.
Significantly changed benchmark results (PR vs master)
Significantly changed benchmark results (master vs previous release)
| Change | Before [2487dbb5] | After [8519e0f7] | Ratio | Benchmark (Parameter) |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------------|
| - | 70.8±0.6ms | 44.2±0.2ms | 0.62 | integrate.TimeIntegrationRisch02.time_doit(10) |
| - | 68.1±0.7ms | 43.4±0.4ms | 0.64 | integrate.TimeIntegrationRisch02.time_doit_risch(10) |
| + | 18.7±0.6μs | 29.8±0.6μs | 1.59 | integrate.TimeIntegrationRisch03.time_doit(1) |
| - | 5.41±0.02ms | 2.87±0.01ms | 0.53 | logic.LogicSuite.time_load_file |
| - | 73.0±0.3ms | 28.5±0.1ms | 0.39 | polys.TimeGCD_GaussInt.time_op(1, 'dense') |
| - | 73.0±0.3ms | 28.9±0.2ms | 0.4 | polys.TimeGCD_GaussInt.time_op(1, 'sparse') |
| - | 252±1ms | 124±0.5ms | 0.49 | polys.TimeGCD_GaussInt.time_op(2, 'dense') |
| - | 251±1ms | 125±0.4ms | 0.5 | polys.TimeGCD_GaussInt.time_op(2, 'sparse') |
| - | 643±2ms | 374±3ms | 0.58 | polys.TimeGCD_GaussInt.time_op(3, 'dense') |
| - | 653±2ms | 372±1ms | 0.57 | polys.TimeGCD_GaussInt.time_op(3, 'sparse') |
| - | 496±4μs | 283±3μs | 0.57 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(1, 'dense') |
| - | 1.74±0.01ms | 1.04±0.01ms | 0.6 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(2, 'dense') |
| - | 5.77±0.07ms | 3.07±0.03ms | 0.53 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(3, 'dense') |
| - | 450±10μs | 228±2μs | 0.51 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(1, 'dense') |
| - | 1.48±0.01ms | 680±4μs | 0.46 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(2, 'dense') |
| - | 4.85±0.02ms | 1.65±0.01ms | 0.34 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(3, 'dense') |
| - | 369±1μs | 207±1μs | 0.56 | polys.TimeGCD_SparseGCDHighDegree.time_op(1, 'dense') |
| - | 2.40±0.02ms | 1.24±0ms | 0.52 | polys.TimeGCD_SparseGCDHighDegree.time_op(3, 'dense') |
| - | 9.97±0.05ms | 4.41±0.02ms | 0.44 | polys.TimeGCD_SparseGCDHighDegree.time_op(5, 'dense') |
| - | 356±3μs | 172±1μs | 0.48 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(1, 'dense') |
| - | 2.49±0.01ms | 892±3μs | 0.36 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(3, 'dense') |
| - | 9.59±0.1ms | 2.66±0.02ms | 0.28 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(5, 'dense') |
| - | 1.01±0.01ms | 429±4μs | 0.42 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'dense') |
| - | 1.70±0.01ms | 520±3μs | 0.31 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'sparse') |
| - | 5.92±0.04ms | 1.78±0.01ms | 0.3 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'dense') |
| - | 8.36±0.02ms | 1.52±0ms | 0.18 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'sparse') |
| - | 290±0.5μs | 64.7±0.5μs | 0.22 | polys.TimePREM_QuadraticNonMonicGCD.time_op(1, 'sparse') |
| - | 3.48±0.03ms | 394±6μs | 0.11 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'dense') |
| - | 3.95±0.01ms | 283±1μs | 0.07 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'sparse') |
| - | 7.06±0.07ms | 1.25±0ms | 0.18 | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'dense') |
| - | 8.63±0.06ms | 850±9μs | 0.1 | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'sparse') |
| - | 5.00±0.02ms | 3.03±0.01ms | 0.61 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(2, 'sparse') |
| - | 12.0±0.09ms | 6.63±0.02ms | 0.55 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'dense') |
| - | 22.3±0.2ms | 9.16±0.1ms | 0.41 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'sparse') |
| - | 5.22±0.05ms | 885±9μs | 0.17 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(1, 'sparse') |
| - | 12.5±0.04ms | 7.17±0.06ms | 0.57 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(2, 'sparse') |
| - | 100±0.8ms | 26.2±0.06ms | 0.26 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'dense') |
| - | 164±0.7ms | 55.0±0.1ms | 0.34 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'sparse') |
| - | 174±1μs | 113±2μs | 0.65 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(1, 'dense') |
| - | 360±1μs | 214±0.7μs | 0.6 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(1, 'sparse') |
| - | 4.20±0.03ms | 844±3μs | 0.2 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'dense') |
| - | 5.26±0.01ms | 385±2μs | 0.07 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'sparse') |
| - | 19.8±0.09ms | 2.80±0.01ms | 0.14 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'dense') |
| - | 22.7±0.2ms | 633±2μs | 0.03 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'sparse') |
| - | 483±4μs | 137±0.8μs | 0.28 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(1, 'sparse') |
| - | 4.82±0.03ms | 612±2μs | 0.13 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'dense') |
| - | 5.24±0.03ms | 141±0.7μs | 0.03 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'sparse') |
| - | 13.1±0.08ms | 1.33±0ms | 0.1 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'dense') |
| - | 13.6±0.06ms | 144±0.8μs | 0.01 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'sparse') |
| - | 132±0.7μs | 75.3±1μs | 0.57 | solve.TimeMatrixOperations.time_rref(3, 0) |
| - | 251±1μs | 89.0±0.4μs | 0.36 | solve.TimeMatrixOperations.time_rref(4, 0) |
| - | 24.3±0.2ms | 10.3±0.08ms | 0.42 | solve.TimeSolveLinSys189x49.time_solve_lin_sys |
| - | 28.9±0.3ms | 15.3±0.1ms | 0.53 | solve.TimeSparseSystem.time_linsolve_Aaug(20) |
| - | 56.0±0.2ms | 24.9±0.1ms | 0.44 | solve.TimeSparseSystem.time_linsolve_Aaug(30) |
| - | 28.7±0.2ms | 15.2±0.2ms | 0.53 | solve.TimeSparseSystem.time_linsolve_Ab(20) |
| - | 55.9±0.3ms | 24.7±0.2ms | 0.44 | solve.TimeSparseSystem.time_linsolve_Ab(30) |
Full benchmark results can be found as artifacts in GitHub Actions (click on checks at the top of the PR).
This looks reasonable to me, but we should check that all these methods are actually tested with nontrivial powers.
In some cases iterated multiplication is more efficient than exponentiation by squaring if multiplication is not constant time (e.g. for polynomials).
In some cases iterated multiplication is more efficient than exponentiation by squaring if multiplication is not constant time (e.g. for polynomials).
Is that the case for small n
?
For sufficiently large n, multiplication can still be efficient even if it is not constant time. If the computational cost of multiplication is $m(n)$, the cost of iterated multiplication is $O(m(n)n)$, while the cost of exponentiation is $O(m(n) \log n)$.
The computational cost of multiplication depends on the size of both operands like M(n1,n2)
and for e.g. polynomials with n1
and n2
terms is quadratic like M(n1,n2) ~ n1*n2
at least if the basic polynomial multiplication algorithm is used. In iterated multiplication one operand stays small but in exponentiation by squaring both operands grow.
Consider something like (x + y + z + t)^8
. The number of terms is:
In [37]: [len((Poly(x+y+z+t)**n).coeffs()) for n in range(1, 8+1)]
Out[37]: [4, 10, 20, 35, 56, 84, 120, 165]
In exponentiation by squaring the number of multiplies in the ground domain is:
In [38]: 4*4 + 10*10 + 35*35
Out[38]: 1341
For iterated multiplication it is:
In [45]: 4*4 + 4*10 + 4*20 + 4*35 + 4*56 + 4*84 + 4*120
Out[45]: 1316
These numbers are close but raising to the power 16 the difference is greater:
In [51]: 4*sum([len((Poly(x+y+z+t)**n).coeffs()) for n in range(1, 16)])
Out[51]: 15500
In [52]: 4*4 + 10*10 + 35*35 +165*165
Out[52]: 28566
The more symbols you have and the higher the exponent then the more iterated multiplication is better.
If you have a polynomial $p$ of degree $d$ in $r$ symbols and you want to compute $p^n$ then the iterative method is $O(d^{2r}n^{r+1})$ and exponentiation by squaring is $O(d^{2r}n^{2r})$ assuming constant time coefficient multiplication. The iterative method is asymptotically faster by a factor of $n^{r-1}$.
Counting only the number of coefficient multiplies assumes that multiplication in the coefficient domain is constant which is only true for RR, CC and GF(p). In other domains like ZZ the coefficients grow in size as well as the number of terms compounding the effect that multiplying two large operands is slower than multiplying a small operand by a larger one.
A more complete analysis is in e.g.
[HEI72] L. E. HEINDEL, "Computation of Powers of Multivariate Polynomials over the Integers," J. Comput. System Sci. 1 (1972), pp. 1-8.
https://www.sciencedirect.com/science/article/pii/S0022000072800370
There are also faster algorithms that are more complicated but if we want to stick to simple algorithms then we should remember that iterated multiplication is not necessarily slower than exponentiation by squaring.
Thanks for the explanation. I lacked a thorough understanding of multivariate polynomials.
Looks good. Thanks
References to other Issues or PRs
Brief description of what is fixed or changed
Other comments
Release Notes
NO ENTRY