Enhanced the efficiency of power calculations

haru-44 commented 2 weeks ago

References to other Issues or PRs

Brief description of what is fixed or changed

Other comments

Release Notes

NO ENTRY

sympy-bot commented 2 weeks ago

:white_check_mark:

Hi, I am the SymPy bot. I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

No release notes entry will be added for this pull request.

Click here to see the pull request description that was parsed.

#### References to other Issues or PRs #### Brief description of what is fixed or changed #### Other comments #### Release Notes NO ENTRY

github-actions[bot] commented 2 weeks ago

Benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1 means a speed up and greater than 1 means a slowdown. Green lines beginning with + are slowdowns (the PR is slower then master or master is slower than the previous release). Red lines beginning with - are speedups.

Significantly changed benchmark results (PR vs master)

Significantly changed benchmark results (master vs previous release)

| Change   | Before [2487dbb5]    | After [8519e0f7]    |   Ratio | Benchmark (Parameter)                                                |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------------|
| -        | 70.8±0.6ms           | 44.2±0.2ms          |    0.62 | integrate.TimeIntegrationRisch02.time_doit(10)                       |
| -        | 68.1±0.7ms           | 43.4±0.4ms          |    0.64 | integrate.TimeIntegrationRisch02.time_doit_risch(10)                 |
| +        | 18.7±0.6μs           | 29.8±0.6μs          |    1.59 | integrate.TimeIntegrationRisch03.time_doit(1)                        |
| -        | 5.41±0.02ms          | 2.87±0.01ms         |    0.53 | logic.LogicSuite.time_load_file                                      |
| -        | 73.0±0.3ms           | 28.5±0.1ms          |    0.39 | polys.TimeGCD_GaussInt.time_op(1, 'dense')                           |
| -        | 73.0±0.3ms           | 28.9±0.2ms          |    0.4  | polys.TimeGCD_GaussInt.time_op(1, 'sparse')                          |
| -        | 252±1ms              | 124±0.5ms           |    0.49 | polys.TimeGCD_GaussInt.time_op(2, 'dense')                           |
| -        | 251±1ms              | 125±0.4ms           |    0.5  | polys.TimeGCD_GaussInt.time_op(2, 'sparse')                          |
| -        | 643±2ms              | 374±3ms             |    0.58 | polys.TimeGCD_GaussInt.time_op(3, 'dense')                           |
| -        | 653±2ms              | 372±1ms             |    0.57 | polys.TimeGCD_GaussInt.time_op(3, 'sparse')                          |
| -        | 496±4μs              | 283±3μs             |    0.57 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(1, 'dense')            |
| -        | 1.74±0.01ms          | 1.04±0.01ms         |    0.6  | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(2, 'dense')            |
| -        | 5.77±0.07ms          | 3.07±0.03ms         |    0.53 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(3, 'dense')            |
| -        | 450±10μs             | 228±2μs             |    0.51 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(1, 'dense')               |
| -        | 1.48±0.01ms          | 680±4μs             |    0.46 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(2, 'dense')               |
| -        | 4.85±0.02ms          | 1.65±0.01ms         |    0.34 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(3, 'dense')               |
| -        | 369±1μs              | 207±1μs             |    0.56 | polys.TimeGCD_SparseGCDHighDegree.time_op(1, 'dense')                |
| -        | 2.40±0.02ms          | 1.24±0ms            |    0.52 | polys.TimeGCD_SparseGCDHighDegree.time_op(3, 'dense')                |
| -        | 9.97±0.05ms          | 4.41±0.02ms         |    0.44 | polys.TimeGCD_SparseGCDHighDegree.time_op(5, 'dense')                |
| -        | 356±3μs              | 172±1μs             |    0.48 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(1, 'dense')            |
| -        | 2.49±0.01ms          | 892±3μs             |    0.36 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(3, 'dense')            |
| -        | 9.59±0.1ms           | 2.66±0.02ms         |    0.28 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(5, 'dense')            |
| -        | 1.01±0.01ms          | 429±4μs             |    0.42 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'dense')           |
| -        | 1.70±0.01ms          | 520±3μs             |    0.31 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'sparse')          |
| -        | 5.92±0.04ms          | 1.78±0.01ms         |    0.3  | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'dense')           |
| -        | 8.36±0.02ms          | 1.52±0ms            |    0.18 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'sparse')          |
| -        | 290±0.5μs            | 64.7±0.5μs          |    0.22 | polys.TimePREM_QuadraticNonMonicGCD.time_op(1, 'sparse')             |
| -        | 3.48±0.03ms          | 394±6μs             |    0.11 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'dense')              |
| -        | 3.95±0.01ms          | 283±1μs             |    0.07 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'sparse')             |
| -        | 7.06±0.07ms          | 1.25±0ms            |    0.18 | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'dense')              |
| -        | 8.63±0.06ms          | 850±9μs             |    0.1  | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'sparse')             |
| -        | 5.00±0.02ms          | 3.03±0.01ms         |    0.61 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(2, 'sparse') |
| -        | 12.0±0.09ms          | 6.63±0.02ms         |    0.55 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'dense')  |
| -        | 22.3±0.2ms           | 9.16±0.1ms          |    0.41 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'sparse') |
| -        | 5.22±0.05ms          | 885±9μs             |    0.17 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(1, 'sparse')    |
| -        | 12.5±0.04ms          | 7.17±0.06ms         |    0.57 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(2, 'sparse')    |
| -        | 100±0.8ms            | 26.2±0.06ms         |    0.26 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'dense')     |
| -        | 164±0.7ms            | 55.0±0.1ms          |    0.34 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'sparse')    |
| -        | 174±1μs              | 113±2μs             |    0.65 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(1, 'dense')      |
| -        | 360±1μs              | 214±0.7μs           |    0.6  | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(1, 'sparse')     |
| -        | 4.20±0.03ms          | 844±3μs             |    0.2  | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'dense')      |
| -        | 5.26±0.01ms          | 385±2μs             |    0.07 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'sparse')     |
| -        | 19.8±0.09ms          | 2.80±0.01ms         |    0.14 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'dense')      |
| -        | 22.7±0.2ms           | 633±2μs             |    0.03 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'sparse')     |
| -        | 483±4μs              | 137±0.8μs           |    0.28 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(1, 'sparse') |
| -        | 4.82±0.03ms          | 612±2μs             |    0.13 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'dense')  |
| -        | 5.24±0.03ms          | 141±0.7μs           |    0.03 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'sparse') |
| -        | 13.1±0.08ms          | 1.33±0ms            |    0.1  | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'dense')  |
| -        | 13.6±0.06ms          | 144±0.8μs           |    0.01 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'sparse') |
| -        | 132±0.7μs            | 75.3±1μs            |    0.57 | solve.TimeMatrixOperations.time_rref(3, 0)                           |
| -        | 251±1μs              | 89.0±0.4μs          |    0.36 | solve.TimeMatrixOperations.time_rref(4, 0)                           |
| -        | 24.3±0.2ms           | 10.3±0.08ms         |    0.42 | solve.TimeSolveLinSys189x49.time_solve_lin_sys                       |
| -        | 28.9±0.3ms           | 15.3±0.1ms          |    0.53 | solve.TimeSparseSystem.time_linsolve_Aaug(20)                        |
| -        | 56.0±0.2ms           | 24.9±0.1ms          |    0.44 | solve.TimeSparseSystem.time_linsolve_Aaug(30)                        |
| -        | 28.7±0.2ms           | 15.2±0.2ms          |    0.53 | solve.TimeSparseSystem.time_linsolve_Ab(20)                          |
| -        | 55.9±0.3ms           | 24.7±0.2ms          |    0.44 | solve.TimeSparseSystem.time_linsolve_Ab(30)                          |

Full benchmark results can be found as artifacts in GitHub Actions (click on checks at the top of the PR).

asmeurer commented 2 weeks ago

This looks reasonable to me, but we should check that all these methods are actually tested with nontrivial powers.

oscarbenjamin commented 1 week ago

In some cases iterated multiplication is more efficient than exponentiation by squaring if multiplication is not constant time (e.g. for polynomials).

haru-44 commented 1 week ago

In some cases iterated multiplication is more efficient than exponentiation by squaring if multiplication is not constant time (e.g. for polynomials).

Is that the case for small n? For sufficiently large n, multiplication can still be efficient even if it is not constant time. If the computational cost of multiplication is $m(n)$, the cost of iterated multiplication is $O(m(n)n)$, while the cost of exponentiation is $O(m(n) \log n)$.

oscarbenjamin commented 1 week ago

The computational cost of multiplication depends on the size of both operands like M(n1,n2) and for e.g. polynomials with n1 and n2 terms is quadratic like M(n1,n2) ~ n1*n2 at least if the basic polynomial multiplication algorithm is used. In iterated multiplication one operand stays small but in exponentiation by squaring both operands grow.

Consider something like (x + y + z + t)^8. The number of terms is:

In [37]: [len((Poly(x+y+z+t)**n).coeffs()) for n in range(1, 8+1)]
Out[37]: [4, 10, 20, 35, 56, 84, 120, 165]

In exponentiation by squaring the number of multiplies in the ground domain is:

In [38]: 4*4 + 10*10 + 35*35
Out[38]: 1341

For iterated multiplication it is:

In [45]: 4*4 + 4*10 + 4*20 + 4*35 + 4*56 + 4*84 + 4*120
Out[45]: 1316

These numbers are close but raising to the power 16 the difference is greater:

In [51]: 4*sum([len((Poly(x+y+z+t)**n).coeffs()) for n in range(1, 16)])
Out[51]: 15500

In [52]: 4*4 + 10*10 + 35*35 +165*165
Out[52]: 28566

The more symbols you have and the higher the exponent then the more iterated multiplication is better.

If you have a polynomial $p$ of degree $d$ in $r$ symbols and you want to compute $p^n$ then the iterative method is $O(d^{2r}n^{r+1})$ and exponentiation by squaring is $O(d^{2r}n^{2r})$ assuming constant time coefficient multiplication. The iterative method is asymptotically faster by a factor of $n^{r-1}$.

Counting only the number of coefficient multiplies assumes that multiplication in the coefficient domain is constant which is only true for RR, CC and GF(p). In other domains like ZZ the coefficients grow in size as well as the number of terms compounding the effect that multiplying two large operands is slower than multiplying a small operand by a larger one.

A more complete analysis is in e.g.

[HEI72] L. E. HEINDEL, "Computation of Powers of Multivariate Polynomials over the Integers," J. Comput. System Sci. 1 (1972), pp. 1-8.

https://www.sciencedirect.com/science/article/pii/S0022000072800370

There are also faster algorithms that are more complicated but if we want to stick to simple algorithms then we should remember that iterated multiplication is not necessarily slower than exponentiation by squaring.

haru-44 commented 1 week ago

Thanks for the explanation. I lacked a thorough understanding of multivariate polynomials.

oscarbenjamin commented 1 week ago

Looks good. Thanks

sympy / sympy