fractions.py uses naive algorithms for its arithmetic.
It may be worth implementing less trivial versions for addition/subtraction and multiplication (e.g. the Henrici algorithm and so on), described here: https://www.eecis.udel.edu/~saunders/courses/822/98f/collins-notes/rnarith.ps as e.g. gmplib does: https://gmplib.org/repo/gmp/file/tip/mpq/aors.c
Some projects (e.g. SymPy here: https://github.com/sympy/sympy/pull/12656) reinvent the stdlib's Fraction just to add such simple improvements. With big denominators (~10**6) this really does matter; my local benchmarks suggest an order-of-magnitude difference for summation of several such numbers.
Here's some code to try out:
from fractions import Fraction
import operator
import math

class Henrici(Fraction):
    'Reformulate _mul to reduce the size of intermediate products'

    # Original has 2 multiplications, 1 gcd call, and 2 divisions.
    # This one has 2 multiplications, 2 gcd calls, and 4 divisions.
    def _mul(a, b):
        a_n, a_d = a.numerator, a.denominator
        b_n, b_d = b.numerator, b.denominator
        d1 = math.gcd(a_n, b_d)
        a_n //= d1
        b_d //= d1
        d2 = math.gcd(b_n, a_d)
        b_n //= d2
        a_d //= d2
        result = Fraction(a_n * b_n, a_d * b_d, _normalize=False)
        assert math.gcd(a_n * b_n, a_d * b_d) == 1 and a_d * b_d >= 0
        return result

    __mul__, __rmul__ = Fraction._operator_fallbacks(_mul, operator.mul)

assert Henrici(10, 3) * Henrici(6, 5) == Henrici(4, 1)
Note that Fraction arithmetic has a huge administrative overhead. The cost of the underlying multiplications and divisions won't dominate the total time until the numerators and denominators are very large.
For the proposed optimization, this implies that the cost of the extra Python steps will be negligible. The benefits of the optimization are similarly attenuated.
-- Update to experimentation code: add guards for the relatively prime case. --
class Henrici(Fraction):
    'Reformulate _mul to reduce the size of intermediate products'

    def _mul(a, b):
        a_n, a_d = a.numerator, a.denominator
        b_n, b_d = b.numerator, b.denominator
        d1 = math.gcd(a_n, b_d)
        if d1 > 1:
            a_n //= d1
            b_d //= d1
        d2 = math.gcd(b_n, a_d)
        if d2 > 1:
            b_n //= d2
            a_d //= d2
        result = Fraction(a_n * b_n, a_d * b_d, _normalize=False)
        assert math.gcd(a_n * b_n, a_d * b_d) == 1 and a_d * b_d >= 0
        return result
The cost to the common case for small components is about 20%:
$ python3.10 -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
200000 loops, best of 11: 1.8 usec per loop
$ python3.10 -m timeit -r11 -s 'from patched import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
100000 loops, best of 11: 2.14 usec per loop
Without the guards the incremental cost drops to 10%.
$ python3.10 -m timeit -r11 -s 'from patched_noguards import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
100000 loops, best of 11: 2.02 usec per loop
I have similar timings (for a draft version of the PR, see f.patch) as for the last comment, though the small-denominator overhead seems to be bigger (~20%):
$ python3.10 -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
50000 loops, best of 11: 9.09 usec per loop
$ python3.10 -m timeit -r11 -s 'from patched import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
20000 loops, best of 11: 11.2 usec per loop
On the other hand, here are timings for bigger denominators:
$ python3.10 -m timeit -r11 -s 'from fractions import Fraction as F' -s 'import random' -s 'n = [random.randint(1, 1000000) for _ in range(1000)]' -s 'd = [random.randint(1, 1000000) for _ in range(1000)]' -s 'a=list(map(lambda x: F(*x), zip(n, d)))' 'sum(a)'
1 loop, best of 11: 257 msec per loop
$ ... from patched ...
10 loops, best of 11: 33.2 msec per loop
It's not so clear what "very large" means; that could be defined here. BTW, 10**6 denominators are (very!) small for the use case mentioned above (a CAS package).
1 loop, best of 11: 257 msec per loop
$ ... from patched ...
10 loops, best of 11: 33.2 msec per loop
Looks worth it :-)
Personally, I'd prefer to keep the code simplicity, and the speed for small inputs here. Python's needs aren't the same as SymPy's needs or SAGE's needs, and not all of the fractions.Fraction use-cases involve summing lots of values with incompatible denominators.
With big denominators (~10**6) this really does matter; my local benchmarks suggest an order-of-magnitude difference for summation of several such numbers.
Could you give some idea of the crossover point for a single addition? At roughly what size numerator/denominator do we start seeing a performance benefit?
I'd prefer to keep the code simplicity
It's not going to be complicated that much, and the algorithms are well known, but I see your point.
and the speed for small inputs here
The speed loss is not that big, 10-20%.
Python's needs aren't the same as SymPy's needs or SAGE's needs
So, which needs will it serve?
Sorry, I can't suggest an application which uses the builtin Fractions (not sure if even SAGE uses them, as a fallback). SymPy doesn't, for sure (but it could - its PythonRational class uses the same optimizations, except for the g == 1 branches in _add/_sub, I think).
There is one exception I've found: the stdlib's statistics module uses Fractions in the _sum() helper, exactly in the "sum a lot of values" paradigm.
not all of the fractions.Fraction use-cases involve summing lots of values with incompatible denominators.
No need for lots of values (e.g. 1000): the denominator of the sum grows very fast; that's why modern CASes use modular GCD algorithms, for example.
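A quick throwaway experiment illustrates that growth (the exact sizes depend on the random values drawn, so treat the numbers as indicative only):

    from fractions import Fraction
    import random

    total = Fraction(0)
    for i in range(1, 41):
        total += Fraction(random.randint(1, 1000), random.randint(1, 1000))
        if i % 10 == 0:
            # the denominator typically reaches hundreds of bits within a few dozen
            # terms, since gcd cancellations between unrelated denominators are tiny
            print(i, total.denominator.bit_length())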
Could you give some idea of the crossover point for a single addition?
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,31011021112)' -s 'b=F(86,11011021115)' 'a + b'
20000 loops, best of 11: 12.4 usec per loop
$ ./python -m timeit -r11 -s 'from patched import Fraction as F' -s 'a=F(10,31011021112)' -s 'b=F(86,11011021115)' 'a + b'
20000 loops, best of 11: 12.5 usec per loop
There is one exception I've found: the stdlib's statistics module uses Fractions in the _sum() helper, exactly in the "sum a lot of values" paradigm.
That's an interesting example: it does indeed satisfy the "sum of a lot of values" part, but not the "incompatible denominators" part. :-) The typical use there is that those fractions have been converted from floats, and so all denominators will be a smallish power of two. So we don't encounter the same coefficient explosion problem that we do when summing fractions with unrelated denominators. I have many similar use-cases in my own code, where numerators and denominators don't tend to ever get beyond a few hundred digits, and usually not more than tens of digits.
Thanks for the timings! So assuming that wasn't a specially-chosen best case example, the crossover is somewhere around numerators and denominators of ten digits or so.
It's not going to be complicated that much
For me, the big difference is that the current code is obviously correct at a glance, while the proposed code takes study and thought to make sure that no corner cases are missed.
Shrug. Put me down as -0.
I'm surprised to hear that the "typical use-case" of Fraction is fractions converted from floats. Do you have evidence in the wild to support that?
I would expect any application that uses fractions "generically" to run into the same sorts of problems SymPy does. The issue is that the sum or product of two unrelated fractions has a denominator that is ~ the product of the denominators of each term. So they tend to grow large, unless there is some structure in the terms that results in lots of cancellation. That's why real-world numerical code typically doesn't use exact arithmetic, but there are legitimate use-cases for it (computer algebra being one).
This actually also applies even if the denominators are powers of 2. That's why arbitrary precision floating point numbers like Decimal or mpmath.mpf limit the precision, or effectively, the power of 2 in the denominator.
By the way, the "algorithm" here really isn't that complicated. I didn't even realize it had a name. The idea is that for a/b * c/d, if a/b and c/d are already in lowest terms, then the only cancellation that can happen is from a/d or from c/b. So instead of computing gcd(a*c, b*d), we only compute gcd(a, d) and gcd(c, b) and cancel them off the corresponding terms. It turns out to be faster to take two gcds of smaller numbers than one gcd of big ones. The algorithm for addition is a bit more complicated, at least to see that it is correct, but is still not that bad (the paper linked in the OP explains it clearly in one paragraph). It's far less complicated than, for example, Lehmer's gcd algorithm (which is implemented in math.gcd).
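For concreteness, here is a rough sketch of that addition scheme as described in the Collins notes and Knuth 4.5.1 (a standalone illustration, not the exact CPython patch; the function and variable names are made up):

    from fractions import Fraction
    from math import gcd

    def add(a, b):
        # a, b are Fractions, so na/da and nb/db are already in lowest terms
        na, da = a.numerator, a.denominator
        nb, db = b.numerator, b.denominator
        g = gcd(da, db)
        if g == 1:
            # common case for unrelated denominators: no cancellation is possible
            return Fraction(na * db + nb * da, da * db)
        s = da // g
        t = na * (db // g) + nb * s   # numerator over the reduced common denominator
        g2 = gcd(t, g)                # the only cancellation left is against g
        # both results below are already in lowest terms, so a real implementation
        # can skip the constructor's re-normalization step
        if g2 == 1:
            return Fraction(t, s * db)
        return Fraction(t // g2, s * (db // g2))

    assert add(Fraction(1, 6), Fraction(1, 10)) == Fraction(4, 15)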
On Sun, Mar 07, 2021 at 10:34:24PM +0000, Aaron Meurer wrote:
I'm surprised to hear that the "typical use-case" of Fraction is fractions converted from floats.
For the statistics module - that may be true. Unfortunately, no (other) practical applications using Fractions have been proposed by my reviewers so far.
By the way, the "algorithm" here really isn't that complicated. I didn't even realize it had a name.
Rather "algorithms"; everything is there in volume II of Knuth, section 4.5 (Rational Arithmetic). Probably this is even a better reference, since it explains the gcd==1 case for addition. Both, however, reference the Henrici article.
It's far less complicated than, for example, Lehmer's gcd algorithm (which is implemented in math.gcd).
Or Karatsuba multiplication.
BTW, low-denominator performance may be restored (at least partially) using the same approach (like KARATSUBA_CUTOFF, but checking the maximal denominator). I don't like this idea, but...
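Just to make that cutoff idea concrete (this is only a sketch of the idea, not part of any patch; the _SMALL_DEN constant and its value are invented for illustration):

    import math
    from fractions import Fraction

    _SMALL_DEN = 1 << 30   # invented threshold, would need tuning

    def mul(a, b):
        a_n, a_d = a.numerator, a.denominator
        b_n, b_d = b.numerator, b.denominator
        if a_d < _SMALL_DEN and b_d < _SMALL_DEN:
            # small components: the schoolbook formula plus one gcd in the
            # constructor is cheaper than two extra gcd calls here
            return Fraction(a_n * b_n, a_d * b_d)
        d1 = math.gcd(a_n, b_d)
        d2 = math.gcd(b_n, a_d)
        # the cross-reduced product is already in lowest terms
        return Fraction((a_n // d1) * (b_n // d2), (a_d // d2) * (b_d // d1))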
On Sun, Mar 07, 2021 at 12:16:36PM +0000, Mark Dickinson wrote:
but not the "incompatible denominators" part. :-) The typical use there is that those fractions have been converted from floats
But there is nothing limiting the use of Fractions as input; e.g. there are docstring examples for mean() and variance(). In that case (the general one for a summation), common denominators are a very special situation.
Thanks for the timings! So assuming that wasn't a specially-chosen best case example
No, but this example will only exercise the first (gcd == 1) branch.
> > It's not going to be complicated that much
> For me, the big difference is that the current code is obviously correct
That may be fixed by keeping the relevant references right in the code, not in the commit message. The Python sources contain many far less trivial algorithms...
I agree with everyone ;-) That is, my _experience_ matches Mark's: as a more-or-less "numeric expert", I use Fraction in cases where it's already fast enough. Python isn't a CAS, and, e.g., in pure Python I'm not doing things like computing or composing power series with rational coefficients (but routinely do such stuff in a CAS). It's usually just combinatorial probabilities in relatively simple settings, and small-scale computations where float precision would be fine except I don't want to bother doing error analysis first to ensure things like catastrophic cancellation can't occur.
On the other hand, the proposed changes are bog standard optimizations for implementations of rationals, and can pay off very handsomely at relatively modest argument sizes.
So I'm +0. I don't really mind the slowdown for small arguments because, as above, I just don't use Fraction for extensive computation. But the other side of that is that I won't profit much from optimizations either, and while the optimizations for * and / are close to self-evident, those for + and - are much subtler. Code obscurity imposes ongoing costs of its own.
WRT which, I added Python's Karatsuba implementation and regret doing so. I don't want to take it out now (it's already in ;-) ), but it added quite a pile of delicate code to what _was_ a much easier module to grasp. People who need fast multiplication are still far better off using gmpy2 anyway (or fiddling Decimal precision - Stefan Krah implemented "advanced" multiplication schemes for that module).
On Tue, Mar 09, 2021 at 05:11:09AM +0000, Tim Peters wrote:
those for + and - are much subtler
In fact, these optimizations will pay off faster (wrt denominator size), esp. due to the gcd==1 branch.
Sorry for off-topic:
WRT which, I added Python's Karatsuba implementation and regret doing so. I don't want to take it out now (it's already in ;-) ), but it added quite a pile of delicate code to what _was_ a much easier module to grasp.
(And it was much more useless before that, even as a fallback.
But in the end - I agree, you can't outperform professional bigint implementations. I think you could _use_ them instead.)
People who need fast multiplication are still far better off using gmpy2 anyway
(Another strange Python "feature", IMHO. Why a custom bigint implementation - why not use the project that is run by professionals in the field? Looking at bpo-21922, it seems that arithmetic on small ints can be almost as fast as for Python ints. Is memory handling - the out-of-memory situation - the only severe problem?)
bpo-21922 lists several concerns, and best I know they all still apply. As a practical matter, I expect the vast bulk of core Python developers would reject a change that sped large int basic arithmetic by a factor of a billion if it slowed down basic arithmetic on native machine-size ints by even half a percent.
A possible resolution to this issue might be augmenting https://docs.python.org/3/library/fractions.html#module-fractions with a short paragraph or section on alternative implementations noting that there is a tradeoff between speed and complexity (and assured correctness). If sympy has faster rationals, list that, and any other python-accessible alternative we have confidence in.
Terry, we could do that, but the opposition here isn't strong, and is pretty baffling anyway ;-) : the suggested changes are utterly ordinary for implementations of rationals, require very little code, are not delicate, and are actually straightforward to see are correct (although unfamiliarity can be an initial barrier - e.g., if you don't already know that after
g = gcd(a, b)
a1 = a // g
b1 = b // g
it's necessarily true that a1 and b1 are coprime, a key reason why the transformation is correct will be lost on you - but it's also very easy to prove that claim once you're told that it is key here). The OP is right that "we" (at least Mark, and Raymond, and I) have routinely checked in far subtler optimizations in various areas.
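(For the record, a throwaway check of that coprimality claim - the proof itself is one line: any common divisor p > 1 of a1 and b1 would make p*g a common divisor of a and b, contradicting g = gcd(a, b).)

    import math, random

    for _ in range(10_000):
        a, b = random.randint(1, 10**9), random.randint(1, 10**9)
        g = math.gcd(a, b)
        # the quotients after dividing out the gcd must be coprime
        assert math.gcd(a // g, b // g) == 1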
a short paragraph or section on alternative implementations
There are no alternative implementations. SymPy's PythonRational (AFAIR, this class is not in the default namespace) is an internal fallback solution, for the case when gmpy2 is not available.
Maybe we should list gmpy2 everywhere people expect fast bigints/rationals (i.e. the int docs, the math module, Fraction and so on), so they will not be disappointed...
complexity (and assured correctness)
Much more complex (IMHO) code has been accepted (there are examples in C, but the limit_denominator() method is an example of Python code, from the same module!).
In fact, it's pretty obvious that the output fractions are equal to the middle-school versions. The non-trivial part may be why they are normalized. I hope this is now covered by comments.
New changeset 690aca781152a498f5117682524d2cd9aa4d7657 by Sergey B Kirpichev in branch 'master': bpo-43420: Simple optimizations for Fraction's arithmetics (GH-24779) https://github.com/python/cpython/commit/690aca781152a498f5117682524d2cd9aa4d7657
Thanks, all! This has been merged now. If someone wants to continue pursuing things left hanging, I'd suggest opening a different BPO report.
On Mon, Mar 22, 2021 at 02:35:59AM +0000, Tim Peters wrote:
Thanks, all! This has been merged now. If someone wants to continue pursuing things left hanging, I'd suggest opening a different BPO report.
Tim, if you mean the micro-optimizations for small components (properties -> attributes and so on), I think this is not worth a BPO report or a NEWS entry, is it?
Thanks to all reviewers and commenters.
If experience is any guide, nothing about anything here will go smoothly ;-)
For example, setting up a module-global _gcd name for math.gcd is a very standard, widespread kind of micro-optimization. But - if that's thought to be valuable (who knows? maybe someone will complain) - why not go on instead to add not-intended-to-be-passed trailing default _gcd=math.gcd arguments to the methods? Then it's even faster (& uglier, of course).
Or wrt changing properties to private attributes, that speeds some things but slows others - and, unless I missed it, nobody who wrote that code to begin with said a word about why it was done that way.
I'm not going to "pre-bless" shortcuts in an area where everything so far has been more contentious than it "should have been" (to my eyes). Opening a BPO report is a trivial effort. Skipping NEWS, or not, depends on how it goes.
On Mon, Mar 22, 2021 at 04:34:32AM +0000, Tim Peters wrote:
For example, setting up a module global _gcd name for math.gcd
Looking at the stdlib, I would just import gcd.
default _gcd=math.gcd arguments to the methods? Then it's even faster (& uglier, of course).
... and less readable.
Not sure the speedup will be noticeable. But more importantly, I see no such micro-optimizations across the stdlib. Probably this will be the reason for rejection.
Or wrt changing properties to private attributes, that speeds some things but slows others - and, unless I missed it, nobody who wrote that code to begin with said a word about why it was done that way.
Yes, I'll dig into the history. Commen^WDocstring doesn't explain this (for me).
Opening a BPO report is a trivial effort.
Sure, but such a report will require a patch to be discussed, anyway.
This report is closed. Please open a different report.
We've already demonstrated that, as predicted, nothing can be said here without it being taken as invitation to open-ended discussion. So it goes, but it doesn't belong on _this_ report anymore.
Just FYI, I applied the same changes to the quicktions [1] module, a Cython compiled (and optimised) version of fractions.Fraction.
[1] https://github.com/scoder/quicktions/
The loss in performance for small values is much higher there, almost 2x for the example given (compared to 10-20% for CPython):
$ python3.8 -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
100000 loops, best of 11: 1.94 usec per loop
Original:
$ python3.8 -m timeit -r11 -s 'from quicktions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
1000000 loops, best of 11: 214 nsec per loop
Patched:
$ python3.8 -m timeit -r11 -s 'from quicktions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
500000 loops, best of 11: 391 nsec per loop
For the larger values example, the gain is tremendous, OTOH:
$ python3.8 -m timeit -r11 -s 'from fractions import Fraction as F' -s 'import random' -s 'n = [random.randint(1, 1000000) for _ in range(1000)]' -s 'd = [random.randint(1, 1000000) for _ in range(1000)]' -s 'a=list(map(lambda x: F(*x), zip(n, d)))' 'sum(a)'
2 loops, best of 11: 150 msec per loop
Original:
$ python3.8 -m timeit -r11 -s 'from quicktions import Fraction as F' -s 'import random' -s 'n = [random.randint(1, 1000000) for _ in range(1000)]' -s 'd = [random.randint(1, 1000000) for _ in range(1000)]' -s 'a=list(map(lambda x: F(*x), zip(n, d)))' 'sum(a)'
2 loops, best of 11: 135 msec per loop
Patched:
$ python3.8 -m timeit -r11 -s 'from quicktions import Fraction as F' -s 'import random' -s 'n = [random.randint(1, 1000000) for _ in range(1000)]' -s 'd = [random.randint(1, 1000000) for _ in range(1000)]' -s 'a=list(map(lambda x: F(*x), zip(n, d)))' 'sum(a)'
50 loops, best of 11: 9.65 msec per loop
I'll have to see if the slowdown can be mitigated somehow.
Interestingly enough, the telco benchmark seems to benefit slightly from this:
Original: $ python3.8 benchmark/telco_fractions.py -n 200 # avg 0.0952927148342
Patched: $ python3.8 benchmark/telco_fractions.py -n 200 # avg 0.0914235627651
I'll have to see if the slowdown can be mitigated somehow.
Yes, for small components this is a known slowdown. I'm trying to mitigate that case in https://github.com/python/cpython/pull/25518. Except for the mixed mode (Fraction + int), this matches the original performance.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['type-feature', 'library', '3.10']
title = 'Optimize rational arithmetics'
updated_at =
user = 'https://github.com/skirpichev'
```
bugs.python.org fields:
```python
activity =
actor = 'Sergey.Kirpichev'
assignee = 'none'
closed = True
closed_date =
closer = 'tim.peters'
components = ['Library (Lib)']
creation =
creator = 'Sergey.Kirpichev'
dependencies = []
files = ['49854']
hgrepos = []
issue_num = 43420
keywords = ['patch']
message_count = 27.0
messages = ['388200', '388210', '388212', '388214', '388215', '388221', '388227', '388232', '388235', '388236', '388249', '388256', '388260', '388328', '388415', '388477', '388563', '388590', '388591', '389270', '389271', '389274', '389276', '389277', '389342', '393807', '393808']
nosy_count = 7.0
nosy_names = ['tim.peters', 'rhettinger', 'terry.reedy', 'mark.dickinson', 'scoder', 'Sergey.Kirpichev', 'asmeurer']
pr_nums = ['24779']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue43420'
versions = ['Python 3.10']
```