python / cpython

The Python programming language
https://www.python.org

Specialization for accurate complex summation in sum()? #121149

Closed skirpichev closed 3 months ago

skirpichev commented 4 months ago

Feature or enhancement

Proposal:

Currently, the sum() builtin lacks any specialization for complex numbers, yet it is still usually faster than pure-Python alternatives.

benchmark sum() wrt pure-python version:

```python
# a.py
from random import random, seed

seed(1)
data = [complex(random(), random()) for _ in range(10)]

def msum(xs):
    it = iter(xs)
    res = next(it)
    for z in it:
        res += z
    return res

def sum2(xs):
    return complex(sum(_.real for _ in xs), sum(_.imag for _ in xs))
```

```
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 963 nsec per loop
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'msum(data)'
200000 loops, best of 11: 1.31e+03 nsec per loop
```

Using sum() component-wise is hardly an option:

```
$ ./python -m timeit -r11 -unsec -s 'from a import data, sum2' 'sum2(data)'
50000 loops, best of 11: 8.56e+03 nsec per loop
```

Unfortunately, using this builtin directly in numeric code doesn't make sense, as the results are (usually) inaccurate. It's not too hard to do summation component-wise with math.fsum(), but that is slow and there might be a better way.
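For illustration, here is a small sketch (with made-up data) of why naive left-to-right complex summation loses accuracy, and what component-wise math.fsum() recovers:

```python
from math import fsum

# Illustrative data: large cancelling terms expose the rounding error.
data = [1e16 + 1e16j, 1.0 + 1.0j, -1e16 - 1e16j]

# Naive left-to-right addition absorbs the small terms into the large ones
# (the exact result can depend on the interpreter version).
naive = sum(data)

# Component-wise fsum() recovers the correctly rounded result, at the cost
# of two passes over the data and repacking floats into a complex.
accurate = complex(fsum(z.real for z in data),
                   fsum(z.imag for z in data))
print(naive, accurate)  # accurate == (1+1j)
```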

In #100425 a simple algorithm using compensated summation was implemented in sum() for floats. I propose to (1) add a specialization in sum() for complex numbers, and (2) reuse the #100425 code to implement accurate summation of complexes.
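For reference, the compensated (Neumaier) summation scheme behind the float specialization can be sketched in pure Python and applied per component; this is only an illustrative sketch, not the actual C code from #100425:

```python
def neumaier_sum(xs):
    """Compensated summation: carry the rounding error along in c."""
    s = 0.0  # running sum
    c = 0.0  # compensation: low-order bits lost by each addition
    for x in xs:
        t = s + x
        if abs(s) >= abs(x):
            c += (s - t) + x  # low-order bits of x were lost
        else:
            c += (x - t) + s  # low-order bits of s were lost
        s = t
    return s + c

def csum(zs):
    """The same scheme applied component-wise to complex numbers."""
    zs = list(zs)
    return complex(neumaier_sum(z.real for z in zs),
                   neumaier_sum(z.imag for z in zs))

print(csum([1e16 + 1j, 1.0 + 0j, -1e16 + 0j]))  # (1+1j)
```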

(1) is simple and straightforward, yet it gives a measurable performance boost:

benchmark sum() in the main wrt added specialization for complex:

```diff
diff --git a/Python/bltinmodule.c b/Python/bltinmodule.c
index 6e50623caf..da0eed584a 100644
--- a/Python/bltinmodule.c
+++ b/Python/bltinmodule.c
@@ -2691,6 +2691,59 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
             }
         }
     }
+
+    if (PyComplex_CheckExact(result)) {
+        Py_complex c_result = PyComplex_AsCComplex(result);
+        Py_SETREF(result, NULL);
+        while(result == NULL) {
+            item = PyIter_Next(iter);
+            if (item == NULL) {
+                Py_DECREF(iter);
+                if (PyErr_Occurred())
+                    return NULL;
+                return PyComplex_FromCComplex(c_result);
+            }
+            if (PyComplex_CheckExact(item)) {
+                Py_complex x = PyComplex_AsCComplex(item);
+                c_result.real += x.real;
+                c_result.imag += x.imag;
+                _Py_DECREF_SPECIALIZED(item, _PyFloat_ExactDealloc);
+                continue;
+            }
+            if (PyLong_Check(item)) {
+                long value;
+                int overflow;
+                value = PyLong_AsLongAndOverflow(item, &overflow);
+                if (!overflow) {
+                    c_result.real += (double)value;
+                    c_result.imag += 0.0;
+                    Py_DECREF(item);
+                    continue;
+                }
+            }
+            if (PyFloat_Check(item)) {
+                double value = PyFloat_AS_DOUBLE(item);
+                c_result.real += value;
+                c_result.imag += 0.0;
+                Py_DECREF(item);
+                continue;
+            }
+            result = PyComplex_FromCComplex(c_result);
+            if (result == NULL) {
+                Py_DECREF(item);
+                Py_DECREF(iter);
+                return NULL;
+            }
+            temp = PyNumber_Add(result, item);
+            Py_DECREF(result);
+            Py_DECREF(item);
+            result = temp;
+            if (result == NULL) {
+                Py_DECREF(iter);
+                return NULL;
+            }
+        }
+    }
 #endif
     for(;;) {
```

main:

```
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 963 nsec per loop
```

with the specialization:

```
$ ./python -m timeit -r11 -unsec -s 'from a import data, msum' 'sum(data)'
500000 loops, best of 11: 606 nsec per loop
```

(2) also seems to be a no-brainer: a simple refactoring of the PyFloat specialization should allow us to use the same core for the PyComplex specialization.

If there are no objections, I'll work on a patch.

Has this already been discussed elsewhere?

This is a minor feature which does not need prior discussion elsewhere.

Links to previous discussion of this feature:

No response

Linked PRs

Eclips4 commented 4 months ago

cc @rhettinger @mdickinson

franklinvp commented 4 months ago

My opinion is that the goal of being accurate should be delegated to a specialized function, and that the builtin function sum should return to its simple-to-explain meaning of "adding from left to right, using the addition that the elements define".

The change introduced in #100425 increased the complexity of explaining what sum does. See some examples.

Summation in finite-precision floating point has its surprises for those who haven't learned that it is not real numbers that are being added. But in my opinion, the builtin sum is not the right place to mitigate that. In the spirit of "explicit is better than implicit", I think the use of a compensated sum should be done in a function that is explicitly made for that purpose. Didactically, those who haven't learned floating-point arithmetic should be exposed to it in the immediate tools of the language: + and the builtin sum.

rhettinger commented 4 months ago

> If there are no objections, I'll work on a patch.

+1 from me. As a test, the result should match `complex(sum(z.real for z in zarr), sum(z.imag for z in zarr))`.
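That invariant is easy to check directly; with integer-valued components all of the sums below are exact, so the comparison holds regardless of whether sum() compensates:

```python
# Integer-valued components keep every partial sum exact, so the
# suggested invariant can be checked without rounding concerns.
zarr = [complex(i, -i) for i in range(100)]
expected = complex(sum(z.real for z in zarr),
                   sum(z.imag for z in zarr))
assert sum(zarr) == expected == complex(4950, -4950)
```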

skirpichev commented 3 months ago

> its simple-to-explain meaning of "adding from left to right using the addition that the elements define".

One reason for a builtin function to be a builtin is, well, being useful. Just being "simple-to-explain" is not enough.

What is the point of preserving a huge accumulated rounding error? Say, if you want to reproduce an inaccurate sum with some specific order of operations, you can just use something like functools.reduce(operator.add, xs) or an explicit for loop.

Also, is it "simple-to-explain" for you when sum(xs) and sum(reversed(xs)) are different?
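The order dependence is easy to demonstrate with plain left-to-right addition (functools.reduce is used here, so the result does not depend on whether sum() itself compensates; the data is illustrative):

```python
from functools import reduce
from operator import add

xs = [1.0, 1e16, -1e16]
forward = reduce(add, xs)             # 1.0 is absorbed by 1e16, then cancelled
backward = reduce(add, reversed(xs))  # the big terms cancel first
print(forward, backward)  # 0.0 1.0
```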

> use of a compensated sum should be done in a function that is explicitly made for that purpose.

There is also math.fsum(); it's mentioned in the Sphinx docs for sum(). But it uses a much more complex algorithm.
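For example, math.fsum() is correctly rounded where plain left-to-right addition is not; since it returns a float, it has to be applied per component for complex input:

```python
from functools import reduce
from operator import add
from math import fsum

xs = [0.1] * 10
print(reduce(add, xs))  # plain left-to-right: 0.9999999999999999
print(fsum(xs))         # correctly rounded: 1.0
```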

(BTW, it looks faster on random data.)

> +1 from me.

FYI, the PR is ready for review: https://github.com/python/cpython/pull/121176

franklinvp commented 3 months ago

Clap, clap, clap.

Everyone is implicitly forced to opt in to a compensated sum for float and complex, instead of doing so explicitly. So long, "explicit is better than implicit". Thanks for ruining one use case, when this algorithm could perfectly well have been put somewhere else, or made switchable, so that everyone benefitted without anyone being affected.

Please don't ever "enhance" it to subtypes. Those who need a fast and proper sum of float and complex in Python at least still have some hacks like

```python
class R(float): pass
class C(complex): pass

sum(L, start=R(0.0))  # or sum(L, start=C(0.0))
```
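As an illustration of that hack (with made-up data): a float subclass as the start value fails the exact-type check in sum(), so the addition stays plain left-to-right even on interpreters with the compensated float path:

```python
class R(float):
    """float subclass: fails the exact-type check, so sum() falls back
    to generic left-to-right addition via the elements' __add__."""

xs = [1.0, 1e16, -1e16]
plain = sum(xs, start=R(0.0))  # naive order: 1.0 is absorbed, result 0.0
print(plain)  # 0.0
```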

Regarding

> Also, is it "simple-to-explain" for you when sum(xs) and sum(reversed(xs)) are different?

That is the type of thing that you want to have to explain. It is a good thing to have people come over and over asking about it. That phenomenon is a feature of finite-precision floating-point arithmetic. They get to learn it. Python, a language used so often for education, now tries to hide this as early as its builtin function sum, catering to ignorance.

rhettinger commented 3 months ago

> Clap, clap, clap. ... [sarcasm] ... catering to ignorance.

Your abusive posts are not welcome here.

skirpichev commented 3 months ago

> Please don't ever "enhance" it to subtypes.

Probably, this doesn't make sense simply because __add__() can be overridden in subtypes:

```
>>> class myint(int):
...     def __add__(self, other):
...         print("Nobody expects the Spanish Inquisition!")
...
>>> myint() + myint()
Nobody expects the Spanish Inquisition!
```

I can't imagine many practical cases for doing that... Perhaps implementing some parallel algorithm for addition of big ints?

But that's all. More generic checks (say, PyFloat_Check()) are costly when the result is negative. But in sum() a failed check just ends the specialization for the given type and falls through to a more generic one (say, to complex) or to using PyNumber_Add(). So I think such generic checks are cheap in this kind of workflow.

@rhettinger, does this extension of the specialization in sum() make sense to you?

If not, please note that after 88bdb9280b we have PyLong_Check()'s in some places. I think that was a bug (CC @serhiy-storchaka). This was extended in my PR with one PyFloat_Check(), and I forgot to fix that, sorry :(

a patch to fix a few instances of "soft" type checks:

```diff
diff --git a/Python/bltinmodule.c b/Python/bltinmodule.c
index a5b45e358d..49675c224c 100644
--- a/Python/bltinmodule.c
+++ b/Python/bltinmodule.c
@@ -2686,7 +2686,7 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
             _Py_DECREF_SPECIALIZED(item, _PyFloat_ExactDealloc);
             continue;
         }
-        if (PyLong_Check(item)) {
+        if (PyLong_CheckExact(item) || PyBool_Check(item)) {
             long value;
             int overflow;
             value = PyLong_AsLongAndOverflow(item, &overflow);
@@ -2735,7 +2735,7 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
             Py_DECREF(item);
             continue;
         }
-        if (PyLong_Check(item)) {
+        if (PyLong_CheckExact(item) || PyBool_Check(item)) {
             long value;
             int overflow;
             value = PyLong_AsLongAndOverflow(item, &overflow);
@@ -2746,7 +2746,7 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
                 continue;
             }
         }
-        if (PyFloat_Check(item)) {
+        if (PyFloat_CheckExact(item)) {
             double value = PyFloat_AS_DOUBLE(item);
             re_sum.hi += value;
             im_sum.hi += 0.0;
```

Python, a language used so often for education, now tries to hide

Not at all!

```
>>> res, xs = 0.0, [0.1]*10
>>> for _ in xs:
...     res += _
...
>>> res
0.9999999999999999
```

serhiy-storchaka commented 3 months ago

> If not, please note that after 88bdb92 we have PyLong_Check()'s in some places. I think that was a bug (CC @serhiy-storchaka).

Do you have an example where this makes a difference?

skirpichev commented 3 months ago

Hmm, you are right; sorry for the false alarm. All such examples look broken.