python / cpython

The Python programming language
https://www.python.org

Make an `O(1)` fastpath for `sum(range(...))` #107868

Closed mcognetta closed 1 year ago

mcognetta commented 1 year ago

Currently, sum(range(...)) falls back to an iterative loop that takes O(n) time, where n is the length of the range. Instead, this can be done in O(1) time with some arithmetic.

This is slightly related to issue #68264, where it is noted that sum(range(...)) had a slight performance regression between Python 2 and 3. However, in both cases, an O(n) time implementation is used.

Note: this is my first time contributing here using only GitHub. I already have a patch prepared, but I think I needed to open an issue first.

Linked PRs

Eclips4 commented 1 year ago

Hello, and thanks for your effort in writing the issue/PR!

I'm 50/50 on it. Calling sum on range objects isn't the most frequent situation, but on the other hand, this adds only one if to the body of the sum implementation, so it will hardly slow down the cases where the argument of sum isn't a range object.
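
For illustration, a rough Python analogue of that shape (hypothetical names; the actual patch is C code inside builtin_sum): a single type check up front, with everything else taking the existing path.

```python
def sum_with_fastpath(iterable, start=0):
    # Sketch only: one added branch for exact range objects...
    if type(iterable) is range:
        n = len(iterable)
        # ...handled by the O(1) arithmetic-series closed form.
        return start + iterable.start * n + iterable.step * (n * (n - 1) // 2)
    # Everything else takes the usual O(n) accumulation.
    total = start
    for item in iterable:
        total += item
    return total
```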

mdickinson commented 1 year ago

This seems like too rare a use-case to be worth the extra complexity in the sum implementation. @mcognetta Do you have examples of code in the wild that would benefit from this change?

jxu commented 1 year ago

Hopefully any programmer who actually has sum(range(...)) as a performance bottleneck will be able to remember the formula for arithmetic series!

compute `start * length + step * length * (length - 1) // 2`

If you know the exact last number `end`, it's a little simpler to compute `(start + end) * length // 2`
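
A quick sketch checking both formulas against the built-in sum (illustrative Python; `range_sum` is a made-up name, not code from the PR):

```python
def range_sum(r):
    """O(1) sum of a range: start*n + step*n*(n-1)//2 (n*(n-1) is always even)."""
    n = len(r)
    return r.start * n + r.step * (n * (n - 1) // 2)

for r in (range(10), range(5, 50, 3), range(100, 0, -7), range(0)):
    assert range_sum(r) == sum(r)
    if r:  # when the last element is known: (start + end) * n // 2
        assert (r.start + r[-1]) * len(r) // 2 == sum(r)
```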

miccoli commented 1 year ago

I would also argue that sometimes statements like sum(range(BigNumber)) are executed just to add an O(n) workload in some testing environment.

In general, I do not think it is desirable to outsmart the programmer writing sum(range(...)) with an O(1) implementation.

pochmann commented 1 year ago

The similar all(range(...)) issue has some more thoughts.
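
For context, an O(1) all(range(...)) reduces to a membership test, since 0 is the only falsy int and integer membership in a range is already constant-time. A sketch (hypothetical name, not code from that issue):

```python
def all_range(r):
    # all() over ints is False iff some element is falsy, and 0 is the
    # only falsy int; `0 in r` is already O(1) for ranges.
    return 0 not in r

assert all_range(range(1, 10))      # no zero
assert not all_range(range(-3, 3))  # contains zero
assert all_range(range(0))          # empty: vacuously true
```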

sunmy2019 commented 1 year ago

I am negative on this change. Complexity over rare use case.

mcognetta commented 1 year ago

It seems the overall consensus is that this change is not worth the added complexity and so I will close this issue and my PR. Thanks for the comments.

Since I am just starting out contributing to CPython, for my own knowledge, would one of you mind quickly clarifying the two questions I asked in comments on my PR:

  • Whether I needed to add a NULL check after each PyNumber_* operation in my implementation.
  • Why my implementation segfaults when calling sum(range(N)) when N is too large.

It can obviously generate numbers larger than the max integer size, and I think every operation in my implementation uses PyObjects, not C ints, so I am confused about what causes this. It also works fine with larger N, as long as the range length isn't too big:

>>> sum(range(2**1000, 2**1000 + 1000, 999))
21430172143725346418968500981200036211228096234110672148875007767407021022498722449863967576313917162551893458351062936503742905713846280871969155149397149607869135549648461970842149210124742283755908364306092949967163882534797535118331087892154125829142392955373084335320859663305248773674411336139751

Thanks in advance.

AA-Turner commented 1 year ago

Closing per Marco's rationale.

A

mdickinson commented 1 year ago

  • If I needed to add a NULL check after each PyNumber_* operation in my implementation.

Yes: each PyNumber_* call can potentially raise an exception, in which case NULL is returned and an exception is set, so normal code needs to check each return value so that the exception can be propagated if necessary. You'll see this pattern all over the codebase; the math module is one place where you can find lots of PyNumber_* calls in particular. E.g., https://github.com/python/cpython/blob/cc2cf85d03cf29994a707aae5cc9a349a4165b84/Modules/mathmodule.c#L1769-L1794

  • Why my implementation has a segfault when calling sum(range(N)) when N is too large

There are lots of potential causes for segfaults in Python C-API code; refcounting and exception-handling issues are probably the most common, but by no means the only ones. In this particular case, I think you're running into the issue that len of a range fails for large ranges:

>>> len(range(2**63))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t

So your call to builtin_len returns NULL and sets an exception, and you then try to pass that NULL as a number into subsequent operations. This is a good example of why NULL returns need to be handled: if you add error checks for both the builtin_len call and the rest of range_sum_fastpath, then you'd see a Python exception instead of the segfault:

>>> sum(range(2**63))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t

mdickinson commented 1 year ago

@mcognetta You might find the tutorial on exception handling and reference counting useful:

corona10 commented 1 year ago

IMHO, this kind of optimization should be achieved by bytecode-level specialization, not runtime-level optimization.

terryjreedy commented 1 year ago

If we had a multitude of compressed (virtual) range collection objects, then it might make sense to add sum and possibly even stdev special methods. But we do not. (And to me this seems like something for PyPI.)

mcognetta commented 1 year ago

@mdickinson Thanks for the detailed comment! That makes sense to me.

One last note: I had planned another range optimization, checking __contains__ for non-integral types. E.g.,

>>> import time
>>> r = range(100000000)
>>> t=time.time();8000000 in r;print(time.time()-t)
True
4.57763671875e-05
>>> t=time.time();8000000.0 in r;print(time.time()-t)
True
0.20510268211364746

I imagine this would similarly be rejected, but I am putting it here as another place where operations on range quickly fall back to an iterative method.
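
For concreteness, a minimal sketch of the kind of fastpath being described (hypothetical helper, not the actual proposal): an integral float can be reduced to the existing O(1) integer membership test.

```python
def float_in_range(x, r):
    # Non-integral floats (and inf/nan) can never be in a range of ints;
    # integral ones delegate to the existing O(1) integer membership test.
    return x.is_integer() and int(x) in r

r = range(100000000)
assert float_in_range(8000000.0, r)       # constant time, no iteration
assert not float_in_range(8000000.5, r)
```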

Anyway, thanks again for the comments, I appreciated the discussion.

mdickinson commented 1 year ago

@mcognetta

another range optimization: checking contains for non-integral types

Take a look at https://discuss.python.org/t/fast-path-for-float-in-range-checks/18248 for previous discussion. Indeed, there didn't seem to be much consensus that this would be a good idea.