Hello, and thanks for your effort in writing the issue/PR!
I'm 50/50 on it.
Calling sum on range objects isn't the most frequent situation, but on the other hand, this adds only one if in the body of the sum implementation, so it will hardly slow down the rest of the cases where the argument of sum isn't a range object.
This seems like too rare a use-case to be worth the extra complexity in the sum implementation. @mcognetta Do you have examples of code in the wild that would benefit from this change?
Hopefully any programmer who actually has sum(range(...)) as a performance bottleneck will be able to remember the formula for an arithmetic series!
For example: start * length + step * length * (length - 1) // 2
If you know the exact last number end, it's a little simpler to compute (start + end) * length // 2.
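For illustration, here is a rough Python-level sketch of that closed form (range_sum is a hypothetical helper for demonstration, not the proposed C implementation):
>>> def range_sum(r):
...     length = len(r)  # number of terms; the formula also yields 0 for an empty range
...     return r.start * length + r.step * length * (length - 1) // 2
...
>>> range_sum(range(3, 50, 7)) == sum(range(3, 50, 7))
True
>>> range_sum(range(10)) == sum(range(10))
True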
I would also argue that sometimes statements like sum(range(BigNumber)) are executed just to add an O(n) workload in some testing environment.
In general, I do not think that it is desirable to outsmart the programmer writing sum(range(...)) with an O(1) implementation.
The similar all(range(...)) issue has some more thoughts.
I am negative on this change: complexity over a rare use case.
It seems the overall consensus is that this change is not worth the added complexity and so I will close this issue and my PR. Thanks for the comments.
Since I am just starting out contributing to cpython, for my own knowledge, would one of you mind quickly clarifying the two questions I asked in these comments on my PR:
- If I needed to add a NULL check after each PyNumber_* operation in my implementation.
- Why my implementation has a segfault when calling sum(range(N)) when N is too large (I found it happens at N = 2^63):
>>> sum(range(2**61))
2658455991569831744654692615953842176
>>> sum(range(2**62))
10633823966279326980924613473029062656
>>> sum(range(2**63 - 1))
42535295865117307919086767873688862721
>>> sum(range(2**63))
Segmentation fault (core dumped)
It can obviously generate numbers larger than the max integer size, and I think every operation in my implementation uses PyObjects, not C ints, so I am confused about what causes this. It also works fine with larger N, as long as the range length isn't too big:
>>> sum(range(2**1000, 2**1000 + 1000, 999))
21430172143725346418968500981200036211228096234110672148875007767407021022498722449863967576313917162551893458351062936503742905713846280871969155149397149607869135549648461970842149210124742283755908364306092949967163882534797535118331087892154125829142392955373084335320859663305248773674411336139751
Thanks in advance.
Closing per Marco's rationale.
- If I needed to add a NULL check after each PyNumber_* operation in my implementation.
Yes: each PyNumber_* call can potentially raise an exception, in which case NULL will be returned and an exception set, so normal code will need to check each return value so that the exception can be propagated if necessary. You'll see this pattern all over the codebase, but the math module is one place you can find lots of PyNumber_* calls in particular. E.g., https://github.com/python/cpython/blob/cc2cf85d03cf29994a707aae5cc9a349a4165b84/Modules/mathmodule.c#L1769-L1794
- Why my implementation has a segfault when calling sum(range(N)) when N is too large
There are lots of potential causes for segfaults in Python C-API code; refcounting and exception-handling issues are probably the most common, but by no means the only ones. In this particular case, I think you're running into the issue that len of a range fails for large ranges:
>>> len(range(2**63))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t
So your call to builtin_len returns NULL and sets an exception, and you then try to pass that NULL as a number into subsequent operations. This is a good example of why NULL returns need to be handled: if you add error checks for both the builtin_len call and the range_sum_fastpath call in your code, then you'd see a Python exception instead of the segfault:
>>> sum(range(2**63))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t
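As an aside, for arbitrarily large ranges the length can still be computed at the Python level from the range's start/stop/step attributes, which sidesteps the C ssize_t limit that len runs into (a sketch assuming a positive step and a non-empty range):
>>> r = range(2**63)
>>> (r.stop - r.start + r.step - 1) // r.step  # ceiling division gives the number of terms
9223372036854775808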
@mcognetta You might find the tutorial on exception handling and reference counting useful:
IMHO, this kind of optimization should be achieved by bytecode-level specialization, not runtime-level optimization.
If we had a multitude of compressed (virtual) range collection objects, then it might make sense to add sum and possibly even stdev special methods. But we do not. (And to me, this seems like something for PyPI.)
@mdickinson Thanks for the detailed comment! That makes sense to me.
One last note: I had planned another range optimization, checking contains for non-integral types. E.g.,
>>> import time
>>> r = range(100000000)
>>> t=time.time();8000000 in r;print(time.time()-t)
True
4.57763671875e-05
>>> t=time.time();8000000.0 in r;print(time.time()-t)
True
0.20510268211364746
I imagine this would similarly be rejected, but I am putting it here as another place where operations on range are quick to fall back on an iterative method.
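For reference, a rough Python-level sketch of the kind of fast path being described (is_in_range is a hypothetical helper; an actual change would live in range's C-level contains implementation):
>>> def is_in_range(x, r):
...     if isinstance(x, float):
...         # An integral float can reuse the O(1) integer membership check;
...         # a non-integral float can never equal an int in the range.
...         return x.is_integer() and int(x) in r
...     return x in r
...
>>> is_in_range(8000000.0, range(100000000))
True
>>> is_in_range(8000000.5, range(100000000))
False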
Anyway, thanks again for the comments, I appreciated the discussion.
@mcognetta
another range optimization: checking contains for non-integral types
Take a look at https://discuss.python.org/t/fast-path-for-float-in-range-checks/18248 for previous discussion. Indeed there didn't seem to be much consensus that this would be a good idea.
Currently, sum(range(...)) falls back to an iterative loop that takes O(n) time, where n is the length of the range. Instead, this can be done in O(1) time with some arithmetic.
This is slightly related to issue #68264, where it is noted that sum(range(...)) had a slight performance regression between Python 2 and 3. However, in both cases, an O(n) time implementation is used.
Note: this is my first time contributing here using only GitHub. I already have a patch prepared, but I think I needed to open an issue first.