Closed b5d2b95a-2226-4555-ae5f-b51f17ec7501 closed 10 years ago
Python 3.0 (r30:67503, Jan 7 2009, 16:22:01) [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
>>> from itertools import islice, count
>>> islice(count(), (1<<31) - 1)
<itertools.islice object at 0x63a0c0>
>>> islice(count(), (1<<31))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Stop argument for islice() must be a non-negative integer or
None.
Clarified the error message in r73535.
Leaving open as a feature request to support arbitrarily large indices.
Unassigning. If someone is interested creating a patch, I think it is a reasonable request.
In 3.1.2, range handles large numbers.
>>> list(range(1000000000, 50000000000, 10000000000))
[1000000000, 11000000000, 21000000000, 31000000000, 41000000000]
# those are all billions
This means that the 'equivalent' code in the doc will work.
I started working on this bug as a part of PyCon US 2014 sprints. Should the bugfix include a fast path (basically the current implementation) for when the values involved can fit in an int, and a slow path for larger values? Or should the bugfix just have one path which works for all the cases (using PyObject * for "next", "stop" etc.)?
Yes, you should add a fastpath and a slow path. Take a look at itertools.count() or builtins.enumerate() to see how to do it.
This is a bit tricky to switch between modes, so you will need a thorough set of test cases.
OK. I have written the "slow path" version and tested it a bit. I will add the code to switch between the paths and also add test cases as well. Thanks!
Here's a proposed patch. I need more tests for large values, but all the tests I could think of take a long time to get to a long value. I added some tests that don't take much time but work correctly for long values. If anyone has any ideas for some other tests, please let me know.
This updated patch has support for starting in fast mode until the next count would result in overflow in Py_ssize_t. The first patch started in slow mode as soon as any of 'start', 'stop', or 'step' was outside of the range. With this patch, we start in fast mode if possible and then transition to slow mode when needed.
I also tested this patch for correctness for the following cases:
I did this by temporarily changing the code twice:
The first patch was cleaner. I don't think it is necessary to start-fast and switch-to-slow in the case where not all of the arguments are in the normal range.
OK. Here is the first patch with a couple of bug fixes for "slow mode".
Uploading another patch which is the same as the last patch but this one applies cleanly after the latest islice changes for bpo-21321.
Alok, have you signed a contributor agreement?
Yes, I signed it a few days ago.
Alok, overall the patch looks pretty good and you've done great work on it.
However, in working through its details, I find myself having major misgivings about doubling the size and complexity of the code for something that may not be ever benefit any real code.
Terry noted that range() supports values bigger than the word size but the needs there are much different. Programs can reasonably use ranges with large start points, but an islice() call would have to iterate over *start* values before it begins returning any usable values:
list(range(sys.maxsize+10, sys.maxsize+20)) # maybe a good idea
list(islice(count(), sys.maxsize + 10, sys.maxsize + 20)) # probably not a good idea
When we finally get 64-bit everywhere (not there yet), I think the code in this patch would never get exercised. Even in the 32-bit world, islicing over 2**32 inputs doesn't seem like a great idea.
I'd find it sad if we would, after 5 years of asking for contributions, and after somebody actually contributing, now declare that we really don't want a contribution.
Martin, "finding it sad" doesn't really help much here.
We *can* put the patch in. Alok devoted a good chunk of time to creating the patch and I've already devoted a good chunk of time to reviewing it.
However, in so doing I became concerned that it wasn't the right thing to do. The code size and complexity is much bigger than I expected (as compared to what I had to do for itertools.count for example). The use case is much weaker (because unlike range() and count() we don't have an arbitrary start point). This thought surfaced when reading Alok's notes on the difficulty of testing this patch in a reasonable amount of time.
Besides feeling sad, what is your recommendation? Would you like me to finish the review and check it in to make everyone feel good?
Hi Raymond, Martin,
I won't feel bad about this patch not making it into the final Python distribution. I worked on this bug because it was the only "core C" bug at PyCon US sprints for CPython. I learned a bit more about CPython while working on it and I don't consider the time I spent on this bug as wasted.
Other than symmetry arguments, I can't think of any other argument for islice to accept large values.
a = []
a[sys.maxsize+2:] # gives: []
islice(a, None, sys.maxsize+2) # raises exception
But because islice has to go through the elements of the iterable from the current value to "start", while the first example doesn't, I don't think the symmetry argument is that strong here.
So, I think it's fine if we close this bug without accepting this patch.
Thanks for your time in reviewing it!
All contributions are subject to final commit review. I looked at the patch and it is a *lot* of code for little benefit. I think the better solution would be an informative error message: "Currently, islice arguments must be less than {} on {}-bit systems".format(n, k).
Since I posted nearly 4 years ago, I have become more aware of the important differences between 3.x range as a sequence class whose instances are non-iterator *(re)iterables and count as an iterator class whose instances are one-time *iterators. To me, arbitrarily large indices now seem more appropriate for virtual sequences that can do O[1] indexing rather than iterators where indexing is O[n].
A recent proposal on python-ideas, which as I remember amounted to enhancing count to be more like range, tripped over this difference. I suggested that a new infinite sequence class Count (like range but without necessarily having a stop value) was a more sensible idea for what the OP wanted to do.
Well, as Alok has indicated that he can understand this not getting accepted, I'm now fine with that, too.
Terry: Raymond had already adjusted the error message five years ago.
So closing this as "won't fix".
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at =
created_at =
labels = ['extension-modules', 'type-feature']
title = "islice doesn't accept large stop values"
updated_at =
user = 'https://bugs.python.org/thomasguest'
```
bugs.python.org fields:
```python
activity =
actor = 'loewis'
assignee = 'rhettinger'
closed = True
closed_date =
closer = 'loewis'
components = ['Extension Modules']
creation =
creator = 'thomasguest'
dependencies = []
files = ['34967', '34996', '34999', '35205']
hgrepos = []
issue_num = 6305
keywords = ['patch']
message_count = 20.0
messages = ['89494', '89646', '117000', '120456', '216558', '216564', '216568', '216822', '216974', '216991', '217003', '218211', '218234', '218268', '221227', '221230', '221235', '221262', '221288', '221344']
nosy_count = 5.0
nosy_names = ['loewis', 'rhettinger', 'terry.reedy', 'thomasguest', 'AlokSinghal']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'needs patch'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue6305'
versions = ['Python 3.5']
```