python / cpython

The Python programming language
https://www.python.org
Other
63.41k stars 30.37k forks source link

islice doesn't accept large stop values #50554

Closed b5d2b95a-2226-4555-ae5f-b51f17ec7501 closed 10 years ago

b5d2b95a-2226-4555-ae5f-b51f17ec7501 commented 15 years ago
BPO 6305
Nosy @loewis, @rhettinger, @terryjreedy
Files
  • islice_large_values.patch: patch to support large values in islice
  • islice_large_values-2.patch: patch to support large values in islice
  • islice_large_values-3.patch
  • islice_large_values-4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at = created_at = labels = ['extension-modules', 'type-feature'] title = "islice doesn't accept large stop values" updated_at = user = 'https://bugs.python.org/thomasguest' ``` bugs.python.org fields: ```python activity = actor = 'loewis' assignee = 'rhettinger' closed = True closed_date = closer = 'loewis' components = ['Extension Modules'] creation = creator = 'thomasguest' dependencies = [] files = ['34967', '34996', '34999', '35205'] hgrepos = [] issue_num = 6305 keywords = ['patch'] message_count = 20.0 messages = ['89494', '89646', '117000', '120456', '216558', '216564', '216568', '216822', '216974', '216991', '217003', '218211', '218234', '218268', '221227', '221230', '221235', '221262', '221288', '221344'] nosy_count = 5.0 nosy_names = ['loewis', 'rhettinger', 'terry.reedy', 'thomasguest', 'AlokSinghal'] pr_nums = [] priority = 'normal' resolution = 'wont fix' stage = 'needs patch' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue6305' versions = ['Python 3.5'] ```

    b5d2b95a-2226-4555-ae5f-b51f17ec7501 commented 15 years ago

    Python 3.0 (r30:67503, Jan 7 2009, 16:22:01) [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin

    >>> from itertools import islice, count
    >>> islice(count(), (1<<31) - 1)
    <itertools.islice object at 0x63a0c0>
    >>> islice(count(), (1<<31))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Stop argument for islice() must be a non-negative integer or
    None.
    rhettinger commented 15 years ago

    Clarified the error message in r73535.

    Leaving open as a feature request to support arbitrarily large indices.

    rhettinger commented 14 years ago

    Unassigning. If someone is interested creating a patch, I think it is a reasonable request.

    terryjreedy commented 14 years ago
    In 3.1.2, range handles large numbers.
    >>> list(range(1000000000, 50000000000, 10000000000))
    [1000000000, 11000000000, 21000000000, 31000000000, 41000000000]
    # those are all billions

    This means that the 'equivalent' code in the doc will work.

    b9c696f1-4b08-4a48-9377-aa6ce13c6280 commented 10 years ago

    I started working on this bug as a part of PyCon US 2014 sprints. Should the bugfix include a fast path (basically the current implementation) for when the values involved can fit in an int, and a slow path for larger values? Or should the bugfix just have one path which works for all the cases (using PyObject * for "next", "stop" etc.)?

    rhettinger commented 10 years ago

    Yes, you should add a fastpath and a slow path. Take a look at itertools.count() or builtins.enumerate() to see how to do it.

    This is a bit tricky to switch between modes, so you will need a thorough set of test cases.

    b9c696f1-4b08-4a48-9377-aa6ce13c6280 commented 10 years ago

    OK. I have written the "slow path" version and tested it a bit. I will add the code to switch between the paths and also add test cases as well. Thanks!

    b9c696f1-4b08-4a48-9377-aa6ce13c6280 commented 10 years ago

    Here's a proposed patch. I need more tests for large values, but all the tests I could think of take a long time to get to a long value. I added some tests that don't take much time but work correctly for long values. If anyone has any ideas for some other tests, please let me know.

    b9c696f1-4b08-4a48-9377-aa6ce13c6280 commented 10 years ago

    This updated patch has support for starting in fast mode until the next count would result in overflow in Py_ssize_t. The first patch started in slow mode as soon as any of 'start', 'stop', or 'step' was outside of the range. With this patch, we start in fast mode if possible and then transition to slow mode when needed.

    I also tested this patch for correctness for the following cases:

    I did this by temporarily changing the code twice:

    rhettinger commented 10 years ago

    The first patch was cleaner. I don't think it is necessary to start-fast and switch-to-slow in the case where not all of the arguments are in the normal range.

    b9c696f1-4b08-4a48-9377-aa6ce13c6280 commented 10 years ago

    OK. Here is the first patch with a couple of bug fixes for "slow mode".

    b9c696f1-4b08-4a48-9377-aa6ce13c6280 commented 10 years ago

    Uploading another patch which is the same as the last patch but this one applies cleanly after the latest islice changes for bpo-21321.

    rhettinger commented 10 years ago

    Alok, have you signed a contributor agreement?

    b9c696f1-4b08-4a48-9377-aa6ce13c6280 commented 10 years ago

    Yes, I signed it a few days ago.

    rhettinger commented 10 years ago

    Alok, overall the patch looks pretty good and you've done great work on it.

    However, in working through its details, I find myself having major misgivings about doubling the size and complexity of the code for something that may not be ever benefit any real code.

    Terry noted that range() supports values bigger than the word size but the needs there are much different. Programs can reasonably use ranges with large start points, but an islice() call would have to iterate over *start* values before it begins returning any usable values:

      list(range(sys.maxsize+10, sys.maxsize+20))  # maybe a good idea
      list(islice(count(), sys.maxsize + 10, sys.maxsize + 20))  # probably not a good idea

    When we finally get 64-bit everywhere (not there yet), I think the code in this patch would never get exercised. Even in the 32-bit world, islicing over 2**32 inputs doesn't seem like a great idea.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 10 years ago

    I'd find it sad if we would, after 5 years of asking for contributions, and after somebody actually contributing, now declare that we really don't want a contribution.

    rhettinger commented 10 years ago

    Martin, "finding it sad" doesn't really help much here.

    We *can* put the patch in. Alok devoted a good chunk of time to creating the patch and I've already devoted a good chunk of time to reviewing it.

    However, in so doing I became concerned that it wasn't the right thing to do. The code size and complexity is much bigger than I expected (as compared to what I had to do for itertools.count for example). The use case is much weaker (because unlike range() and count() we don't have an arbitrary start point). This thought surfaced when reading Alok's notes on the difficulty of testing this patch in a reasonable amount of time.

    Besides feeling sad, what is your recommendation? Would you like me to finish the review and check it in to make everyone feel good?

    b9c696f1-4b08-4a48-9377-aa6ce13c6280 commented 10 years ago

    Hi Raymond, Martin,

    I won't feel bad about this patch not making it into the final Python distribution. I worked on this bug because it was the only "core C" bug at PyCon US sprints for CPython. I learned a bit more about CPython while working on it and I don't consider the time I spent on this bug as wasted.

    Other than symmetry arguments, I can't think of any other argument for islice to accept large values.

      a = []
      a[sys.maxsize+2:] # gives: []
      islice(a, None, sys.maxsize+2) # raises exception

    But because islice has to go through the elements of the iterable from the current value to "start", while the first example doesn't, I don't think the symmetry argument is that strong here.

    So, I think it's fine if we close this bug without accepting this patch.

    Thanks for your time in reviewing it!

    terryjreedy commented 10 years ago

    All contributions are subject to final commit review. I looked at the patch and it is a *lot* of code for little benefit. I think the better solution would be an informative error message: "Currently, islice arguments must be less than {} on {}-bit systems".format(n, k).

    Since I posted nearly 4 years ago, I have become more aware of the important differences between 3.x range as a sequence class whose instances are non-iterator *(re)iterables and count as an iterator class whose instances are one-time *iterators. To me, arbitrarily large indices now seem more appropriate for virtual sequences that can do O[1] indexing rather than iterators where indexing is O[n].

    A recent proposal on python-ideas, which as I remember amounted to enhancing count to be more like range, tripped over this difference. I suggested that a new infinite sequence class Count (like range but without necessarily having a stop value) was a more sensible idea for what the OP wanted to do.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 10 years ago

    Well, as Alok has indicated that he can understand this not getting accepted, I'm now fine with that, too.

    Terry: Raymond had already adjusted the error message five years ago.

    So closing this as "won't fix".