pytoolz / toolz

A functional standard library for Python.
http://toolz.readthedocs.org/
Other
4.57k stars 258 forks source link

Recipe: iterator wrapper that can be sorted or compared by the next value in the iterator #549

Closed mentalisttraceur closed 1 year ago

mentalisttraceur commented 1 year ago

Here's a little recipe that I sometimes find useful, in case toolz wants to add it to itertoolz.

The recipe:

from functools import total_ordering as _total_ordering

@_total_ordering
class ComparableAsNextItem:
    __slots__ = ('_iterator', '_next')

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        try:
            self._next = next(self._iterator)
        except StopIteration:
            self._iterator = None

    def __iter__(self):
        return self

    def __next__(self):
        if self._iterator is None:
            raise StopIteration
        current = self._next
        try:
            self._next = next(self._iterator)
        except StopIteration:
            self._iterator = None
        return current

    def __eq__(self, other):
        if self._iterator is None:
            return NotImplemented
        return self._next == other

    def __lt__(self, other):
        if self._iterator is None:
            return NotImplemented
        return self._next < other

Example usage, putting iterables into a minheap to solve a stereotypical interview problem: merge sorted lists get all primes up to some number:

from heapq import heappop, heappush
from itertools import count

def primes_up_to(limit):
    minheap = []
    for number in range(2, limit):
        if number in minheap:
            while number in minheap:
                assert minheap[0]._next == number
                multiples_of_next_prime_factor = heappop(minheap)
                _ = next(multiples_of_next_prime_factor)
                heappush(minheap, multiples_of_next_prime_factor)
        else:
            prime = number
            yield prime
            multiples = count(start=prime*2, step=prime)
            heappush(minheap, ComparableAsNextItem(multiples))

It doesn't come up very often, but ComparableAsNextItem takes just enough thought and code to correctly implement that it seems worth making this available for reuse.

mentalisttraceur commented 1 year ago

If there's desire to include it I can try to whip up a proper PR with docstrings, tests, and documentation changes... maybe also adjust it to reuse something like itertoolz.peek in the implementation.

groutr commented 1 year ago

I'm getting an exception when trying your example:

Cell In [100], line 13, in merge_sorted_lists(*lists)
     11 except StopIteration:
     12     continue
---> 13 heappush(minheap, list_)

File /usr/lib/python3.10/functools.py:91, in _gt_from_lt(self, other, NotImplemented)
     89 def _gt_from_lt(self, other, NotImplemented=NotImplemented):
     90     'Return a > b.  Computed by @total_ordering from (not a < b) and (a != b).'
---> 91     op_result = type(self).__lt__(self, other)
     92     if op_result is NotImplemented:
     93         return op_result

Cell In [98], line 35, in ComparableAsNextItem.__lt__(self, other)
     33 if self._iterator is None:
     34     return NotImplemented
---> 35 return self._next < other

TypeError: '<' not supported between instances of 'int' and 'ComparableAsNextItem'
b = ([1, 2, 3], [4, 5, 1])
tuple(merge_sorted_lists(*b))
mentalisttraceur commented 1 year ago

@groutr that's what I get for switching to a simpler example at the last minute, and assuming I can skimp on checking the simpler one because the more complex example worked.

It breaks once it tries to compare one of the ComparableAsNextItem iterators after getting the last element out of it, which of course makes sense because at that point there is no longer a next element to compare to.

My earlier example was a space-efficient prime sieve using a minheap and ComparableByNextItem(itertools.count(...)) instances (which doesn't have the problem of running off the edge of the iterators).

I'll go ahead and edit my first post to use that more involved example, which actually demonstrates the value added by the recipe better anyway.

mentalisttraceur commented 1 year ago

On further thought, I think I "retract" this for now (I mean, everyone is still welcome to use it, but):

  1. the one situation where I actually found myself using this successfully and cleanly, people found it harder to understand than if I just didn't use this wrapper iterator and instead manually got, compared, and saved the next element (that's largely a familiarity problem which goes away the more you get used to the concept, but still it is a problem) - so it might not be all that worth having,

  2. a lot of the time when I want this while trying to think of a solution, I end up eventually finding a better solution that doesn't use it - so even if this is a good idea overall, it could probably use some more design rethinking, and

  3. it seems like a "proper" version of this recipe is a more powerful, transparent, and composable wrapper (perhaps based on wrapt), so that it could be used to wrap iterators that have additional features (for example Python's native generators with their .send and .throw, or more_itertools.peekable with its .peek and .prepend) - so the right way to do it might be out-of-scope for toolz.